Determining Linguistic Markers in Cognitive Distortions from COVID-19 Pandemic-Related Reddit Texts

Document Type

Conference Proceeding

Publication Date



Distorted thoughts may signify underlying mental illness, and detecting them early may help prevent a more serious condition. A significant shift toward more pronounced negative sentiment was observed on the social media platform Reddit at the onset of the COVID-19 pandemic. Individuals who engage on such platforms post and comment to express their thoughts and feelings. This study aims to determine features that can help detect the presence of distorted thoughts, known as cognitive distortions, in COVID-19 pandemic-related texts. Texts were extracted from a COVID-19 support group on Reddit and annotated for the presence or absence of cognitive distortions. Linguistic features were extracted using R and LIWC to determine the set of features that best distinguishes distorted from non-distorted texts. Results showed that cognitive distortions have distinguishable features in COVID-19 pandemic-related texts. Specifically, independent-samples t-tests showed that distorted texts scored significantly higher on word count, sentiment score, authenticity, and use of the following word categories: function words, pronouns in general, first-person singular pronouns, impersonal pronouns, verbs, interrogatives, positive emotions, cognitive-process words (insight, discrepancy, and certainty), present-tense verbs, future-tense verbs, and swear words. Further tests using Naive Bayes and linear SVM machine learning models showed that some of these significant features can indeed help detect whether a sentence is distorted. Results from this study can be used to develop detection models for cognitive distortions.
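As an illustration of the kind of group comparison described in the abstract, the following is a minimal sketch of an independent-samples t-test on a single feature (word count). It is not the study's code: the abstract states that features were extracted with R and LIWC, whereas this sketch uses only the Python standard library. The function name `independent_t`, the choice of the pooled-variance (Student's) form of the test, and the toy values are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Return (t, df) for a pooled-variance independent-samples t-test.

    Hypothetical helper for illustration; assumes equal group variances.
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # Pool the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two group means.
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df


# Toy word counts invented for this example (NOT data from the study).
distorted = [42, 55, 38, 61, 47]
non_distorted = [30, 28, 35, 25, 32]

t_stat, df = independent_t(distorted, non_distorted)
print(f"t = {t_stat:.3f}, df = {df}")
```

A positive t statistic here means the first group has the higher mean. Judging significance would additionally require comparing t against the t distribution with `df` degrees of freedom, which this sketch leaves out.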