Development of Bilingual Sentiment and Emotion Text Classification Models from COVID-19 Vaccination Tweets in the Philippines

Document Type

Conference Proceeding

Publication Date



Social media can reveal how the public is responding to the ongoing nationwide COVID-19 vaccination campaign, allowing policymakers to make informed decisions and respond effectively. However, social media analysis in the Philippine context is challenging because natural, informal conversations combine English with a local language. This study addresses this challenge by adding part-of-speech tags, code-switching frequency, and language dominance as features that represent bilingualism when training machine learning models on COVID-19 vaccination-related Tweets for sentiment and emotion analysis. Results showed that the English-Tagalog Logistic Regression sentiment classification model outperformed TextBlob, VADER, and Polyglot, with an accuracy of 70.36%. Similarly, the English-Tagalog SVM emotion classification model outperformed Text2emotion, the NRC Affect Intensity Lexicon, and EmoTFIDF, with an average mean-squared error of 0.049. The added bilingual features improved these performance metrics by only a small margin. Nevertheless, SHAP analysis revealed that sentiment and emotion classes exhibit varying levels of these bilingual features, which suggests that future studies could explore similar linguistic features to better distinguish between classes during text classification. Finally, Tweets from September 2021 to January 2022 show negative perceptions of COVID-19 vaccination, mainly anger and sadness.
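
To make the bilingual features more concrete, the following is a minimal sketch of how code-switching frequency and language dominance might be computed for a single Tweet. It is not the study's implementation: the function name `bilingual_features`, the tag set (`"en"`, `"tl"`, `"other"`), and both feature definitions are assumptions made for illustration, and the per-token language tags are assumed to come from some upstream language identifier.

```python
from collections import Counter


def bilingual_features(lang_tags):
    """Illustrative bilingual features from per-token language tags.

    Assumed definitions (not taken from the paper):
      - switch_freq: share of adjacent token pairs whose language differs
      - dominance:   share of tokens in the most frequent language
    """
    # Keep only tokens tagged English or Tagalog; "other" is assumed to
    # cover names, URLs, emoji, and anything the tagger could not label.
    tags = [t for t in lang_tags if t in ("en", "tl")]
    if not tags:
        return {"switch_count": 0, "switch_freq": 0.0,
                "dominant_lang": None, "dominance": 0.0}

    # Count positions where the language changes between neighbours.
    switch_count = sum(1 for a, b in zip(tags, tags[1:]) if a != b)
    switch_freq = switch_count / (len(tags) - 1) if len(tags) > 1 else 0.0

    # Most frequent language and its share of the labelled tokens.
    dominant_lang, dominant_n = Counter(tags).most_common(1)[0]
    return {"switch_count": switch_count,
            "switch_freq": switch_freq,
            "dominant_lang": dominant_lang,
            "dominance": dominant_n / len(tags)}


# Hypothetical tag sequence for a mostly Tagalog Tweet with one English word.
features = bilingual_features(["tl", "tl", "en", "tl", "other", "tl"])
print(features)
```

Values like these could be appended to a text feature vector before training a classifier. Whether to normalize by token count, and how to treat untagged tokens, are design choices that the abstract does not specify.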