Development of Taglish Sentiment and Emotion Text Analysis Models From Covid-19 Vaccination Tweets in the Philippines

Date of Award

7-1-2022

Document Type

Thesis

Degree Name

Master of Science in Computer Science, Straight

First Advisor

Maria Regina Justina E. Estuar, PhD

Abstract

Social media can be used to understand how the public is responding to the nationwide COVID-19 vaccination campaign, allowing policymakers to respond effectively with informed decisions. However, conducting social media analysis in the Philippine context presents a challenge because natural informal conversations make use of a combination of English and the local language. This study addressed this by including part-of-speech tags, degree of adjectives, frequency of code-switching, language dominance, and affixes as features to represent bilingualism in training machine learning models with COVID-19 vaccination-related Tweets for sentiment and emotion analysis. Results show that the bilingual features improved the Ensemble sentiment model trained with bag of words by 6.04% in accuracy and 5.53% in F1 score but did not improve the performance of the Bayesian Ridge emotion model trained with bag of words and PCA. This shows that sentiment is linked to language usage, represented as bilingual features, while emotion classes are more granular and better analyzed with specific word choices. Nevertheless, both sentiment and emotion models developed performed better than existing Python analyzers with an accuracy of 69.89% and an F1 score of 69.86% for sentiment and MSE of 0.0402 for emotion. A general negative sentiment focused on anger and sadness was also revealed by running the models on Tweets from September 2021 to April 2022.
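To make two of the bilingual features named in the abstract more concrete, the following is a minimal Python sketch of how frequency of code-switching and language dominance might be computed for a single Taglish Tweet. It is an illustration only and not the thesis's actual feature extraction: the word lists `ENGLISH` and `TAGALOG`, the function names, and the exact feature definitions (switches per adjacent pair of recognized tokens, share of English tokens) are all assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical toy lexicons (assumption). A real pipeline would need a
# proper word-level language identifier rather than fixed word lists.
ENGLISH = {"the", "vaccine", "is", "safe", "i", "got", "my", "second", "dose", "not"}
TAGALOG = {"ang", "na", "ako", "ng", "bakuna", "hindi", "po", "sa", "salamat", "naman"}


def tag_languages(tokens: List[str]) -> List[str]:
    """Label each token as 'en', 'tl', or 'unk' using the toy lexicons."""
    tags = []
    for token in tokens:
        word = token.lower()
        if word in ENGLISH:
            tags.append("en")
        elif word in TAGALOG:
            tags.append("tl")
        else:
            tags.append("unk")
    return tags


def bilingual_features(text: str) -> Dict[str, object]:
    """Compute illustrative code-switching and language-dominance features."""
    # Whitespace tokenization keeps the sketch simple; unknown tokens are dropped.
    tags = [t for t in tag_languages(text.split()) if t != "unk"]
    n_en = tags.count("en")
    n_tl = tags.count("tl")
    known = n_en + n_tl

    # A switch is counted whenever two adjacent recognized tokens differ in language.
    switches = sum(1 for a, b in zip(tags, tags[1:]) if a != b)

    if n_en > n_tl:
        dominant = "en"
    elif n_tl > n_en:
        dominant = "tl"
    else:
        dominant = "tie"

    return {
        "code_switch_freq": switches / (known - 1) if known > 1 else 0.0,
        "english_ratio": n_en / known if known else 0.0,
        "dominant_language": dominant,
    }


if __name__ == "__main__":
    print(bilingual_features("Salamat po got my second dose na"))
```

Features like these could then be appended to a bag-of-words vector before model training, which is the general arrangement the abstract describes; how the thesis actually defined and combined them is not stated here.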
