Towards a bilingual sentiment analysis model for English and Filipino

Date of Award

2013

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Information Systems & Computer Science

First Advisor

Ma. Regina Justina E. Estuar, Ph.D.

Abstract

There is an opportunity to learn and understand how Filipinos think, behave and react online, especially in responding to significant events. Resources in a target language, such as lexicons, corpora, or a combination of the two, together with a selection of machine learning classifiers, may be used to address this opportunity. However, there is little work on bilingual conversations. Filipino tweets provide a rich source of data for building corpora and models for this kind of classification, as they are composed of a mixture of English and, mostly, Filipino terms. This study looked into building bilingual sentiment analysis models for classifying bilingual English and Filipino disaster tweets. The study applied a supervised learning approach to create subjective and sentiment models, using Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbor (K-NN) classifiers trained on a bilingual English and Filipino lexicon, corpora, and a combination of the two, in fixed distribution sets. Accuracy, precision, recall and F-measure were used to evaluate the performance of the models. Each of the resulting models was further evaluated against manually annotated corpora of tweets to determine its performance and reliability. For the bilingual subjective classification model, performance was highest with Naïve Bayes, using the combination of lexicon and corpora, at an imbalanced distribution of 95% objective and 5% subjective, with an F-measure of 73.53%. Similarly, the bilingual sentiment classification model performed best with Naïve Bayes, using the combination of lexicon and corpora, at a distribution of 95% positive and 5% negative, with an F-measure of 72.41%. The study showed that for English-Filipino sentiments, bilingual classification works best with an imbalanced distribution scheme and a combination of lexicon and corpora data sets. Principal component analysis (PCA) was further performed on the resulting positive and negative sentiments to obtain manifest constructs on sentiments. Results showed a promising possibility of extending the bilingual sentiment classification model to include specific positive and negative emotions.
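
The abstract reports accuracy, precision, recall and F-measure on imbalanced label distributions. As an illustration only, the following minimal Python sketch shows how these four measures are commonly computed and why accuracy alone can mislead at a 95%/5% split. The function names and the toy labels are assumptions made for this example; they are not taken from the dissertation and do not reproduce its data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision_recall_f1(y_true, y_pred, target):
    """Precision, recall and F-measure (F1) for one target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy set with a 95% positive / 5% negative split.
gold = ["positive"] * 95 + ["negative"] * 5
# A trivial classifier that always answers "positive".
predicted = ["positive"] * 100

acc = accuracy(gold, predicted)
neg_scores = precision_recall_f1(gold, predicted, "negative")
pos_scores = precision_recall_f1(gold, predicted, "positive")
print("accuracy:", acc)
print("negative class (P, R, F1):", neg_scores)
print("positive class (P, R, F1):", pos_scores)
```

In this toy case the always-positive classifier scores high on accuracy while its F-measure for the minority class is zero, which is one reason per-class precision, recall and F-measure are usually reported alongside accuracy for imbalanced sets.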

Comments

The C7.D456 2013
