Theses and Dissertations (All)

Towards a bilingual sentiment analysis model for English and Filipino

DE LEON MARLENE

Date of Award

2013

Document Type

Dissertation

Degree Name

Doctor of Philosophy in Computer Science

Department

Information Systems & Computer Science

First Advisor

Ma. Regina Justina E. Estuar, Ph.D.

Abstract

There is an opportunity to learn and understand how Filipinos think, behave, and react online, especially in responding to significant events. Resources such as lexicons, corpora, or a combination of both in a target language, together with a selection of machine learning classifiers, may be used to address this opportunity. However, there is little work on bilingual conversations. Filipino tweets provide a rich source of data for building corpora and models for this kind of classification, as they are composed of a mixture of English and mostly Filipino terms. This study looked into building bilingual sentiment analysis models for classifying bilingual English and Filipino disaster tweets. The study applied a supervised learning approach to build subjective and sentiment models using Support Vector Machine (SVM), Naïve Bayes, and K-Nearest Neighbor (K-NN) classifiers, together with a bilingual English and Filipino lexicon, corpora, and a combination of the two in fixed distribution sets. Accuracy, precision, recall, and F-measure were used to evaluate the performance of the models. Each of the resulting models was further evaluated against manually annotated corpora of tweets to determine its performance and reliability. For the bilingual subjective classification model, performance was highest with Naïve Bayes, using the combination of lexicon and corpora, at a 95% objective-5% subjective imbalanced distribution, with an F-measure of 73.53%. Similarly, the bilingual sentiment classification model performed best with Naïve Bayes, using the combination of lexicon and corpora, at a 95% positive-5% negative distribution, with an F-measure of 72.41%. The study showed that for English-Filipino sentiments, bilingual classification works best with an imbalanced distribution scheme and a combination of lexicon and corpora data sets. PCA was further performed on the resulting positive and negative sentiments to obtain manifest constructs on sentiments. Results showed a promising possibility of extending the bilingual sentiment classification model to include specific positive and negative emotions.

Comments

The C7.D456 2013

Recommended Citation

DE LEON, MARLENE (2013). Towards a bilingual sentiment analysis model for English and Filipino. Archīum.ATENEO.
https://archium.ateneo.edu/theses-dissertations/226