Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models

Document Type

Conference Proceeding

Publication Date

1-1-2024

Abstract

Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), comprising more than 10,000 product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis of code-switched text using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of systems designed for a single language when applied to bilingual text involving code-switching.
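
For context, the sketch below shows one way a pre-trained XLM-RoBERTa model can be fine-tuned for three-class sentiment classification with the Hugging Face Transformers library. The file names, label scheme, and hyperparameters are illustrative assumptions and do not reproduce the authors' exact training setup or the FiReCS data format.

```python
# Minimal sketch: fine-tuning XLM-RoBERTa for three-class sentiment
# classification with Hugging Face Transformers. File names, label ids,
# and hyperparameters are assumptions for illustration only.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "xlm-roberta-base"
NUM_LABELS = 3  # assumed ids: 0 = Negative, 1 = Neutral, 2 = Positive

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

# Hypothetical CSV files with a "text" column (the review) and an integer
# "label" column holding the sentiment class id.
dataset = load_dataset(
    "csv",
    data_files={"train": "firecs_train.csv", "test": "firecs_test.csv"},
)

def tokenize(batch):
    # Truncate long reviews; the Trainer's default collator pads per batch.
    return tokenizer(batch["text"], truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="xlmr-firecs-sentiment",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    ),
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
)

trainer.train()
print(trainer.evaluate())
```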
