Sentiment Analysis of Code-Switched Filipino-English Product and Service Reviews Using Transformers-Based Large Language Models

Document Type

Conference Proceeding

Publication Date



Bilingual individuals already outnumber monolinguals, yet most of the available resources for research in natural language processing (NLP) are for high-resource single languages. A recent area of interest in NLP research for low-resource languages is code-switching, a phenomenon in both written and spoken communication marked by the use of at least two languages in one utterance. This work presented two novel contributions to NLP research for low-resource languages. First, it introduced the first sentiment-annotated corpus of Filipino-English Reviews with Code-Switching (FiReCS), comprising more than 10k instances of product and service reviews. Second, it developed sentiment analysis models for Filipino-English text using pre-trained Transformers-based large language models (LLMs) and introduced benchmark results for zero-shot sentiment analysis on text with code-switching using OpenAI’s GPT-3 series models. The performance of the Transformers-based sentiment analysis models was compared against that of existing lexicon-based sentiment analysis tools designed for monolingual text. The fine-tuned XLM-RoBERTa model achieved the highest accuracy and weighted average F1-score of 0.84, with F1-scores of 0.89, 0.86, and 0.78 in the Positive, Negative, and Neutral sentiment classes, respectively. The poor performance of the lexicon-based sentiment analysis tools exemplifies the limitations of such systems, which are designed for a single language, when applied to bilingual text involving code-switching.
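The failure mode described in the last sentence can be sketched with a toy example. This is a hypothetical illustration, not the paper's method or data: the lexicon, scoring function, and Taglish review below are invented for demonstration, and real lexicon-based tools (e.g., VADER) are more sophisticated. The core limitation, however, is the same: sentiment-bearing Filipino words are absent from an English-only lexicon, so they contribute nothing to the score.

```python
# Hypothetical English-only sentiment lexicon (invented for illustration).
ENGLISH_LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "slow": -1, "poor": -1}

def lexicon_score(text: str) -> int:
    """Sum the polarity of each token found in the lexicon; unknown tokens score 0."""
    return sum(ENGLISH_LEXICON.get(token, 0) for token in text.lower().split())

# A purely English review is scored correctly as positive.
english_review = "great product I love it"
print(lexicon_score(english_review))  # 2

# An invented code-switched (Taglish) review that is clearly positive in Filipino
# ("ang ganda" = "so beautiful/nice", "sobrang bilis" = "very fast") scores 0,
# because its sentiment words never appear in the English-only lexicon.
taglish_review = "ang ganda ng product sobrang bilis ng delivery"
print(lexicon_score(taglish_review))  # 0
```

The fine-tuned multilingual Transformer models studied in the paper avoid this gap because their pre-training covers both languages, letting them assign sentiment to Filipino and English tokens alike.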