Theses and Dissertations (All)

Development of a Bilingual Hate Speech Detection System Using Filipino Reddit Texts

Raphael Christen K. Enriquez, Ateneo de Manila

Date of Award

7-1-2023

Document Type

Thesis

Degree Name

Master of Science in Computer Science

First Advisor

Maria Regina Justina E. Estuar, PhD

Abstract

Hate speech, a deliberate attack on a group based on its identity, often proliferates through social media. However, existing automatic hate speech detection tools focus primarily on high-resource languages such as English, which makes detecting hate speech in low-resource languages such as Filipino difficult. This study addresses this limitation by developing a bilingual hate speech detection system and dataset using Filipino texts from Reddit. The system leverages bilingual and psycho-linguistic features, including part-of-speech tags, code-switching frequency, and language dominance. Both machine learning and deep learning techniques are applied to develop the system. The results indicate that both model types performed competitively, demonstrating their potential for hate speech detection. Integrating psycho-linguistic features improved the performance of the machine learning models, highlighting the value of incorporating linguistic information. Key outcomes include the development of a bilingual hate speech detection system, the creation of a usable and shareable annotated dataset, the use of various feature extraction techniques, the effectiveness of transformer-based models, and the system’s accuracy in detecting hate speech in the Filipino language. The resulting system and annotated dataset are deployed and made publicly available for future research, helping to address the scarcity of hate speech detection resources for low-resource languages.
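
The abstract names code-switching frequency and language dominance as features but does not define them. As an illustration only, one plausible reading can be sketched in Python, assuming each token already carries a language tag. The tag values ("tl", "en"), the function names, and both formulas are assumptions made for this sketch and are not taken from the thesis.

```python
from collections import Counter


def code_switch_frequency(tags):
    """Fraction of adjacent token pairs whose language tags differ.

    Assumed definition for illustration; the thesis may define it differently.
    """
    if len(tags) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(tags, tags[1:]) if prev != cur)
    return switches / (len(tags) - 1)


def language_dominance(tags):
    """Return (most frequent tag, share of tokens carrying that tag)."""
    if not tags:
        return None, 0.0
    tag, count = Counter(tags).most_common(1)[0]
    return tag, count / len(tags)


# Hypothetical token-level tags for a Tagalog-English comment.
example_tags = ["tl", "tl", "en", "tl", "en", "tl"]
print(code_switch_frequency(example_tags))
print(language_dominance(example_tags))
```

Under these assumed definitions, both values are plain numeric features that could be appended to a text classifier's feature vector alongside part-of-speech counts.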

Recommended Citation

Enriquez, Raphael Christen K. (2023). Development of a Bilingual Hate Speech Detection System Using Filipino Reddit Texts. Archīum.ATENEO.
https://archium.ateneo.edu/theses-dissertations/864
