Development of a Text Classification Model to Detect Disinformation About Covid-19 in Social Media: Understanding the Features and Narratives of Disinformation in the Philippines

Date of Award

12-1-2022

Document Type

Thesis

Degree Name

Master of Science in Computer Science, Straight

First Advisor

Maria Regina Justina E. Estuar, PhD

Abstract

With booster shot uptake still low, the Philippine government continues to promote compliance with minimum health standards to prevent the spread of the virus. Disinformation about the virus remains a challenge, since compliance with minimum health standards is necessary to curb the spread of COVID-19. This study aims to understand the features that characterize COVID-19 disinformation in the Philippine context. Text classification models for detecting COVID-19 disinformation were also built and compared, and social network analysis was used to understand the disinformation narratives. Words related to vaccines and to government corruption or mismanagement were prevalent in the “False” and “Mostly False” categories, while words related to health information, such as case or vaccine counts, were prevalent in the “Mostly True” and “True” categories. After comparing both model approaches, RoBERTa-Tagalog-Base performed best overall in detecting disinformation. The model was validated on a separate disinformation dataset, where it achieved 73% accuracy in detecting “False” information. Disinformation narratives revolved around COVID-19 cases and vaccines, government mismanagement, and regulations. Disinformation stemmed from distrust of the government’s management of the pandemic. Moreover, the spread of disinformation was largely contained to the originating user, reaching at least one other user.
