Automation of Data Cleansing Methods for Covid19 Contact Tracing Data in the Philippines
Date of Award
12-2021
Document Type
Thesis
Degree Name
Master of Science in Computer Science
First Advisor
Ma. Regina Justina E. Estuar, PhD
Abstract
Data has become important in helping government and healthcare organizations create effective responses to mitigate the spread of the COVID-19 virus. Using data as the basis for decision making leads to better and more grounded policies and response implementations. However, data quality is often overlooked during data collection, even with data handling guidelines in place, because of the immense scale of the data collected and because eHealth systems were unprepared. Data cleaning is the most crucial and the most time-consuming part of data mining. Dirty data and the time spent on data cleaning degrade the performance of models and delay the results needed for decision making. This study developed data cleaning scripts to improve the quality of the COVID-19 data without consuming too much time. A data cleaning process framework was designed and used to develop the data cleaning scripts, analyze and identify data quality issues, and define the transformation workflows. Data quality issues in validity, consistency, completeness, and uniqueness were found in the COVID-19 data during data analysis, and the challenges and causes of these issues were identified in order to define the data transformation workflows. Validation scripts were also developed to validate the data before and after cleaning and so measure the improvement in data quality. The overall data quality of the COVID-19 data is 90.91%, where the data is 95.94% valid, 99.59% consistent, and 71.13% complete, with 3.02% duplicates.
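The abstract does not state how the overall figure is derived. The reported 90.91% is, however, arithmetically consistent with an unweighted mean of the four dimension scores when uniqueness is taken as 100% minus the duplicate rate (100 - 3.02 = 96.98). The following is a minimal sketch of that reading in Python, not the thesis's actual validation scripts. The function names, the record layout, and the equal weighting are all assumptions made for illustration.

```python
def completeness(records, fields):
    """Percentage of (record, field) slots that hold a non-empty value."""
    total = len(records) * len(fields)
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return 100.0 * filled / total


def duplicate_rate(records, key_fields):
    """Percentage of records that repeat an earlier record on key_fields."""
    seen = set()
    dupes = 0
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            dupes += 1
        else:
            seen.add(key)
    return 100.0 * dupes / len(records)


def overall_quality(valid, consistent, complete, duplicates):
    """Unweighted mean of four dimension scores (assumed aggregation).

    Uniqueness is taken as 100 minus the duplicate percentage.
    """
    uniqueness = 100.0 - duplicates
    return (valid + consistent + complete + uniqueness) / 4


# Figures reported in the abstract.
print(round(overall_quality(95.94, 99.59, 71.13, 3.02), 2))  # 90.91
```

Under this reading, completeness (71.13%) is the dimension that pulls the overall score down the most.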
Recommended Citation
Jillian Yasmin, Chua C., (2021). Automation of Data Cleansing Methods for Covid19 Contact Tracing Data in the Philippines. Archīum.ATENEO.
https://archium.ateneo.edu/theses-dissertations/748
