FASSSTER Data Pipeline and DevOps
Document Type
Book Chapter
Publication Date
8-8-2023
Abstract
In data science, the data pipeline serves as a methodological and potentially architectural framework for setting up systems that require near real-time monitoring through dashboards and visualization. The collection, aggregation, and analysis of data related to COVID-19 cases proved to be important in providing the community with the right information at the right time. At the beginning of the pandemic, the data used for interpretation came from different sources. Some datasets were made available to the public by the Department of Health (DOH) through a Google Drive folder that contained the datasets in spreadsheet format (http://bit.ly/DataDropPH). Eventually, DOH gave select groups access to a BigQuery database from which data could be extracted automatically on a daily basis. These datasets are extracted and ingested into a data warehouse for further analysis, where various data analysis and modeling techniques are applied. The data analysis scripts are written in two popular programming languages, R and Python, to facilitate the processing and transformation of data. Stakeholders then view model outputs in a web-based visualization platform. This chapter describes the FASSSTER data pipeline, from extraction, preprocessing, and processing through to the outputs generated by analytics and models and the corresponding data visualization techniques.
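The stages named in the abstract (extraction, preprocessing, ingestion into a warehouse, and processing for visualization) can be illustrated with a minimal sketch. This is not the FASSSTER implementation: the sample rows, the column names (`case_id`, `date_reported`, `region`), the function names, and the use of an in-memory SQLite database as a stand-in for the data warehouse are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for a daily spreadsheet extract.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """case_id,date_reported,region
C1,2020-04-01,NCR
C2,2020-04-01,NCR
C3,2020-04-02,Region IV-A
C4,,NCR
"""


def extract(text):
    """Extraction: parse the raw spreadsheet export into row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    """Preprocessing: drop rows that lack a report date."""
    return [row for row in rows if row["date_reported"]]


def load(conn, rows):
    """Ingestion: store cleaned rows in a warehouse table (SQLite stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "case_id TEXT PRIMARY KEY, date_reported TEXT, region TEXT)"
    )
    # INSERT OR REPLACE keeps a repeated daily load from duplicating cases.
    conn.executemany(
        "INSERT OR REPLACE INTO cases "
        "VALUES (:case_id, :date_reported, :region)",
        rows,
    )
    conn.commit()


def daily_counts(conn):
    """Processing: aggregate cases per report date for a dashboard."""
    cur = conn.execute(
        "SELECT date_reported, COUNT(*) FROM cases "
        "GROUP BY date_reported ORDER BY date_reported"
    )
    return dict(cur.fetchall())


def run_pipeline(text):
    """Run all stages against an in-memory database and return the counts."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, preprocess(extract(text)))
        return daily_counts(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_CSV))
```

Keeping each stage as a separate function mirrors the staged structure the abstract describes, so that a different source (for example, a database query instead of a spreadsheet) could replace `extract` without touching the later stages.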
Recommended Citation
Tamayo, L. P., Pulmano, C., Santos, R. J., Buhain, J-A., Ico, R. (2023). FASSSTER Data Pipeline and DevOps. In: Estuar, M. R. J., De Lara-Tuprio, E. (eds) COVID-19 Experience in the Philippines. Disaster Risk Reduction. Springer, Singapore. https://doi.org/10.1007/978-981-99-3153-8_3.