FASSSTER Data Pipeline and DevOps

Document Type

Book Chapter

Publication Date

8-8-2023

Abstract

In data science, the data pipeline serves as a methodological and potentially architectural framework for setting up systems that require near real-time monitoring through dashboards and visualization. The collection, aggregation, and analysis of data related to COVID-19 cases proved important in providing the community with the right information at the right time. At the beginning of the pandemic, the data used for interpretation came from several different sources. The Department of Health (DOH) made some datasets available to the public in spreadsheet format through a shared Google Drive folder (http://bit.ly/DataDropPH). Eventually, DOH gave select groups access to a BigQuery database from which data could be extracted automatically on a daily basis. These datasets are extracted and ingested into a data warehouse for further analysis, where various data analysis and modeling techniques are applied. The data analysis scripts are written in two popular programming languages, R and Python, to facilitate the processing and transformation of data. Stakeholders then view the model outputs in a web-based visualization platform. This chapter describes the FASSSTER data pipeline, from extraction and preprocessing through processing, to the outputs generated by analytics and models and the corresponding data visualization techniques.
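As a rough illustration of the stages named in the abstract (extraction, preprocessing, and processing into dashboard-ready output), the following is a minimal Python sketch. It is not the FASSSTER implementation: the sample rows, the column names (`case_id`, `date_reported`, `region`), and the function names are all hypothetical, and an in-memory CSV string stands in for a daily extract from the data source.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a daily case-level extract.
# Column names are illustrative only, not the actual DOH schema.
SAMPLE_CSV = """case_id,date_reported,region
1,2020-04-01,NCR
2,2020-04-01,NCR
3,2020-04-02,Region IV-A
4,,NCR
"""


def extract(raw_text):
    """Extraction: parse the raw spreadsheet export into row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def preprocess(rows):
    """Preprocessing: drop rows that lack a report date."""
    return [row for row in rows if row["date_reported"]]


def aggregate(rows):
    """Processing: count cases per (date, region) pair for a dashboard."""
    return Counter((row["date_reported"], row["region"]) for row in rows)


def run_pipeline(raw_text):
    """Chain the three stages in order."""
    return aggregate(preprocess(extract(raw_text)))


daily_counts = run_pipeline(SAMPLE_CSV)
for (date, region), count in sorted(daily_counts.items()):
    print(date, region, count)
```

Keeping each stage as a separate function means a stage can be replaced, for example swapping the CSV parser for a database query, without touching the others.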
