Identifying the determinants of disease outbreaks before they occur is necessary to reduce their impact on populations. The supposed advantage of information obtained through automated systems falls short because of the inability to access real-time data and to interoperate across fragmented systems, which lengthens the transfer and processing of data. This study therefore presents the use of real-time latent data from social media, particularly Twitter, to complement existing disease surveillance efforts. By classifying infodemiological (health-related) tweets, the study produces a range of possible disease incidences of Dengue and Typhoid Fever within the Western Visayas region of the Philippines. For both diseases, the number of tweets showed a strong positive correlation (R > .70) with surveillance data based on official records of the Philippine Health Agency. Regression equations were derived to determine a numerical range of possible disease incidences given a certain number of tweets. As an example, the study shows that 10 infodemiological tweets represent the presence of 19-25 Dengue Fever incidences at the provincial level.
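The correlation and regression steps summarized above can be illustrated with a minimal sketch. It assumes a simple linear model of case counts on tweet counts and reports the range as the fitted value plus or minus one residual standard error; the abstract does not state how the study's range was constructed, so that choice is an assumption. The function names and the sample numbers are hypothetical placeholders, not data or coefficients from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residual_se(xs, ys, slope, intercept):
    """Residual standard error of the fitted line (n - 2 degrees of freedom)."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (len(xs) - 2))


def incidence_range(tweets, slope, intercept, spread):
    """Assumed range: fitted value plus or minus `spread` (not the study's method)."""
    center = intercept + slope * tweets
    return center - spread, center + spread


# Hypothetical placeholder counts, NOT taken from the study.
tweet_counts = [2, 5, 8, 10, 14]
case_counts = [6, 11, 18, 21, 30]

r = pearson_r(tweet_counts, case_counts)
slope, intercept = fit_line(tweet_counts, case_counts)
spread = residual_se(tweet_counts, case_counts, slope, intercept)
low, high = incidence_range(10, slope, intercept, spread)
print(f"r = {r:.2f}; estimated cases for 10 tweets: {low:.1f} to {high:.1f}")
```

With real surveillance records in place of the placeholder lists, the same steps would yield a correlation coefficient and a per-tweet-count range comparable in form to the figures reported above.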