Applying XGBoost, neural networks, and oversampling in the undernutrition classification of school-aged children in the Philippines

Mark Kevin A. Ong Yiu, Ateneo de Manila University
Carlo Gabriel M. Pastor, Ateneo de Manila University
Gabrielle Jackie C. Candano, Ateneo de Manila University
Eden Delight P. Miro, Ateneo de Manila University
Victor Andrew A. Antonio, Ateneo de Manila University
Clark Kendrick C. Go, Ateneo de Manila University

Abstract

In the Philippines, one in five school-aged children is affected by undernutrition, increasing their risk of impaired physical and cognitive development. The Department of Education (DepEd) attempts to address this issue by targeting children with low body mass index (BMI) for its school-based feeding program (SBFP). However, challenges like inadequate measuring tools and supervision in low-resource communities have led to large discrepancies in the nutritional status of SBFP beneficiaries and non-beneficiaries. Siy Van et al. [1] address the difficulties associated with BMI by using machine learning (ML) to predict undernutrition among school-aged children based on socioeconomic and demographic characteristics, dietary diversity scores, and food insecurity scores. Their study compared several ML algorithms and found that their best performing model in terms of accuracy was a random forest (RF) model. However, the RF model had high sensitivity but low specificity, indicating a bias towards the positive class. This study aims to improve on these results by employing oversampling techniques and ML algorithms that were not used in that study. Using the same data set as [1], this study compares four machine learning algorithms (RF, XGBoost, DNN, and NNRF) in predicting undernutrition among school-aged children, managing imbalanced data with three oversampling techniques (SMOTE, Borderline-SMOTE, and ADASYN). Eight independent classification tasks for predicting undernutrition were performed, and results showed that an RF model trained with Borderline-SMOTE performed best in terms of Cohen's κ (0.3662), with an accuracy of 71.61%, a sensitivity of 71.13%, and a specificity of 73.08%. While RF performed best overall, XGBoost and NNRF outperformed RF on specific tasks. Notably, incorporating oversampling consistently improved model performance.
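The pipeline the abstract describes can be sketched as follows: oversample the minority (undernourished) class in the training split only, fit a random forest, and evaluate with accuracy, sensitivity, specificity, and Cohen's κ. This is a minimal illustration, not the study's implementation: the toy features and a hand-rolled SMOTE-style interpolation stand in for the actual survey data and the imbalanced-learn oversamplers (SMOTE, Borderline-SMOTE, ADASYN) the study would use.

```python
# Hedged sketch of the abstract's approach: SMOTE-style oversampling of the
# minority class, then a random forest evaluated with kappa/sensitivity/
# specificity. Synthetic toy data, NOT the study's data set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score, confusion_matrix
from sklearn.model_selection import train_test_split

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by interpolating between
    each chosen sample and one of its k nearest minority-class neighbours
    (the core idea of SMOTE, simplified for illustration)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        dists = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dists)[1:k + 1]  # skip the point itself
        j = rng.choice(neighbours)
        lam = rng.random()                        # interpolation weight
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

rng = np.random.default_rng(42)
# Imbalanced toy data: 80% class 0 (adequately nourished), 20% class 1.
X = np.vstack([rng.normal(0.0, 1.0, size=(400, 6)),
               rng.normal(1.2, 1.0, size=(100, 6))])
y = np.array([0] * 400 + [1] * 100)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority class on the TRAINING split only, to avoid
# leaking synthetic points into the evaluation.
X_min = X_tr[y_tr == 1]
n_new = int((y_tr == 0).sum() - (y_tr == 1).sum())
X_bal = np.vstack([X_tr, smote_oversample(X_min, n_new, rng=rng)])
y_bal = np.concatenate([y_tr, np.ones(n_new, dtype=int)])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_bal, y_bal)
pred = clf.predict(X_te)
tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
print(f"kappa={cohen_kappa_score(y_te, pred):.3f}  "
      f"sensitivity={tp / (tp + fn):.3f}  specificity={tn / (tn + fp):.3f}")
```

In practice, replacing `smote_oversample` with `imblearn.over_sampling.SMOTE`, `BorderlineSMOTE`, or `ADASYN` yields the three oversampling variants the study compares, while keeping the rest of the pipeline unchanged.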