Accurate and Efficient Mosquito Genus Classification Algorithm Using Candidate-Elimination and Nearest Centroid on Extracted Features of Wingbeat Acoustic Properties


Automatic identification of mosquito genus, when combined with effective suppression and control strategies, may help reduce the spread of mosquito-borne diseases. In this study, we developed a simple yet effective algorithm that processes audio files to determine whether a mosquito is present and, if so, to identify its genus. The dataset combines sound recordings from the Humbug Project on Zooniverse, collected by researchers from Oxford University, with field recordings of mosquitoes in the Philippines. The technique extracts filter bank values from the spectrogram of each audio file and builds a classification model from only three summary statistics of those values: the maximum, the first quartile, and the third quartile. The maximum values define the thresholds used in the candidate-elimination phase of the algorithm, and the first and third quartile values are used in the nearest centroid computation that follows. The proposed algorithm achieved an average classification accuracy of 97.2% under 5-fold stratified cross-validation. This is competitive with the 75.55–97.65% accuracies reported in the literature for different mosquito classification tasks run on different datasets. It is also significantly higher than the 86.6% we obtained by applying a CNN architecture from the literature to the same dataset. Besides being more accurate, the proposed algorithm is significantly more efficient than the CNN model, requiring much less time (in both the training and prediction phases) and less memory. The results point to a promising technique that may also simplify other sound-based classification problems.
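The two-phase idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the thresholds are derived from the maximum values, so the sketch assumes each class keeps the range of maxima seen in training, and it falls back to all classes when no class passes elimination. The function names (`summarize`, `fit`, `predict`) and the class labels are hypothetical, and the filter bank matrix is assumed to have been computed beforehand.

```python
import numpy as np

def summarize(filterbank):
    """Reduce a filter bank matrix to (maximum, first quartile, third quartile)."""
    values = np.asarray(filterbank, dtype=float).ravel()
    q1, q3 = np.percentile(values, [25, 75])
    return float(values.max()), float(q1), float(q3)

def fit(features, labels):
    """Learn, per class, a range of maxima and a (q1, q3) centroid.

    ASSUMPTION: the elimination threshold is the [min, max] range of the
    training maxima for each class; the source does not specify this rule.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    model = {}
    for cls in np.unique(labels):
        rows = features[labels == cls]
        model[str(cls)] = {
            "max_range": (rows[:, 0].min(), rows[:, 0].max()),
            "centroid": rows[:, 1:].mean(axis=0),
        }
    return model

def predict(model, feature):
    """Eliminate classes by the maximum, then pick the nearest (q1, q3) centroid."""
    mx, q1, q3 = feature
    # Phase 1: candidate elimination on the maximum value.
    candidates = [
        cls for cls, m in model.items()
        if m["max_range"][0] <= mx <= m["max_range"][1]
    ]
    if not candidates:
        # ASSUMPTION: if every class is eliminated, consider all of them.
        candidates = list(model)
    # Phase 2: nearest centroid in the (q1, q3) plane, Euclidean distance.
    point = np.array([q1, q3])
    return min(candidates, key=lambda c: np.linalg.norm(point - model[c]["centroid"]))

# Toy, made-up feature rows of (max, q1, q3); labels are placeholders.
train_x = [(10, 1.0, 3.0), (12, 1.2, 3.2),
           (20, 5.0, 8.0), (22, 5.5, 8.5),
           (2, 0.1, 0.3), (3, 0.2, 0.4)]
train_y = ["genus_a", "genus_a", "genus_b", "genus_b",
           "no_mosquito", "no_mosquito"]
model = fit(train_x, train_y)
print(predict(model, (11, 1.1, 3.1)))
```

Because each recording is reduced to three numbers and each class to a range plus a two-dimensional centroid, training and prediction in this sketch need only a handful of arithmetic operations per sample, which is consistent with the efficiency argument made in the abstract.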