bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts
Document Type
Conference Proceeding
Publication Date
1-1-2025
Abstract
Baybayin, a pre-colonial writing system from the Philippines, has seen a resurgence in recent years. Existing studies on Baybayin OCR struggle with ambiguous Baybayin words that have multiple possible transliterations. This study introduces a disambiguation technique that uses word embeddings (WE) for contextual analysis, with part-of-speech (POS) tagging as an initial filtering step. This approach is compared with an LLM method that prompts GPT-4o mini to choose the most appropriate transliteration given a sentence input. The proposed disambiguation process is integrated into existing Baybayin OCR systems to develop bAI-bAI, a context-aware Baybayin transliteration system capable of handling ambiguous words. Results show that incorporating POS as a filter does not significantly affect performance. The WE-Only method yields an accuracy of 77.46% and takes 5.35 ms to process one sample, while the GPT-4o mini method reaches a higher accuracy of 90.52% but with a much longer runtime of 3280 ms per sample (roughly 600 times slower). These findings present an opportunity to further explore and improve NLP approaches to disambiguation.
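The abstract does not spell out how the embedding-based disambiguation is computed, so the following is only a minimal sketch of one plausible reading: score each candidate transliteration by cosine similarity against the mean embedding of the surrounding words, after an optional POS filter. The function names, the mean-pooling choice, the toy three-dimensional vectors, and the example pair mesa/misa (Baybayin writes e and i with the same mark) are all assumptions made for illustration; none of them are taken from the paper.

```python
import math

# Hypothetical toy embeddings (not from the paper). A real system would
# load pretrained Filipino word vectors instead.
TOY_VECTORS = {
    "kumain":   [0.90, 0.10, 0.00],  # "ate"
    "pagkain":  [0.80, 0.20, 0.10],  # "food"
    "pari":     [0.00, 0.90, 0.20],  # "priest"
    "simbahan": [0.10, 0.80, 0.10],  # "church"
    "mesa":     [0.85, 0.10, 0.10],  # "table"
    "misa":     [0.05, 0.90, 0.10],  # "mass"
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(candidates, context, vectors, pos_tags=None, expected_pos=None):
    """Pick the candidate transliteration that best fits the context words.

    pos_tags and expected_pos are optional; when given, candidates whose tag
    differs from expected_pos are dropped first (the assumed POS filter).
    If the filter would remove every candidate, it is ignored.
    """
    pool = list(candidates)
    if pos_tags is not None and expected_pos is not None:
        filtered = [c for c in pool if pos_tags.get(c) == expected_pos]
        if filtered:
            pool = filtered

    # Mean-pool the embeddings of the context words that have a vector.
    known = [vectors[w] for w in context if w in vectors]
    if not known:
        return pool[0]  # no usable context: fall back to the first candidate
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]

    def score(candidate):
        vec = vectors.get(candidate)
        return cosine(vec, mean) if vec is not None else -1.0

    return max(pool, key=score)


if __name__ == "__main__":
    print(disambiguate(["mesa", "misa"], ["kumain", "pagkain"], TOY_VECTORS))
    print(disambiguate(["mesa", "misa"], ["pari", "simbahan"], TOY_VECTORS))
```

With these toy vectors, a food-related context selects mesa and a church-related context selects misa. When both candidates share the same part of speech, as a noun pair like this would, the filter leaves the candidate pool unchanged, which is one way a POS step could end up having little effect.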
Recommended Citation
Jacob Simon D. Bernardo and Maria Regina Justina E. Estuar. 2025. bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts. In Proceedings of the Second Workshop in South East Asian Language Processing, pages 1–9, Online. Association for Computational Linguistics.