"bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts" by Jacob Simon Bernardo and Ma. Regina Justina Estuar
 

bAI-bAI: A Context-Aware Transliteration System for Baybayin Scripts

Document Type

Conference Proceeding

Publication Date

1-1-2025

Abstract

Baybayin, a pre-colonial writing system from the Philippines, has seen a resurgence in recent years. Existing studies on Baybayin OCR face challenges with ambiguous Baybayin words that have multiple possible transliterations. This study introduces a disambiguation technique that employs word embeddings (WE) for contextual analysis and uses part-of-speech (POS) tagging as an initial filtering step. This approach is compared with an LLM method that prompts GPT-4o mini to determine the most appropriate transliteration given a sentence input. The proposed disambiguation process is integrated into existing Baybayin OCR systems to develop bAI-bAI, a context-aware Baybayin transliteration system capable of handling ambiguous words. Results show that incorporating POS as a filter does not significantly affect performance. The WE-Only method yields an accuracy of 77.46% and takes 5.35 ms to process one sample, while the GPT-4o mini method reaches a higher accuracy of 90.52% at a much longer runtime of 3280 ms per sample. These findings present an opportunity to further explore and improve NLP approaches to disambiguation.
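
The general idea of embedding-based disambiguation with an optional POS filter can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-dimensional toy vectors, the POS tags, and the choice of cosine similarity against the mean context vector are all assumptions made for illustration. The example uses the pair "mesa" and "misa", which are written identically in Baybayin because the script does not distinguish e from i.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def disambiguate(candidates, context, embeddings, pos_tags=None, expected_pos=None):
    """Pick the candidate transliteration closest to the sentence context.

    Hypothetical helper: the optional POS filter runs first, then the
    remaining candidates are ranked by cosine similarity to the mean
    embedding of the context words.
    """
    pool = list(candidates)
    if pos_tags is not None and expected_pos is not None:
        filtered = [c for c in pool if pos_tags.get(c) == expected_pos]
        if filtered:  # keep the full pool if the filter removes everything
            pool = filtered
    context_vecs = [embeddings[w] for w in context if w in embeddings]
    scorable = [c for c in pool if c in embeddings]
    if not context_vecs or not scorable:
        return pool[0]  # no usable context: fall back to the first candidate
    ctx = mean_vector(context_vecs)
    return max(scorable, key=lambda c: cosine(embeddings[c], ctx))

# Toy vectors invented for this sketch; a real system would load trained embeddings.
TOY_EMBEDDINGS = {
    "misa": [0.9, 0.1, 0.0],      # mass (religious service)
    "mesa": [0.1, 0.9, 0.0],      # table
    "simbahan": [0.8, 0.2, 0.1],  # church
    "pari": [0.9, 0.0, 0.2],      # priest
    "kusina": [0.1, 0.8, 0.2],    # kitchen
    "upuan": [0.2, 0.9, 0.1],     # chair
}
TOY_POS = {"misa": "NOUN", "mesa": "NOUN"}  # assumed tags

church = disambiguate(["mesa", "misa"], ["pari", "simbahan"], TOY_EMBEDDINGS)
kitchen = disambiguate(["mesa", "misa"], ["kusina", "upuan"], TOY_EMBEDDINGS)
print(church, kitchen)
```

In this toy setup both candidates share the same POS tag, so the filter cannot separate them and the ranking is decided by the embeddings alone.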
