Document Type

Conference Proceeding

Publication Date

1-1-2024

Abstract

Businesses deal with different types of documents containing unstructured data. The data in these documents must be converted into digital forms that other automated systems can process. One generic use case is document classification, which usually involves manual transformation because the process requires human understanding. These documents go beyond those generated through regular business transactions and operations and also include web-based content such as online news, blogs, e-mails, and various digital libraries. Recent developments in robotic process automation (RPA) and artificial intelligence (AI) aim to automate the otherwise expensive, time-consuming, and repetitive manual steps. Through more powerful natural language processing (NLP) and natural language understanding (NLU) capabilities, large language models (LLMs) may provide a significant boost in applying AI to RPA initiatives. This study proposes a general approach to using LLMs as document classifier co-pilots for knowledge workers in charge of classifying documents. Prompt engineering and refinement on labeled health insurance documents, aimed at achieving better results, are discussed and evaluated through early, iterative classification attempts. However, early tests with a complex sample use case show unsatisfactory results. The study ends with recommendations for future work to improve precision and recall performance.
