Improving Incremental Packrat Parsing

Date of Award


Document Type


Degree Name

Master of Science in Chemistry


Information Systems & Computer Science

First Advisor

Proceso L. Fernandez, Jr., PhD


Packrat parsing, introduced by Ford in 2002 [12, 13] is a na ̈ıve, backtracking, recursive-descent parsing technique that guarantees linear parse times by using memo- ization. Most of the time, a packrat parser is specified using parsing expression gram- mars (PEG) [14], a new formalism to describe programming language syntax, similar to the extended Backus-Naur form [6]. In 2017, incremental packrat parsing was in- troduced by Dubroy and Warth [9]. It takes advantage of the memoization table of the previous parse, and reuses it in the current parse, thereby reducing parse times. But such parser still suffers from some slowdowns brought about by sufficiently long inputs. Ad- ditionally, parse times are not reflective of the size of the change in input. Moreover, incremental packrat parsing incurs an additional 12% heap memory usage as compared to its non-incremental variant. This study aimed to improve incremental packrat parsing by further reducing its parse time and heap memory usage. This was done through the use of techniques from experimental algorithmics namely, algorithm tuning, and code tuning, as well as three existing optimizations to packrat parsing namely, grammar simplification, inlining, and transient optimization. Parsing performance of the improved incremental packrat parser was then compared to the performance of the existing one. The improved parser was significantly faster (t(890) = 49.62, p = 8.33×10−259), with about 68% lower average parse time. It also used significantly lower (t(25) = 4.79, p = 3.60×10−5 ) heap memory, with about 26% reduction in size on the average. These results prove that after incorporating the aforementioned techniques and optimizations, the improved incremental packrat parser fared better than the existing parser, in terms of parsing performance.

This document is currently not available here.