Insights on the Hierarchy of Letters in Scrabble Using Cosine Similarity, Minimum Spanning Tree, and Centrality Analysis
Document Type
Conference Proceeding
Publication Date
3-7-2024
Abstract
This study aims to generate insights on the hierarchy and importance of letters in the game Scrabble by employing two operational research frameworks. Both frameworks begin with a vector space model in which each letter is treated as a vector and the basis vectors are the valid Scrabble words. A network of the letters is then constructed in which the edge weight between each pair of letters is the cosine similarity of the corresponding vectors, which is effectively a measure of the co-occurrence rate of the two letters. The first framework obtains the minimum spanning tree (MST) of the network and performs centrality analysis on the MST. This yields a hierarchy of the letters that shows how letters lower in the hierarchy depend on higher-level letters. The second framework instead performs centrality analysis on the original network of letters and yields a ranking of letters based on their co-occurrence rate with other letters. Under the frameworks in the study, the letter E emerges as the highest-ranked letter, while the letter Q consistently ranks at the bottom. The study thus demonstrates how the two frameworks can be applied in a novel setting and in other possible applications of a similar nature.
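The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the eight-word list WORDS stands in for the full set of valid Scrabble words, each letter vector is binary (1 if the letter occurs in the word), the MST is computed on distances defined as 1 minus the cosine similarity, and the centrality measures shown (degree on the MST, summed edge weight on the full network) are just two simple choices. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

# Toy stand-in for the full list of valid Scrabble words (assumption).
WORDS = ["tree", "tea", "eat", "quiet", "rate", "seat", "quit", "era"]


def letter_vectors(words):
    """Map each letter to a binary vector with one coordinate per word."""
    letters = sorted(set("".join(words)))
    return {c: [1 if c in w else 0 for w in words] for c in letters}


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_network(vectors):
    """Complete network: one weighted edge per unordered pair of letters."""
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


def minimum_spanning_tree(nodes, sims):
    """Kruskal's algorithm on distances 1 - similarity (assumed transform)."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the component root, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for (a, b), _ in sorted(sims.items(), key=lambda kv: 1.0 - kv[1]):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two components, so keep it
            parent[root_a] = root_b
            tree.append((a, b))
    return tree


def degree_centrality(nodes, edges):
    """Number of tree edges incident to each letter."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def strength_centrality(nodes, sims):
    """Sum of cosine similarities between a letter and all other letters."""
    strength = {n: 0.0 for n in nodes}
    for (a, b), weight in sims.items():
        strength[a] += weight
        strength[b] += weight
    return strength


if __name__ == "__main__":
    vectors = letter_vectors(WORDS)
    nodes = sorted(vectors)
    sims = similarity_network(vectors)
    tree = minimum_spanning_tree(nodes, sims)
    # First framework: centrality on the MST.
    print(sorted(degree_centrality(nodes, tree).items(), key=lambda kv: -kv[1]))
    # Second framework: centrality on the original network.
    print(sorted(strength_centrality(nodes, sims).items(), key=lambda kv: -kv[1]))
```

With a word list this small, the resulting rankings are only illustrative and should not be compared with the results reported in the paper.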
Recommended Citation
Mark Anthony C. Tolentino, Vince Andrew L. Lee, Axirazel D. Lorenzo, Tristan Emmanuel A. Ramos; Insights on the hierarchy of letters in Scrabble using cosine similarity, minimum spanning tree, and centrality analysis. AIP Conf. Proc. 7 March 2024; 2895 (1): 070005. https://doi.org/10.1063/5.0192067