Insights on the Hierarchy of Letters in Scrabble Using Cosine Similarity, Minimum Spanning Tree, and Centrality Analysis

Document Type

Conference Proceeding

Publication Date



This study aims to generate insights on the hierarchy and importance of letters in the game Scrabble by employing two operational research frameworks. Both frameworks begin with a vector space model whose basis vectors are all the valid Scrabble words and in which each letter is treated as a vector. A network of the letters is then constructed in which the edge weight between each pair of letters is the cosine similarity of the corresponding vectors, which is effectively a measure of the co-occurrence rate of the two letters. The first framework continues by obtaining the minimum spanning tree (MST) of the network and performing centrality analysis on the MST. This yields a hierarchy of the letters that shows how letters lower in the hierarchy depend on higher-level letters. The second framework, by contrast, performs centrality analysis on the original network of letters and results in a ranking of letters based on their co-occurrence rate with other letters. Under both frameworks, the letter E emerges as the highest-ranked letter, while the letter Q consistently ranks at the bottom. The study thus demonstrates how the two frameworks can be applied to a novel problem and suggests their use in other applications of a similar nature.
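The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the eight-word list is a hypothetical stand-in for the full set of valid Scrabble words, each letter vector is binary (1 if the letter occurs in a word), similarity is converted to a distance as 1 - similarity before building the MST, and degree (on the MST) and strength (on the full network) stand in for whichever centrality measures the study actually uses.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy word list; the study uses all valid Scrabble words.
WORDS = ["quiet", "tree", "rate", "ear", "net", "ten", "queen", "art"]


def letter_sets(words):
    """Map each letter to the set of word indices that contain it.

    This is a sparse form of a binary letter vector over the word basis
    (binary presence is an assumption; counts are another option).
    """
    sets = {}
    for index, word in enumerate(words):
        for letter in set(word):
            sets.setdefault(letter, set()).add(index)
    return sets


def cosine(a, b):
    """Cosine similarity of two binary vectors stored as index sets."""
    return len(a & b) / sqrt(len(a) * len(b))


def similarity_network(sets):
    """Complete letter network: edge weight = cosine similarity."""
    return {
        frozenset((x, y)): cosine(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }


def minimum_spanning_tree(letters, sim):
    """Prim's algorithm on distance = 1 - similarity (assumed transform)."""
    letters = sorted(letters)
    in_tree = [letters[0]]
    edges = []
    while len(in_tree) < len(letters):
        candidates = (
            (u, v) for u in in_tree for v in letters if v not in in_tree
        )
        u, v = min(candidates, key=lambda e: 1 - sim[frozenset(e)])
        edges.append((u, v))
        in_tree.append(v)
    return edges


def degree_centrality(letters, edges):
    """Number of MST edges touching each letter (first framework)."""
    degree = {letter: 0 for letter in letters}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def strength_centrality(letters, sim):
    """Sum of similarities to all other letters (second framework)."""
    return {
        x: sum(sim[frozenset((x, y))] for y in letters if y != x)
        for x in letters
    }


sets = letter_sets(WORDS)
letters = sorted(sets)
sim = similarity_network(sets)
mst = minimum_spanning_tree(letters, sim)
mst_degree = degree_centrality(letters, mst)
strength = strength_centrality(letters, sim)

print("MST edges:", mst)
print("Ranking by strength:", sorted(letters, key=strength.get, reverse=True))
```

With so few words the resulting ranking says nothing about real Scrabble play; the sketch only shows how the two frameworks diverge after the shared similarity network is built. For a full dictionary, a graph library would be a more practical choice than the quadratic Prim loop used here.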