Developing a Framework With Small File Adaptations for the Analysis of Music Data on Hadoop
Date of Award
5-2022
Document Type
Thesis
Degree Name
Master of Science in Computer Science
First Advisor
Andrei D. Coronel, PhD
Abstract
The availability of large-scale music datasets creates a need for systems capable of processing large amounts of data. Horizontally scalable systems, such as the Hadoop ecosystem, can address that need. One area that can benefit is Music Information Retrieval (MIR), and one application of MIR is the analysis of music similarity, which is often used in recommendation systems. Many music recommendation systems use collaborative filtering to predict users' musical preferences based on their current interests and those of other users. This study instead explores a content-based solution that can run on Hadoop: rather than relying on user information, it builds a framework that directly analyzes the contents of music files and retrieves information from the actual music data. The framework takes MIDI files as input. This choice of format raises a problem, because Hadoop is known to perform poorly when handling large numbers of small files, and MIDI files are typically small. This study addresses the issue by adapting the framework for small files. Results show a significant improvement in query response time, with speed-ups of 210.79x, 236.93x, 239.81x, and 266.70x for the scan-and-filter, aggregate, join, and aggregate-join queries, respectively.
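To put the reported speed-up factors in perspective, the short sketch below converts each factor into the share of baseline response time it removes. It assumes the usual definition of speed-up (baseline response time divided by adapted response time), which the abstract does not state explicitly. The function names are illustrative and do not come from the thesis.

```python
# Speed-up factors as reported in the abstract.
# Assumption: speed-up = baseline response time / adapted response time.
SPEEDUPS = {
    "scan and filter": 210.79,
    "aggregate": 236.93,
    "join": 239.81,
    "aggregate-join": 266.70,
}


def time_reduction_percent(speedup: float) -> float:
    """Percentage of the baseline response time removed by a given speed-up."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return (1.0 - 1.0 / speedup) * 100.0


def adapted_time(baseline_seconds: float, speedup: float) -> float:
    """Response time after the adaptation, given a baseline time and a speed-up."""
    if speedup <= 0:
        raise ValueError("speed-up must be positive")
    return baseline_seconds / speedup


if __name__ == "__main__":
    for query, factor in SPEEDUPS.items():
        reduction = time_reduction_percent(factor)
        print(f"{query}: {factor:.2f}x speed-up, {reduction:.2f}% less response time")
```

Under that assumption, every reported factor corresponds to a reduction of more than 99% in query response time. `adapted_time` can be used with any measured baseline, since the abstract reports no absolute timings.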
Recommended Citation
Alberto, Medalla H. (2022). Developing a Framework With Small File Adaptations for the Analysis of Music Data on Hadoop. Archīum.ATENEO.
https://archium.ateneo.edu/theses-dissertations/751
