Measuring the Performance of An Object-Based Multi-Cloud Data Lake
Document Type
Conference Proceeding
Publication Date
1-1-2023
Abstract
As the data generated by society grows larger and less structured, more organizations are implementing data lakes in the public cloud to store, process, and analyze it. However, concerns over the availability of this data, as well as the potential for vendor lock-in, are leading more users to adopt a multi-cloud approach. This study investigates the viability of this approach in data lake use cases. Results show that a multi-cloud data lake can potentially be implemented with less than 1% performance impact on query run times, at the cost of a 300% increase in one-time loading. This opens the door for future work on more algorithms and implementations that leverage multi-cloud deployments to enhance availability, scalability, and cost optimization.
Recommended Citation
Saavedra, M.Z.N.L., Yu, W.E.S. (2023). Measuring the Performance of An Object-Based Multi-cloud Data Lake. In: Yang, XS., Sherratt, R.S., Dey, N., Joshi, A. (eds) Proceedings of Eighth International Congress on Information and Communication Technology. ICICT 2023. Lecture Notes in Networks and Systems, vol 693. Springer, Singapore. https://doi.org/10.1007/978-981-99-3243-6_4