Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing
Abstract
Serverless computing offers elasticity unmatched by conventional server-based cloud infrastructure. Although modern data processing systems embrace serverless storage, such as Amazon S3, they continue to manage their compute resources as servers. This is challenging for unpredictable workloads, leaving clusters often underutilized. Recent research shows the potential of serverless compute resources, such as cloud functions, for elastic data processing, but also sees limitations in performance robustness and cost efficiency for long running workloads. These challenges require holistic approaches across the system stack. However, to the best of our knowledge, there is no end-to-end data processing system built entirely on serverless infrastructure. In this paper, we present Skyrise, our effort towards building the first fully serverless SQL query processor. Skyrise exploits the elasticity of its underlying infrastructure, while alleviating the inherent limitations with a number of adaptive and cost-aware techniques. We show that both Skyrise's performance and cost are competitive to other cloud data systems for terabyte-scale queries of the analytical TPC-H benchmark.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.08479
- Bibcode:
- 2025arXiv250108479B
- Keywords:
-
- Computer Science - Databases