No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy
Abstract
With the rise of machine learning, inference on deep neural networks (DNNs) has become a core building block on the critical path for many cloud applications. Applications today rely on isolated ad-hoc deployments that force users to compromise on consistent latency, elasticity, or cost-efficiency, depending on workload characteristics. We propose to elevate DNN inference to be a first class cloud primitive provided by a shared multi-tenant system, akin to cloud storage, and cloud databases. A shared system enables cost-efficient operation with consistent performance across the full spectrum of workloads. We argue that DNN inference is an ideal candidate for a multi-tenant system because of its narrow and well-defined interface and predictable resource requirements.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2019
- DOI:
- 10.48550/arXiv.1901.06887
- arXiv:
- arXiv:1901.06887
- Bibcode:
- 2019arXiv190106887S
- Keywords:
-
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing
- E-Print:
- 5 pages