Scaling-laws for Large Time-series Models
Abstract
Scaling laws for large language models (LLMs) have provided useful guidance on how to train ever-larger models for predictable performance gains. Time series forecasting shares a sequential structure similar to that of language, and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to that of LLMs, while architectural details (aspect ratio and number of heads) have minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish, for the first time, power-law scaling relations with respect to parameter count, dataset size, and training compute, spanning five orders of magnitude.
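As an illustration of what a power-law scaling relation means in practice, the sketch below fits a relation of the form loss ≈ a · N^(−α) by linear regression in log-log space. It is a minimal example under stated assumptions, not the authors' procedure: the function name `fit_power_law`, the constants `true_a` and `true_alpha`, and the data points are all hypothetical and synthetic, chosen only so that the parameter counts span five orders of magnitude.

```python
import numpy as np

def fit_power_law(x, loss):
    """Fit loss ~ a * x**(-alpha) by least squares in log-log space.

    Taking logs turns the power law into a straight line:
        log(loss) = log(a) - alpha * log(x)
    so a degree-1 polynomial fit recovers both constants.
    """
    slope, intercept = np.polyfit(np.log(x), np.log(loss), deg=1)
    return np.exp(intercept), -slope

# Synthetic data only (not results from the paper): six hypothetical
# parameter counts from 1e3 to 1e8, i.e. five orders of magnitude.
params = np.logspace(3, 8, 6)
true_a, true_alpha = 5.0, 0.1  # made-up constants for the illustration
loss = true_a * params ** (-true_alpha)

a, alpha = fit_power_law(params, loss)
print(f"a = {a:.3f}, alpha = {alpha:.3f}")
```

The same fit applies unchanged if `params` is swapped for dataset size or training compute; only the interpretation of the exponent differs.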
- Publication:
- arXiv e-prints
- Pub Date:
- May 2024
- DOI:
- 10.48550/arXiv.2405.13867
- arXiv:
- arXiv:2405.13867
- Bibcode:
- 2024arXiv240513867E
- Keywords:
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence
- E-Print:
- 8 pages, 3 figures