Optimal Transcoding Resolution Prediction for Efficient Per-Title Bitrate Ladder Estimation
Abstract
Adaptive video streaming requires efficient bitrate ladder construction to meet heterogeneous network conditions and end-user demands. Per-title optimized encoding typically traverses numerous encoding parameters to search the Pareto-optimal operating points for each video. Recently, researchers have attempted to predict the content-optimized bitrate ladder for pre-encoding overhead reduction. However, existing methods commonly estimate the encoding parameters on the Pareto front and still require subsequent pre-encodings. In this paper, we propose to directly predict the optimal transcoding resolution at each preset bitrate for efficient bitrate ladder construction. We adopt a Temporal Attentive Gated Recurrent Network to capture spatial-temporal features and predict transcoding resolutions as a multi-task classification problem. We demonstrate that content-optimized bitrate ladders can thus be efficiently determined without any pre-encoding. Our method well approximates the ground-truth bitrate-resolution pairs with a slight Bjøntegaard Delta rate loss of 1.21% and significantly outperforms the state-of-the-art fixed ladder.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2024
- DOI:
- 10.48550/arXiv.2401.04405
- arXiv:
- arXiv:2401.04405
- Bibcode:
- 2024arXiv240104405Y
- Keywords:
-
- Computer Science - Multimedia;
- Computer Science - Artificial Intelligence;
- Computer Science - Computer Vision and Pattern Recognition;
- Electrical Engineering and Systems Science - Image and Video Processing
- E-Print:
- Accepted by the 2024 Data Compression Conference (DCC) for presentation as a poster. This is the full paper