Dynamic Adaptation in Data Storage: Real-Time Machine Learning for Enhanced Prefetching
Abstract
The exponential growth of data storage demands has necessitated the evolution of hierarchical storage management strategies [1]. This study explores the application of streaming machine learning [3] to revolutionize data prefetching within multi-tiered storage systems. Unlike traditional batch-trained models, streaming machine learning [5] offers adaptability, real-time insights, and computational efficiency, responding dynamically to workload variations. This work designs and validates an innovative framework that integrates streaming classification models for predicting file access patterns, specifically the next file offset. Leveraging comprehensive feature engineering and real-time evaluation over extensive production traces, the proposed methodology achieves substantial improvements in prediction accuracy, memory efficiency, and system adaptability. The results underscore the potential of streaming models in real-time storage management, setting a precedent for advanced caching and tiering strategies.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2501.14771
- Bibcode:
- 2025arXiv250114771C
- Keywords:
-
- Computer Science - Distributed;
- Parallel;
- and Cluster Computing;
- Computer Science - Machine Learning;
- Computer Science - Operating Systems
- E-Print:
- I uploaded the paper without obtaining consent from all the authors. One of the authors now refuses to publish this paper, as it has been demonstrated to be unreliable, contains significant flaws in prior research, and is missing proper citations in Sections 2 and 3