Data-driven inventory management for new products: A warm-start and adjusted Dyna-$Q$ approach
Abstract
In this paper, we propose a novel reinforcement learning algorithm for inventory management of newly launched products with no or limited historical demand information. The algorithm follows the classic Dyna-$Q$ structure, balancing the model-based and model-free approaches, while accelerating the training process of Dyna-$Q$ and mitigating the model discrepancy generated by the model-based feedback. Warm-start information from the demand data of existing similar products can be incorporated into the algorithm to further stabilize the early-stage training and reduce the variance of the estimated optimal policy. Our approach is validated through a case study of bakery inventory management with real data. The adjusted Dyna-$Q$ shows up to a 23.7\% reduction in average daily cost compared with $Q$-learning, and up to a 77.5\% reduction in training time within the same horizon compared with classic Dyna-$Q$. By incorporating the warm-start information, it can be found that the adjusted Dyna-$Q$ has the lowest total cost, lowest variance in total cost, and relatively low shortage percentages among all the algorithms under a 30-day testing.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2025
- arXiv:
- arXiv:2501.08109
- Bibcode:
- 2025arXiv250108109Q
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Computer Science - Computational Engineering, Finance, and Science
- E-Print:
- 7 pages, 2 figures