Data-Pooling in Stochastic Optimization
Abstract
Managing large-scale systems often involves simultaneously solving thousands of unrelated stochastic optimization problems, each with limited data. Intuition suggests one can decouple these unrelated problems and solve them separately without loss of generality. We propose a novel data-pooling algorithm called Shrunken-SAA that disproves this intuition. In particular, we prove that combining data across problems can outperform decoupling, even when there is no a priori structure linking the problems and data are drawn independently. Our approach does not require strong distributional assumptions and applies to constrained, possibly non-convex, non-smooth optimization problems such as vehicle-routing, economic lot-sizing or facility location. We compare and contrast our results to a similar phenomenon in statistics (Stein's Phenomenon), highlighting unique features that arise in the optimization setting that are not present in estimation. We further prove that as the number of problems grows large, Shrunken-SAA learns whether pooling can improve upon decoupling and the optimal amount to pool, even if the average amount of data per problem is fixed and bounded. Importantly, we identify a simple intuition based on stability that explains when and why data-pooling offers a benefit, elucidating this perhaps surprising phenomenon. This intuition further suggests that data-pooling offers the most benefit when there are many problems, each of which has a small amount of relevant data. Finally, we demonstrate the practical benefits of data-pooling using real data from a chain of retail drug stores in the context of inventory management.
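To make the idea of pooling concrete, the sketch below applies a shrinkage-style version of SAA to many independent newsvendor problems that share a finite demand support. It is a minimal illustration under stated assumptions, not the authors' implementation: each problem's empirical distribution is shrunk toward a pooled anchor distribution with weight `alpha / (N_k + alpha)`, and the anchor is assumed here to be the average of the per-problem empirical distributions. The function name `shrunken_saa_newsvendor`, the shared support `{0, ..., D-1}` and the newsvendor cost model are hypothetical choices for illustration. The abstract states that Shrunken-SAA also learns the amount to pool; this sketch does not implement that step and simply takes `alpha` as an input.

```python
import numpy as np


def shrunken_saa_newsvendor(counts, alpha, critical_ratio):
    """Hypothetical sketch: pooled SAA order quantities for K newsvendor problems.

    counts: (K, D) array; counts[k, d] = number of observed demands equal to d
            in problem k (assumes every problem has at least one observation).
    alpha: pooling amount; alpha = 0 recovers decoupled (per-problem) SAA.
    critical_ratio: b / (b + h) for backorder cost b and holding cost h.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1, keepdims=True)          # N_k: sample size of each problem
    p_hat = counts / n                             # per-problem empirical distributions
    p0 = p_hat.mean(axis=0, keepdims=True)         # assumed anchor: average empirical distribution
    # Convex combination N_k/(N_k+alpha) * p_hat_k + alpha/(N_k+alpha) * p0
    p_alpha = (counts + alpha * p0) / (n + alpha)
    cdf = np.cumsum(p_alpha, axis=1)
    # Newsvendor SAA solution: smallest demand level whose CDF reaches the critical ratio
    # (small tolerance guards against floating-point round-off in the cumulative sum)
    return np.argmax(cdf >= critical_ratio - 1e-12, axis=1)


if __name__ == "__main__":
    # Two small problems with very different demand histories
    counts = np.array([[3, 1, 0, 0],
                       [0, 0, 1, 3]])
    for alpha in (0.0, 4.0, 1e9):
        print(alpha, shrunken_saa_newsvendor(counts, alpha, critical_ratio=0.6))
```

With `alpha = 0` each problem is solved on its own data; as `alpha` grows, every problem's solution is pulled toward the solution under the pooled anchor. The interesting regime described in the abstract lies in between, where a moderate amount of pooling can lower total expected cost even though the problems are unrelated.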
 Publication:

arXiv e-prints
 Pub Date:
 June 2019
 DOI:
 10.48550/arXiv.1906.00255
 arXiv:
 arXiv:1906.00255
 Bibcode:
 2019arXiv190600255G
 Keywords:

 Mathematics - Optimization and Control;
 Computer Science - Machine Learning;
 Mathematics - Statistics Theory