Budget-Constrained Bandits over General Cost and Reward Distributions
Abstract
We consider a budget-constrained bandit problem where each arm pull incurs a random cost, and yields a random reward in return. The objective is to maximize the total expected reward under a budget constraint on the total cost. The model is general in the sense that it allows correlated and potentially heavy-tailed cost-reward pairs that can take on negative values as required by many applications. We show that if moments of order $(2+\gamma)$ for some $\gamma > 0$ exist for all cost-reward pairs, $O(\log B)$ regret is achievable for a budget $B>0$. In order to achieve tight regret bounds, we propose algorithms that exploit the correlation between the cost and reward of each arm by extracting the common information via linear minimum mean-square error estimation. We prove a regret lower bound for this problem, and show that the proposed algorithms achieve tight problem-dependent regret bounds, which are optimal up to a universal constant factor in the case of jointly Gaussian cost and reward pairs.
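The abstract's key device is linear minimum mean-square error (LMMSE) estimation of one quantity from a correlated one. The sketch below is not the paper's algorithm; it only illustrates the standard LMMSE formula $\hat{r} = \mu_r + \frac{\mathrm{Cov}(c,r)}{\mathrm{Var}(c)}(c - \mu_c)$ on hypothetical correlated cost-reward samples (the means and covariance used here are made up for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical jointly Gaussian cost-reward pairs for a single arm
# (mean cost 1.0, mean reward 0.5, positively correlated).
mean = [1.0, 0.5]
cov = [[0.2, 0.1],
       [0.1, 0.3]]
samples = rng.multivariate_normal(mean, cov, size=10_000)
costs, rewards = samples[:, 0], samples[:, 1]

def lmmse_estimate(costs, rewards, c_new):
    """LMMSE estimate of the reward given an observed cost c_new.

    Uses sample moments: r_hat = mu_r + (Cov(c, r) / Var(c)) * (c_new - mu_c).
    """
    mu_c, mu_r = costs.mean(), rewards.mean()
    var_c = costs.var()
    cov_cr = np.cov(costs, rewards)[0, 1]
    return mu_r + (cov_cr / var_c) * (c_new - mu_c)

# At the mean cost, the LMMSE estimate reduces to the mean reward.
print(lmmse_estimate(costs, rewards, 1.0))
```

The estimator is linear in the observed cost, so it needs only first and second moments, which is consistent with the paper's moment condition rather than any distributional assumption beyond the jointly Gaussian special case.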
Publication: arXiv e-prints
Pub Date: February 2020
arXiv: arXiv:2003.00365
Bibcode: 2020arXiv200300365C
Keywords: Computer Science - Machine Learning; Statistics - Machine Learning