Risk-Averse Stochastic Convex Bandit

doi:10.48550/arXiv.1810.00737

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first is a descent-type algorithm that is easy to implement. The second, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.

Publication: arXiv e-prints

Pub Date: October 2018

DOI: 10.48550/arXiv.1810.00737

arXiv: arXiv:1810.00737

Bibcode: 2018arXiv181000737R

Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
