Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolve dynamically in time, in a resource-constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound (UCB) algorithm for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance on both synthetic and realistic examples, the latter matching supply and demand in a bike-sharing platform.

Publication:

arXiv e-prints

Pub Date:

July 2018

DOI:

10.48550/arXiv.1807.02297

arXiv:

arXiv:1807.02297

Bibcode:

2018arXiv180702297F

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Systems and Control;
Statistics - Machine Learning

E-Print:

Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018
