Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach
Abstract
In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator desires to do a resource allocation that maximizes the expected total utility of the network. We consider threshold type utility functions where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold. Otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as `good' as an oracle policy that knows the utilities of agents. We model this problem setting as a bandit setting where feedback obtained in each round depends on the resource allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2020
- DOI:
- 10.48550/arXiv.2006.09997
- arXiv:
- arXiv:2006.09997
- Bibcode:
- 2020arXiv200609997V
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Networking and Internet Architecture;
- Statistics - Machine Learning
- E-Print:
- Accepted to INFOCOM 2020