Stochastic Network Utility Maximization with Unknown Utilities: Multi-Armed Bandits Approach

doi:10.48550/arXiv.2006.09997

In this paper, we study a novel Stochastic Network Utility Maximization (NUM) problem where the utilities of agents are unknown. The utility of each agent depends on the amount of resource it receives from a network operator/controller. The operator seeks a resource allocation that maximizes the expected total utility of the network. We consider threshold-type utility functions, where each agent gets non-zero utility if the amount of resource it receives is higher than a certain threshold; otherwise, its utility is zero (hard real-time). We pose this NUM setup with unknown utilities as a regret minimization problem. Our goal is to identify a policy that performs as well as an oracle policy that knows the utilities of agents. We model this problem as a bandit setting where the feedback obtained in each round depends on the resources allocated to the agents. We propose algorithms for this novel setting using ideas from Multiple-Play Multi-Armed Bandits and Combinatorial Semi-Bandits. We show that the proposed algorithm is optimal when all agents have the same utility. We validate the performance guarantees of our proposed algorithms through numerical experiments.
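
To make the threshold-type utility and the regret objective concrete, the following is a minimal Python sketch. It is not the paper's algorithm. The function names, the example thresholds, the per-agent expected utilities, and the unit resource budget are all hypothetical values chosen for illustration; the abstract does not specify them.

```python
from typing import Sequence


def threshold_utility(allocation: float, threshold: float, value: float) -> float:
    """Return `value` if the allocation is higher than the threshold, else zero."""
    return value if allocation > threshold else 0.0


def total_utility(
    allocations: Sequence[float],
    thresholds: Sequence[float],
    values: Sequence[float],
) -> float:
    """Sum the threshold utilities of all agents for one allocation vector."""
    return sum(
        threshold_utility(a, t, v)
        for a, t, v in zip(allocations, thresholds, values)
    )


def per_round_regret(
    chosen: Sequence[float],
    oracle: Sequence[float],
    thresholds: Sequence[float],
    values: Sequence[float],
) -> float:
    """Utility lost in one round by playing `chosen` instead of `oracle`."""
    return total_utility(oracle, thresholds, values) - total_utility(
        chosen, thresholds, values
    )


# Hypothetical instance: three agents sharing an assumed budget of 1.0.
thresholds = [0.2, 0.5, 0.4]   # assumed per-agent thresholds
values = [1.0, 2.0, 1.5]       # assumed expected utilities once served
oracle = [0.0, 0.55, 0.45]     # serves agents 2 and 3
chosen = [0.3, 0.0, 0.7]       # serves agents 1 and 3

regret = per_round_regret(chosen, oracle, thresholds, values)
print(regret)
```

In this toy instance the learner's allocation serves a lower-value agent than the oracle's does, so it incurs positive regret in that round. A learning policy in the setting described above would have to discover such thresholds and utilities from feedback rather than read them from a table.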

Publication:

arXiv e-prints

Pub Date:

June 2020

DOI:

10.48550/arXiv.2006.09997

arXiv:

arXiv:2006.09997

Bibcode:

2020arXiv200609997V

Keywords:

Computer Science - Machine Learning;
Computer Science - Networking and Internet Architecture;
Statistics - Machine Learning

E-Print:

Accepted to INFOCOM 2020
