Efficient Reinforcement Learning for Global Decision Making in the Presence of Local Agents at Scale
Abstract
We study reinforcement learning for global decision-making in the presence of local agents, where the global decision-maker makes decisions affecting all local agents and the objective is to learn a policy that maximizes the joint rewards of all the agents. Such problems arise in many applications, e.g., demand response, EV charging, and queueing. In this setting, scalability has been a long-standing challenge because the size of the state space can be exponential in the number of agents. This work proposes the \texttt{SUBSAMPLE-Q} algorithm, in which the global agent subsamples $k \leq n$ local agents to compute a policy in time that is polynomial in $k$. We show that the learned policy converges to the optimal policy at a rate of $\tilde{O}(1/\sqrt{k} + \epsilon_{k,m})$ as the number of subsampled agents $k$ increases, where $\epsilon_{k,m}$ is the Bellman noise. Finally, we validate the theory through numerical simulations in a demand-response setting and a queueing setting.
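The abstract only summarizes the method, so as an illustration of the core subsampling idea, here is a minimal, hypothetical Python sketch of a tabular Q-learner whose global agent acts on a random subsample of $k$ of the $n$ local agents. The class and method names (`SubsampleQ`, `act`, `update`) and the exchangeability-based state keying are assumptions made for illustration, not the authors' reference implementation; the actual algorithm and its $\tilde{O}(1/\sqrt{k} + \epsilon_{k,m})$ guarantee are developed in the paper.

```python
# Hypothetical sketch of the subsampling idea behind SUBSAMPLE-Q, in a
# simple tabular setting. Not the authors' implementation.
import random
from collections import defaultdict


class SubsampleQ:
    def __init__(self, k, actions, alpha=0.1, gamma=0.99, eps=0.1):
        self.k = k                    # number of subsampled local agents
        self.actions = actions        # global agent's action set
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.Q = defaultdict(float)   # Q over (global state, k local states, action)

    def _key(self, s_global, local_states, a):
        # Treat local agents as exchangeable: sorting the k local states keys
        # the table by an unordered multiset, keeping it polynomial in k
        # rather than exponential in n.
        return (s_global, tuple(sorted(local_states)), a)

    def subsample(self, all_local_states):
        # Draw k of the n local agents' states uniformly at random.
        return random.sample(all_local_states, self.k)

    def act(self, s_global, all_local_states):
        # Epsilon-greedy action based only on the subsampled subsystem.
        sub = self.subsample(all_local_states)
        if random.random() < self.eps:
            return random.choice(self.actions), sub
        best = max(self.actions, key=lambda a: self.Q[self._key(s_global, sub, a)])
        return best, sub

    def update(self, s_global, sub, a, reward, s_global2, sub2):
        # Standard Q-learning step applied to the k-agent subsystem.
        target = reward + self.gamma * max(
            self.Q[self._key(s_global2, sub2, b)] for b in self.actions
        )
        key = self._key(s_global, sub, a)
        self.Q[key] += self.alpha * (target - self.Q[key])
```

The key design point the sketch tries to convey is that learning and acting depend only on the subsampled $k$-agent state, so the table size scales with $k$ rather than with the full number of agents $n$.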
- Publication: arXiv e-prints
- Pub Date: February 2024
- DOI: 10.48550/arXiv.2403.00222
- arXiv: arXiv:2403.00222
- Bibcode: 2024arXiv240300222A
- Keywords: Computer Science - Machine Learning; Computer Science - Multiagent Systems; I.2.6
- E-Print: 34 pages, 6 figures