Cooperative Multi-Agent Constrained POMDPs: Strong Duality and Primal-Dual Reinforcement Learning with Approximate Information States
Abstract
We study decentralized constrained POMDPs in a team setting in which multiple non-strategic agents have asymmetric information. Strong duality is established for infinite-horizon expected total discounted costs when the observations lie in a countable space, the actions are chosen from a finite space, and the immediate cost functions are bounded. Connections with the common-information and approximate information-state approaches are then established. The approximate information states are characterized independently of the Lagrange-multiplier vector, so that adapting the multipliers during learning does not require new representations. Finally, a primal-dual multi-agent reinforcement learning (MARL) framework is developed, based on centralized training with distributed execution (CTDE) and three-time-scale stochastic approximation, with recurrent and feedforward neural networks as function approximators.
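For orientation, the strong-duality statement in the abstract can be sketched in generic constrained-optimization notation. This is a minimal sketch only: the abstract does not fix any symbols, so the joint policy u, discount factor \alpha, cost functions c and d_k, thresholds \bar{d}_k, and multiplier vector \lambda below are all assumed for illustration, and the policy class over which the result holds is the one specified in the paper itself.

```latex
% Assumed notation (not taken from the abstract):
%   u          joint policy of the team
%   \alpha     discount factor in (0,1)
%   c, d_k     bounded immediate cost functions (objective and k-th constraint)
%   \bar{d}_k  threshold for the k-th constraint, k = 1, ..., K
%   \lambda    nonnegative Lagrange-multiplier vector
\begin{gather*}
C(u) = \mathbb{E}^{u}\Big[\sum_{t=0}^{\infty} \alpha^{t}\, c(S_t, A_t)\Big],
\qquad
D_k(u) = \mathbb{E}^{u}\Big[\sum_{t=0}^{\infty} \alpha^{t}\, d_k(S_t, A_t)\Big],
\\
\text{primal problem:}\quad
\inf_{u}\; C(u)
\quad \text{subject to} \quad
D_k(u) \le \bar{d}_k, \quad k = 1, \dots, K,
\\
L(u, \lambda) = C(u) + \sum_{k=1}^{K} \lambda_k \big( D_k(u) - \bar{d}_k \big),
\qquad \lambda \ge 0,
\\
\text{strong duality:}\quad
\inf_{u}\, \sup_{\lambda \ge 0}\, L(u, \lambda)
\;=\;
\sup_{\lambda \ge 0}\, \inf_{u}\, L(u, \lambda).
\end{gather*}
```

Under this reading, a primal-dual method alternates between lowering L in the policy and raising it in \lambda, which is why a representation that does not depend on \lambda can be reused as the multipliers change.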
- Publication: arXiv e-prints
- Pub Date: July 2023
- arXiv: arXiv:2307.16536
- Bibcode: 2023arXiv230716536K
- Keywords: Mathematics - Optimization and Control; Electrical Engineering and Systems Science - Systems and Control
- E-Print: arXiv admin note: substantial text overlap with arXiv:2303.14932