Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics
Abstract
WQMIX, QMIX, QTRAN, and VDN are state-of-the-art algorithms for Dec-POMDPs, but none of them can solve domains that require complex cooperation between agents. We propose a two-stage algorithm, SA2MA, for such problems. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using the single-agent policy. SA2MA shows a clear advantage over all of these competitors in domains that require complex agent cooperation.
- Publication: arXiv e-prints
- Pub Date: November 2022
- DOI:
- arXiv: arXiv:2211.15411
- Bibcode: 2022arXiv221115411S
- Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Computer Science - Multiagent Systems
- E-Print: The paper is not yet finished