Solving Collaborative Dec-POMDPs with Deep Reinforcement Learning Heuristics

doi:10.48550/arXiv.2211.15411

Soffair, Nitsan

WQMIX, QMIX, QTRAN, and VDN are state-of-the-art (SOTA) algorithms for Dec-POMDPs. None of them can solve domains that require complex cooperation among agents. We present an algorithm, SA2MA, that solves such problems in two stages. In the first stage, we solve a single-agent problem and obtain a policy. In the second stage, we solve the multi-agent problem using the single-agent policy. SA2MA has a clear advantage over all competitors in domains that require complex cooperation among agents.

Publication:

arXiv e-prints

Pub Date:

November 2022

DOI:

10.48550/arXiv.2211.15411

arXiv:

arXiv:2211.15411

Bibcode:

2022arXiv221115411S

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Multiagent Systems

E-Print:

Paper has not been finished
