AgentMixer: Multi-Agent Correlated Policy Factorization

doi:10.48550/arXiv.2401.08728

AgentMixer: Multi-Agent Correlated Policy Factorization

Centralized training with decentralized execution (CTDE) is widely employed to stabilize partially observable multi-agent reinforcement learning (MARL) by utilizing a centralized value function during training. However, existing methods typically assume that agents make decisions based on their local observations independently, which may not lead to a correlated joint policy with sufficient coordination. Inspired by the concept of correlated equilibrium, we propose to introduce a \textit{strategy modification} to provide a mechanism for agents to correlate their policies. Specifically, we present a novel framework, AgentMixer, which constructs the joint fully observable policy as a non-linear combination of individual partially observable policies. To enable decentralized execution, one can derive individual policies by imitating the joint policy. Unfortunately, such imitation learning can lead to \textit{asymmetric learning failure} caused by the mismatch between joint policy and individual policy information. To mitigate this issue, we jointly train the joint policy and individual policies and introduce \textit{Individual-Global-Consistency} to guarantee mode consistency between the centralized and decentralized policies. We then theoretically prove that AgentMixer converges to an $\epsilon$-approximate Correlated Equilibrium. The strong experimental performance on three MARL benchmarks demonstrates the effectiveness of our method.

Publication:

arXiv e-prints

Pub Date:

January 2024

DOI:

10.48550/arXiv.2401.08728

arXiv:

arXiv:2401.08728

Bibcode:

2024arXiv240108728L

Keywords:

Computer Science - Multiagent Systems;
Computer Science - Artificial Intelligence

NASA/ADS

AgentMixer: Multi-Agent Correlated Policy Factorization

Abstract