Any-Play: An Intrinsic Augmentation for Zero-Shot Coordination
Abstract
Cooperative artificial intelligence with human or superhuman proficiency in collaborative tasks stands at the frontier of machine learning research. Prior work has tended to evaluate cooperative AI performance under the restrictive paradigms of self-play (teams composed of agents trained together) and cross-play (teams of agents trained independently but using the same algorithm). Recent work has indicated that AI optimized for these narrow settings may make for undesirable collaborators in the real world. We formalize an alternative criterion for evaluating cooperative AI, referred to as inter-algorithm cross-play, where agents are evaluated on teaming performance with all other agents within an experiment pool, with no assumption of algorithmic similarities between agents. We show that existing state-of-the-art cooperative AI algorithms, such as Other-Play and Off-Belief Learning, underperform in this paradigm. We propose the Any-Play learning augmentation, a multi-agent extension of diversity-based intrinsic rewards for zero-shot coordination (ZSC), for generalizing self-play-based algorithms to the inter-algorithm cross-play setting. We apply the Any-Play learning augmentation to the Simplified Action Decoder (SAD) and demonstrate state-of-the-art performance in the collaborative card game Hanabi.
- Publication: arXiv e-prints
- Pub Date: January 2022
- DOI: 10.48550/arXiv.2201.12436
- arXiv: arXiv:2201.12436
- Bibcode: 2022arXiv220112436L
- Keywords: Computer Science - Artificial Intelligence; Computer Science - Machine Learning; Computer Science - Multiagent Systems; I.2.11
- E-Print: Accepted to AAMAS 2022. Code will be made available at https://github.com/mit-ll/hanabi_AnyPlay (may take several weeks after posting of this pre-print)