Reinforcement Learning With Reward Machines in Stochastic Games
Abstract
We investigate multi-agent reinforcement learning for stochastic games with complex tasks, where the reward functions are non-Markovian. We utilize reward machines to incorporate high-level knowledge of complex tasks. We develop an algorithm called Q-learning with reward machines for stochastic games (QRM-SG) to learn the best-response strategy at a Nash equilibrium for each agent. In QRM-SG, we define the Q-function at a Nash equilibrium in an augmented state space, which integrates the state of the stochastic game with the states of the reward machines. Each agent learns the Q-functions of all agents in the system. We prove that the Q-functions learned in QRM-SG converge to the Q-functions at a Nash equilibrium if the stage game at each time step during learning has a global optimum point or a saddle point, and the agents update their Q-functions based on the best-response strategy at this point. We use the Lemke-Howson method to derive the best-response strategy given the current Q-functions. Three case studies show that QRM-SG can learn best-response strategies effectively: QRM-SG learns the best-response strategies after approximately 7500 episodes in Case Study I, 1000 episodes in Case Study II, and 1500 episodes in Case Study III, while baseline methods such as Nash Q-learning and MADDPG fail to converge to a Nash equilibrium in all three case studies.
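The core construction in the abstract — making a non-Markovian reward Markovian by augmenting the game state with a reward-machine state — can be illustrated with a minimal sketch. The reward-machine transitions, event labels, and grid states below are hypothetical toy examples (not from the paper), and the update shown is a plain single-agent Q-learning step over the augmented state (s, u); the full QRM-SG algorithm additionally learns all agents' Q-functions and computes best responses via the Lemke-Howson method, which is omitted here.

```python
# Toy reward machine: states u0 -> u1 -> u2, advanced by event labels.
# These labels and rewards are illustrative assumptions, not the paper's tasks.
RM_TRANSITIONS = {("u0", "got_key"): "u1", ("u1", "at_door"): "u2"}
RM_REWARDS = {("u1", "at_door"): 1.0}  # reward emitted on completing the task


def rm_step(u, label):
    """Advance the reward machine on an event label.

    The non-Markovian reward (which depends on history, e.g. "reach the door
    AFTER getting the key") becomes Markovian over the augmented state (s, u).
    """
    reward = RM_REWARDS.get((u, label), 0.0)
    u_next = RM_TRANSITIONS.get((u, label), u)  # stay put on irrelevant labels
    return u_next, reward


def q_update(Q, s, u, a, r, s_next, u_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning update on the augmented state space (s, u)."""
    best_next = max(Q.get((s_next, u_next, b), 0.0) for b in actions)
    key = (s, u, a)
    Q[key] = (1 - alpha) * Q.get(key, 0.0) + alpha * (r + gamma * best_next)
    return Q[key]


# Usage: the RM only pays out for "at_door" once "got_key" has occurred.
u, r1 = rm_step("u0", "got_key")      # u -> "u1", r1 = 0.0
u, r2 = rm_step(u, "at_door")         # u -> "u2", r2 = 1.0
Q = {}
q_update(Q, "s0", "u1", "move", r2, "s1", u, ["move"])
```

The key design point is that the Q-table is indexed by (game state, RM state, action), so two visits to the same grid cell with different task progress are treated as distinct states.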
 Publication:
 arXiv e-prints
 Pub Date:
 May 2023
 DOI:
 10.48550/arXiv.2305.17372
 arXiv:
 arXiv:2305.17372
 Bibcode:
 2023arXiv230517372H
 Keywords:
 Computer Science - Multiagent Systems;
 Computer Science - Artificial Intelligence;
 Computer Science - Computer Science and Game Theory;
 Computer Science - Machine Learning