BiTransformer Memory for Multi-agent Reinforcement Learning
Multi-agent collaboration plays a crucial role in reinforcement learning, focusing on how agents cooperate to achieve common goals. Most collaborative multi-agent algorithms emphasize building collaboration but overlook strengthening individual decision-making. To address this issue, this study proposes an online reinforcement learning model, BiTransformer Memory (BTM), which considers collaboration among multiple agents and also uses a memory module to assist individual decision-making. The BTM model is composed of a BiTransformer encoder and a BiTransformer decoder, which improve individual decision-making and collaboration within the multi-agent system, respectively. Inspired by human reliance on historical decision-making experience, the BiTransformer encoder introduces a memory attention module that aids current decisions with a library of explicit historical decision-making experience rather than hidden units, in contrast to conventional RNN-based methods. Additionally, an attention fusion module is proposed to process partial observations with the assistance of historical decision-making experience, extracting the information from the environment that is most valuable for decision-making and thereby enhancing the decision-making capabilities of individual agents. In the BiTransformer decoder, two modules are proposed: a decision attention module and a collaborative attention module. They foster potential cooperation among agents by considering the collaborative benefits between the current agent and the other decision-making agents, as well as partial observations combined with historical decision-making experience. BTM is tested in multiple StarCraft scenarios, achieving an average win rate of 93%.
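To make the idea of attending over an explicit library of historical experience concrete, the following is a minimal sketch, not the BTM implementation. It assumes the memory read can be modelled as standard scaled dot-product attention, with the current observation embedding as the query and stored experience entries as keys and values. The names memory_attention, memory_keys, and memory_values are hypothetical, and the learned projections, multiple heads, and fusion step that a real model would need are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, memory_keys, memory_values):
    """Read from an explicit memory with scaled dot-product attention.

    query:         embedding of the current partial observation (assumed)
    memory_keys:   one key vector per stored historical experience (assumed)
    memory_values: one value vector per stored historical experience (assumed)
    Returns the attention-weighted read vector and the attention weights.
    """
    d = len(query)
    # Similarity between the current query and every stored experience.
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in memory_keys
    ]
    weights = softmax(scores)
    # Weighted sum of the stored values, one output per value dimension.
    dim_v = len(memory_values[0])
    read = [
        sum(w * value[j] for w, value in zip(weights, memory_values))
        for j in range(dim_v)
    ]
    return read, weights


# Toy memory of three past experiences (illustrative numbers only).
query = [1.0, 0.0]
memory_keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
memory_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

read, weights = memory_attention(query, memory_keys, memory_values)
print(weights)  # the experience most similar to the query gets the largest weight
```

Unlike an RNN hidden state, which compresses the past into a single vector, the memory in this sketch keeps each past experience as a separate entry, so the attention weights show which stored experiences contribute to the current read.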