Multi-Agent Collaborative Reinforcement Learning Method Based on Bi-View Modeling
In recent years, artificial intelligence has advanced notably, solidifying its role in a wide array of real-world applications. Among its branches, reinforcement learning stands out as a discipline adept at tackling complex sequential decision-making problems, and it plays a vital role in control tasks. By harnessing progress in neural network theory and computational power, deep reinforcement learning has revolutionized conventional reinforcement learning algorithms, integrating deep learning techniques into the decision-making frameworks of agents. The Deep Q-Network (DQN) is a prime example of this progress, employing a convolutional neural network to process visual inputs from Atari 2600 games and then updating the policy of the reinforcement learning algorithm. Complex deep reinforcement learning tasks often involve multiple agents and are consequently formulated as multi-agent reinforcement learning, a framework that has demonstrated remarkable success across domains such as traffic control, sensor networks, and game AI. In multi-agent reinforcement learning, agents can learn to collaborate through the Centralized Training with Decentralized Execution (CTDE) mechanism, under which cooperative behavior is realized by sharing local information among agents during training. This sharing mechanism allows complex multi-agent tasks to be solved in many fields, but it also introduces a problem: excessive cooperation between agents can lead to conflict. As a consequence, agents begin to neglect their own current local observations during cooperation, lose the diversity of their policies, and eventually collaborate inefficiently. To address this problem, we propose a Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning (BVM-CMARL) method. The method models agents from both local and global views, to generate diverse policies and to incentivize collaboration, respectively. In the local view, the mutual information between an agent's local variable and its own trajectory is maximized to stimulate policy diversity; in the global view, the agents' level of collaboration is enhanced through the mutual information among their actions. Subsequently, the local Q-value trained from the local variables is fused with the global Q-value trained from the global variables to overcome ineffective cooperation. BVM-CMARL and five prominent multi-agent reinforcement learning algorithms are deployed across a range of environments, including the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and Hallway, to evaluate their efficacy and performance. The experimental results show that BVM-CMARL is more stable and performs better than the five state-of-the-art algorithms QMIX, QPLEX, RODE, EOI, and MAVEN, achieving an average success rate of 82.81% on SMAC, a 13.42% improvement over the second-best algorithm, RODE. Furthermore, the robustness and effectiveness of bi-view modeling are verified by ablation and hyperparameter-sensitivity experiments. In addition, a visualization analysis is provided to intuitively illustrate the role of BVM-CMARL.
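The two mutual-information objectives and the Q-value fusion described above can be sketched as follows. This is an illustrative formulation under assumed notation, not the exact one from the paper: z_i denotes agent i's local variable, \tau_i its trajectory, a_i its action, s the global state, and \lambda an assumed scalar fusion weight.

    \max \; \sum_{i=1}^{n} I(z_i;\, \tau_i) \;+\; \sum_{i \neq j} I(a_i;\, a_j)
    \quad \text{(local view: policy diversity; global view: collaboration)}

    Q^{\mathrm{fused}}_i(\tau_i, a_i) \;=\; \lambda\, Q^{\mathrm{loc}}_i(\tau_i, a_i; z_i) \;+\; (1-\lambda)\, Q^{\mathrm{glob}}_i(s, \boldsymbol{a})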
Keywords: deep reinforcement learning; multi-agent system; multi-agent collaboration; collaborative modeling; contrastive learning
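As a concrete illustration of the fusion step, the following is a minimal PyTorch-style sketch under stated assumptions. It is not the authors' implementation: the class name BiViewQFusion, the scalar fusion weight lam, the network architectures, and all tensor shapes are hypothetical.

    import torch
    import torch.nn as nn

    class BiViewQFusion(nn.Module):
        """Illustrative fusion of a local-view and a global-view Q-value.

        Hypothetical sketch of the idea in BVM-CMARL; the actual method
        may parameterize and combine the two views differently.
        """

        def __init__(self, obs_dim, state_dim, n_actions, latent_dim,
                     hidden=64, lam=0.5):
            super().__init__()
            self.lam = lam  # assumed scalar weight between the two views
            # Local Q: conditioned on the agent's own observation and
            # its local latent variable (decentralized-execution side).
            self.q_local = nn.Sequential(
                nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )
            # Global Q: conditioned on the global state, which is only
            # available during centralized training.
            self.q_global = nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs, latent, state):
            q_loc = self.q_local(torch.cat([obs, latent], dim=-1))
            q_glob = self.q_global(state)
            # Convex combination of the two views' Q-values.
            return self.lam * q_loc + (1.0 - self.lam) * q_glob

    # Usage example (batch of 4 agents' transitions, all sizes assumed):
    fusion = BiViewQFusion(obs_dim=32, state_dim=64, n_actions=5, latent_dim=8)
    q = fusion(torch.randn(4, 32), torch.randn(4, 8), torch.randn(4, 64))
    print(q.shape)  # torch.Size([4, 5])

A fixed convex combination is only one possible design choice; a learned, state-dependent weight would serve the same purpose of letting the local view preserve policy diversity while the global view drives collaboration.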