
Multi-Agent Collaborative Reinforcement Learning Method Based on Bi-View Modeling

In multi-agent cooperation, reinforcement learning algorithms achieve coordination by sharing the agents' local information. However, this shared-cooperation mechanism easily induces over-cooperation: agents come to ignore their own local observations, lose policy diversity, and ultimately fall into inefficient collaboration. To address this problem, this paper proposes Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning (BVM-CMARL), which models agents from a local view and a global view, used respectively to generate diverse policies and to incentivize cooperation. In the local view, the mutual information between the local variational distribution and the agent's own trajectory is maximized to encourage policy diversity; meanwhile, in the global view, the mutual information between the global variational distribution and the other agents' actions is maximized to raise the level of cooperation. Finally, the local Q-values trained with the local variational distribution are combined with the global Q-values trained with the global variational distribution to avoid inefficient collaboration. BVM-CMARL is applied to environments including the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and Hallway. Compared with five leading reinforcement learning algorithms, QMIX, QPLEX, RODE, EOI, and MAVEN, BVM-CMARL shows better stability and performance, achieving an average win rate of 82.81% on SMAC, 13.42% higher than the second-best algorithm, RODE. Ablation experiments with designed model variants demonstrate that bi-view modeling is essential to BVM-CMARL.
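As an illustrative sketch of the two mutual-information objectives and the Q-value fusion described above (the abstract gives no exact formulas, so the variational distributions $q_\phi$ and $q_\psi$, the latent variables $z$, the trajectory $\tau_i$, the other agents' actions $a_{-i}$, and the fusion weight $\lambda$ are assumed notation):

$$\max_{\phi}\; I\big(z_i^{\mathrm{loc}};\,\tau_i\big), \qquad \max_{\psi}\; I\big(z^{\mathrm{glo}};\,a_{-i}\big)$$

$$Q_i^{\mathrm{tot}}(\tau_i,a_i) \;=\; Q_i^{\mathrm{loc}}(\tau_i,a_i) \;+\; \lambda\, Q_i^{\mathrm{glo}}(\tau_i,a_i)$$

The first objective ties each agent's latent representation to its own trajectory (policy diversity); the second ties the global latent to the other agents' actions (cooperation); the fused Q-value balances the two views.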
In recent years, artificial intelligence has advanced markedly and now plays a crucial role in a wide range of real-world applications. Within artificial intelligence, reinforcement learning excels at complex sequential decision-making and is central to control-related tasks. Building on advances in neural network theory and computing power, deep reinforcement learning integrates deep learning techniques into the decision-making framework of agents; Deep Q-Network (DQN), for example, uses a convolutional neural network to process visual input from Atari 2600 games and to update the policy of the reinforcement learning algorithm. Complex deep reinforcement learning tasks often involve multiple agents and are therefore formulated as multi-agent reinforcement learning, a framework that has achieved remarkable success in domains such as traffic control, sensor networks, and game AI. In multi-agent reinforcement learning, agents can learn to cooperate under the Centralized Training with Decentralized Execution (CTDE) paradigm, in which cooperative behavior is realized by sharing local information among agents. This shared-cooperation mechanism solves complex multi-agent tasks in many fields, but it also tends to cause over-cooperation: agents begin to overlook their own local observations, lose policy diversity, and eventually collaborate inefficiently. To address this problem, we propose the Bi-View Modeling Collaborative Multi-Agent Reinforcement Learning (BVM-CMARL) method. The method models agents from a local view and a global view, used respectively to generate diverse policies and to incentivize collaboration. In the local view, the mutual information between the local variational distribution and the agent's own trajectory is maximized to stimulate policy diversity; in the global view, the mutual information between the global variational distribution and the other agents' actions is maximized to raise the level of collaboration. The local Q-values trained from the local variational distribution are then fused with the global Q-values trained from the global variational distribution to overcome ineffective cooperation. BVM-CMARL and five distinguished multi-agent reinforcement learning algorithms are deployed across environments including the StarCraft Multi-Agent Challenge (SMAC), Level-Based Foraging (LBF), and Hallway to evaluate their efficacy and performance. The experimental results show that BVM-CMARL exhibits superior stability and performance compared with the five state-of-the-art algorithms QMIX, QPLEX, RODE, EOI, and MAVEN: its average win rate on SMAC is 82.81%, a 13.42% improvement over the second-best algorithm, RODE. Furthermore, the robustness and effectiveness of bi-view modeling are verified by ablation and hyperparameter-sensitivity experiments, and a visualization analysis intuitively illustrates how BVM-CMARL works.
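To make the Q-value fusion step concrete, the following is a minimal PyTorch-style sketch of combining a diversity-oriented local Q-head with a cooperation-oriented global Q-head. The class name BiViewQ, the additive fusion with weight lambda_, and all layer sizes are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

class BiViewQ(nn.Module):
    # Illustrative two-head Q-network: local view plus weighted global view.
    def __init__(self, obs_dim: int, n_actions: int,
                 hidden: int = 64, lambda_: float = 0.5):
        super().__init__()
        # Local head: conditioned on the agent's own observation, standing in
        # for the head trained against the local-view mutual-information objective.
        self.local_q = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        # Global head: in the paper this would be informed by the global
        # variational model of other agents; here it shares the same input.
        self.global_q = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))
        self.lambda_ = lambda_  # fusion weight (assumed hyperparameter)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Fuse the two views to balance policy diversity and cooperation.
        return self.local_q(obs) + self.lambda_ * self.global_q(obs)

# Example usage with random input:
q_values = BiViewQ(obs_dim=32, n_actions=6)(torch.randn(1, 32))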

deep reinforcement learning; multi-agent system; multi-agent collaboration; collaborative modeling; contrastive learning

刘全、施眉龙、黄志刚、张立华


School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006, China

Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou, Jiangsu 215006, China


Supported by the National Natural Science Foundation of China (62376179, 62176175), the Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A238), and the Priority Academic Program Development of Jiangsu Higher Education Institutions

Chinese Journal of Computers (计算机学报)
Sponsored by the China Computer Federation and the Institute of Computing Technology, Chinese Academy of Sciences

Indexed in CSTPCD; Peking University Core Journals
Impact factor: 3.18
ISSN: 0254-4164
Year, Volume (Issue): 2024, 47(7)