
Multi-Agent Reinforcement Learning Value Function Factorization Approach Based on Graph Neural Network
Collaborative cooperation between agents in partially observable situations is an important problem in Multi-Agent Reinforcement Learning (MARL). The value function factorization approach solves the credit assignment problem and effectively achieves collaborative cooperation between agents. However, existing value function factorization approaches depend only on individual value functions with local information and do not allow explicit information exchange between agents, making them unsuitable for complex scenarios. To address this problem, this study introduces communication into the value function factorization approach to provide effective non-local information to agents, helping them understand complex environments. Furthermore, unlike existing communication approaches, the proposed approach uses a multi-layer message-passing architecture based on a Graph Neural Network (GNN), which extracts the useful information that must be exchanged between neighboring agents. At the same time, the model realizes a transition from non-communication to full communication and achieves global cooperation with a limited communication range, which suits real-world applications where the communication range is constrained. The results of experiments in the StarCraft II Multi-Agent Challenge (SMAC) and Predator-Prey (PP) environments demonstrate that the average win rate of this approach improves by 2-40 percentage points compared with baseline algorithms, such as QMIX and VBC, in four different SMAC scenarios. Furthermore, the proposed approach effectively solves the PP problem in non-monotonic environments.
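The abstract describes a value function factorization method in which per-agent value functions are enriched by GNN message passing restricted to neighboring agents and then combined into a joint value. The paper's own architecture is not reproduced here; the PyTorch code below is only a minimal sketch of that general idea, and every class name, layer size, and the use of a QMIX-style monotonic mixer as the factorization component are illustrative assumptions rather than the authors' design.

import torch
import torch.nn as nn

class NeighborMessagePassing(nn.Module):
    """One message-passing round restricted to a given adjacency (communication range)."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)         # message each neighbor emits
        self.update = nn.Linear(2 * dim, dim)  # fuse own state with aggregated messages

    def forward(self, h, adj):
        # h:   (n_agents, dim) per-agent hidden states
        # adj: (n_agents, n_agents) 0/1 matrix, adj[i, j] = 1 if agent j can message agent i
        messages = self.msg(h)                              # (n_agents, dim)
        agg = adj @ messages                                # sum messages from neighbors
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
        agg = agg / deg                                     # mean aggregation
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

class AgentQNet(nn.Module):
    """Individual action-value head on top of the communicated representation."""
    def __init__(self, obs_dim, hidden_dim, n_actions, n_rounds=2):
        super().__init__()
        self.encode = nn.Linear(obs_dim, hidden_dim)
        self.rounds = nn.ModuleList(
            [NeighborMessagePassing(hidden_dim) for _ in range(n_rounds)])
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, adj):
        h = torch.relu(self.encode(obs))   # (n_agents, hidden_dim)
        for layer in self.rounds:          # repeated rounds let information travel
            h = layer(h, adj)              # beyond direct neighbors
        return self.q_head(h)              # (n_agents, n_actions) individual Q values

class MonotonicMixer(nn.Module):
    """QMIX-style mixer: non-negative weights keep Q_tot monotonic in every Q_i."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents) chosen-action values, state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
        b1 = self.b1(state).view(-1, 1, self.embed_dim)
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1 + b1)  # (batch, 1, embed_dim)
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
        b2 = self.b2(state).view(-1, 1, 1)
        return (hidden @ w2 + b2).view(-1)                     # Q_tot: (batch,)

Under these assumptions, training would follow standard centralized temporal-difference learning on the mixed value, with the adjacency matrix encoding which agents fall inside each other's limited communication range; stacking several message-passing rounds is what would let information propagate beyond direct neighbors toward the paper's stated goal of global cooperation under a restricted range.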

Keywords: deep reinforcement learning; multi-agent environment; agent cooperation; agent communication; Graph Neural Network (GNN)

孙文洁、李宗民、孙浩淼


College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, Shandong, China

College of Big Data and Basic Science, Shandong Institute of Petroleum and Chemical Technology, Dongying 257061, Shandong, China


Funding: National Key Research and Development Program of China (2019YFF0301800); National Natural Science Foundation of China (61379106); Natural Science Foundation of Shandong Province (ZR2013FM036, ZR2015FM011)


Computer Engineering (计算机工程)
Sponsored by the East China Institute of Computing Technology and the Shanghai Computer Society


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.581
ISSN: 1000-3428
Year, Volume (Issue): 2024, 50(5)