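Heterogeneous Multi-Agent Reinforcement Learning Method Based on Value Function Decomposition and Communication Learning Mechanism (基于价值函数分解和通信学习机制的异构多智能体强化学习方法)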


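Abstract (translated from Chinese): Many real-world systems can be modeled as multi-agent systems, and multi-agent reinforcement learning offers an effective way to develop them. Within this area, value function decomposition methods based on the centralized training with decentralized execution paradigm have been widely studied. However, existing value decomposition methods generally lack a communication mechanism and perform poorly on multi-agent tasks that require learned communication. Moreover, most current communication mechanisms are designed for homogeneous multi-agent environments and do not consider heterogeneous scenarios. In heterogeneous scenarios, information sharing between agents is not straightforward because their action spaces or observation spaces differ. If this heterogeneity is not modeled effectively, the communication mechanism becomes ineffective and can even degrade cooperative performance. To address these challenges, this paper proposes a heterogeneous multi-agent reinforcement learning framework that integrates value function decomposition with a communication learning mechanism. Specifically: (1) unlike methods that use a homogeneous graph convolutional network, the framework uses a heterogeneous graph convolutional network to fuse the agents' heterogeneous feature information into effective embeddings; (2) the embeddings produced by the communication learning module are combined with the local observation history to compute each agent's action values, which are used to select and coordinate the agents' actions; (3) the whole method is trained jointly with a purpose-designed mutual information loss and the loss of the value function decomposition module. Experiments are first conducted on two heterogeneous multi-agent platforms; the results show that the method learns more effective policies than the baselines, improving the average reward by 13% and the average win rate by 24% on the two platforms, respectively. In addition, the feasibility of the method in a real-world system is verified in a traffic signal control scenario.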
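As a rough illustration of point (1), the sketch below shows one way heterogeneous agents could exchange information: each agent type gets its own projection into a shared embedding space, and each agent then averages the projected features of itself and its neighbors. The abstract does not specify the network, so the function names, the agent types ("scout", "tank"), the weights, and the mean aggregation are all assumptions made for illustration, not the authors' architecture.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def hetero_graph_layer(features, agent_types, neighbors, type_weights):
    """One round of type-aware message passing (hypothetical helper).

    features:     per-agent observation features; length may differ by type
    agent_types:  type label of each agent
    neighbors:    adjacency list, indices of the agents each agent hears from
    type_weights: type label -> projection matrix into the shared space
    """
    # Step 1: type-specific projection, so agents with different
    # observation sizes end up in one common embedding space.
    projected = [matvec(type_weights[t], f) for f, t in zip(features, agent_types)]

    # Step 2: mean-aggregate own and neighbor projections, then apply ReLU.
    embeddings = []
    for i, nbrs in enumerate(neighbors):
        group = [projected[i]] + [projected[j] for j in nbrs]
        dim = len(projected[i])
        mean = [sum(vec[k] for vec in group) / len(group) for k in range(dim)]
        embeddings.append([max(0.0, v) for v in mean])
    return embeddings


# Assumed toy setup: "scout" agents observe 3 values, "tank" agents observe 2.
type_weights = {
    "scout": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "tank": [[0.5, 0.0], [0.0, -1.0]],
}
features = [[1.0, 2.0, 3.0], [4.0, 1.0], [0.0, -2.0, 5.0]]
agent_types = ["scout", "tank", "scout"]
neighbors = [[1], [0, 2], [1]]

embeddings = hetero_graph_layer(features, agent_types, neighbors, type_weights)
```

Every agent ends up with an embedding of the same size regardless of its observation size, which is what lets a downstream value network treat the received messages uniformly.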

English title: Heterogeneous Multi-Agent Reinforcement Learning Method Based on Value Function Decomposition and Communication Learning Mechanism

English abstract: Many real-world systems can be modeled as multi-agent systems in which multiple agents interact with the environment to learn and make decisions. Reinforcement learning has received wide attention recently and has achieved remarkable success in various fields. Because practical tasks usually involve multiple agents interacting with the environment, multi-agent reinforcement learning (MARL) has gradually become a research focus. MARL provides an effective way to develop these multi-agent systems and has achieved remarkable results in various complex sequential decision-making tasks. However, MARL faces many challenges, such as non-stationarity and the curse of dimensionality. Value function decomposition is one of the most popular MARL approaches. By decomposing the global value function into local individual value functions, it greatly reduces the dimension of the action space and alleviates the curse of dimensionality. In addition, agents can select actions according to their individual value functions alone, which addresses the non-stationarity problem caused by the interaction between agents. Value function decomposition methods based on the centralized training with decentralized execution paradigm have been widely studied. However, existing value decomposition methods generally lack communication mechanisms and perform poorly on multi-agent tasks that require communication learning. At the same time, most current communication learning mechanisms are designed for homogeneous multi-agent environments and do not consider heterogeneous multi-agent scenarios. In heterogeneous scenarios, information sharing between agents is not straightforward because of the heterogeneity of the agents' action spaces or observation spaces. If the heterogeneity of agents cannot be modeled effectively, the communication mechanism becomes ineffective and may even harm multi-agent cooperation. To address these challenges, this paper proposes a heterogeneous multi-agent reinforcement learning framework that integrates value function decomposition and communication learning mechanisms. Specifically: (1) unlike methods that use a homogeneous graph convolutional network, the framework uses a heterogeneous graph convolutional network to integrate the heterogeneous feature information of the agents into effective embeddings; (2) the embeddings obtained by the communication learning module and the local observation history are used to calculate the action value of each agent, in order to select and coordinate the agents' actions; (3) the proposed method is trained effectively through joint training with a mutual information loss and the value function decomposition loss. The proposed method retains the scalability and stability of value function decomposition and promotes better collaboration and decision-making by exploiting diverse information interactions between heterogeneous agents. To the best of our knowledge, this work is the first attempt to combine a communication learning method based on graph convolutional networks with a value function learning method to develop heterogeneous multi-agent systems. The proposed framework provides a new idea for the field of heterogeneous multi-agent reinforcement learning. This paper first conducts experiments on two heterogeneous multi-agent platforms; the experimental results show that the proposed method learns more effective strategies than the baseline methods, improving the average reward by 13% and the average win rate by 24% on the two platforms, respectively. In addition, the feasibility of the method in real systems is verified in a traffic signal control scenario.
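The decomposition idea in the abstract can be made concrete with a small sketch. It assumes the simplest possible mixing, an additive (VDN-style) sum of per-agent values; the abstract does not say which mixing function the paper uses, and the Q-values and function names below are made up for illustration. Under this assumption, each agent acting greedily on its own value function recovers the joint action that maximizes the global value, so no search over the joint action space is needed at execution time.

```python
from itertools import product


def greedy_decentralized(q_values):
    """Each agent picks the argmax of its own Q-values (decentralized execution)."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_values)


def q_total(q_values, joint_action):
    """Additive (VDN-style) mixing: Q_tot is the sum of the chosen per-agent values."""
    return sum(q[a] for q, a in zip(q_values, joint_action))


def greedy_centralized(q_values):
    """Brute-force argmax of Q_tot over the full joint action space."""
    joint_space = product(*(range(len(q)) for q in q_values))
    return max(joint_space, key=lambda joint: q_total(q_values, joint))


# Assumed toy values: three heterogeneous agents with 3, 2, and 4 actions.
q_values = [
    [0.1, 0.7, 0.3],
    [1.2, -0.4],
    [0.0, 0.5, 0.2, 0.9],
]

decentralized = greedy_decentralized(q_values)  # 3 + 2 + 4 = 9 evaluations
centralized = greedy_centralized(q_values)      # 3 * 2 * 4 = 24 evaluations
best_value = q_total(q_values, decentralized)
```

The per-agent search grows with the sum of the action-space sizes, while the joint search grows with their product, which is the dimensionality reduction the abstract refers to.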

English keywords:

value function decomposition; heterogeneous multi-agent reinforcement learning; communication mechanism; graph neural network; mutual information; traffic signal control

Authors:

Du Wei (杜威), Ding Shifei (丁世飞), Guo Lili (郭丽丽), Zhang Jian (张健), Ding Ling (丁玲)


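Author affiliations:

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116

Mine Digitization Engineering Research Center of the Ministry of Education (China University of Mining and Technology), Xuzhou, Jiangsu 221116

College of Intelligence and Computing, Tianjin University, Tianjin 300350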

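Keywords (translated from Chinese):

value function decomposition; heterogeneous multi-agent reinforcement learning; communication mechanism; graph neural network; mutual information; traffic signal control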

Funding:

National Natural Science Foundation of China (two grants)

Project numbers:

62276265, 61976216

Publication year:

2024

DOI:

10.11897/SP.J.1016.2024.01304

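Chinese Journal of Computers (计算机学报)

China Computer Federation; Institute of Computing Technology, Chinese Academy of Sciences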

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 3.18

ISSN: 0254-4164

Year, volume (issue): 2024, 47(6)

Number of references: 2