Heterogeneous Multi-Agent Reinforcement Learning Method Based on Value Function Decomposition and Communication Learning Mechanism
Many real-world systems can be modeled as multi-agent systems in which multiple agents interact with the environment to learn and make decisions. Reinforcement learning has received wide attention recently and has achieved remarkable success in various fields. Because practical tasks usually involve multiple agents interacting with the environment, multi-agent reinforcement learning (MARL) has gradually become a research focus. MARL provides an effective way to develop such multi-agent systems and has achieved remarkable results in various complex sequential decision-making tasks. However, MARL faces many challenges, such as non-stationarity and the curse of dimensionality. Value function decomposition is one of the most popular MARL approaches. By decomposing the global value function into local, individual value functions, it greatly reduces the dimension of the action space and alleviates the curse of dimensionality. In addition, each agent can select actions according to its individual value function alone, which mitigates the non-stationarity caused by interaction between agents. Value function decomposition methods based on the centralized training and decentralized execution paradigm have been widely studied. However, existing value decomposition methods generally lack communication mechanisms and perform poorly on multi-agent tasks that require communication learning. At the same time, most current communication learning mechanisms are designed for homogeneous multi-agent environments and do not consider heterogeneous multi-agent scenarios. In heterogeneous scenarios, agents cannot share information directly because their action spaces or observation spaces differ. If the heterogeneity of agents is not modeled effectively, the communication mechanism becomes ineffective and can even degrade the performance of multi-agent cooperation. To address these challenges, this paper proposes a heterogeneous multi-agent reinforcement learning framework that integrates value function decomposition and communication learning mechanisms. Specifically, (1) unlike methods that use a homogeneous graph convolutional network, the framework uses a heterogeneous graph convolutional network to integrate the heterogeneous feature information of the agents and obtain effective embeddings. (2) The embeddings obtained by the communication learning module, together with the local observation history, are used to calculate the action value of each agent, which is then used to select and coordinate the agents' actions. (3) The proposed method is trained effectively by jointly optimizing a mutual information loss and the value function decomposition loss. The proposed method retains the scalability and stability of value function decomposition and promotes better collaboration and decision-making by exploiting diverse information interactions between heterogeneous agents. To the best of our knowledge, our work is the first attempt to combine a communication learning method based on graph convolutional networks with a value function learning method to develop heterogeneous multi-agent systems. The proposed framework provides a new idea for the field of heterogeneous multi-agent reinforcement learning. This paper first conducts experiments on two heterogeneous multi-agent platforms. The experimental results show that the proposed method learns more effective strategies than the baseline method, improving the average reward and the average win rate by 13% and 24%, respectively, on the two platforms. In addition, the feasibility of the method in a real system is verified in a traffic signal control scenario.
value function decomposition; heterogeneous multi-agent reinforcement learning; communication mechanism; graph neural network; mutual information; traffic signal control
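
The idea behind value function decomposition can be illustrated with a minimal sketch. It assumes an additive (VDN-style) mixing of individual values, which is a swapped-in simplification and not necessarily the mixing used by the proposed framework. The per-agent values and the names `local_q`, `joint_q`, `decentralized_greedy`, and `centralized_greedy` are hypothetical. The sketch shows that, under additive mixing, each agent choosing its own best action yields the same joint action as a search over the whole joint action space, even when the agents' action spaces differ in size.

```python
from itertools import product

# Hypothetical per-agent action values Q_i(a_i) for one fixed observation history.
# The agents are heterogeneous here only in that their action spaces differ in size.
local_q = [
    [0.2, 1.5, -0.3],        # agent 0: 3 actions
    [0.7, 0.1],              # agent 1: 2 actions
    [-1.0, 0.4, 0.9, 0.0],   # agent 2: 4 actions
]


def joint_q(joint_action):
    """Assumed additive mixing: Q_tot(a) = sum_i Q_i(a_i)."""
    return sum(q[a] for q, a in zip(local_q, joint_action))


def decentralized_greedy():
    """Each agent picks its own argmax using only its individual values."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in local_q)


def centralized_greedy():
    """Brute-force argmax over the full joint action space, for comparison."""
    spaces = [range(len(q)) for q in local_q]
    return max(product(*spaces), key=joint_q)


greedy_local = decentralized_greedy()
greedy_joint = centralized_greedy()

# Decentralized selection inspects 3 + 2 + 4 values;
# the centralized search enumerates 3 * 2 * 4 joint actions.
print(greedy_local, greedy_joint, joint_q(greedy_joint))
```

With these illustrative numbers both selections agree. The gap between the sum and the product of the action-space sizes grows quickly with the number of agents, which is the dimensionality reduction the abstract refers to.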