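Deep reinforcement learning (DRL) has attracted wide attention in recent years and achieved notable success in many fields. Because real-world environments usually contain multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed rapidly and performs well on a variety of complex sequential decision-making tasks. This paper surveys progress in multi-agent deep reinforcement learning in three parts. First, we review several common problem formulations of multi-agent reinforcement learning and their corresponding cooperative, competitive, and mixed tasks. Second, we propose a new multi-dimensional classification of current MADRL methods and introduce the methods in each category. Among them, we focus on value function decomposition methods, communication-based MADRL methods, and graph neural network based MADRL methods. Finally, we examine the main applications of MADRL methods in real-world scenarios. We hope this paper will help new researchers entering this rapidly developing field, as well as existing experts who want a comprehensive understanding and wish to identify new directions based on the latest advances.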
Research Progress of Multi-Agent Deep Reinforcement Learning
Reinforcement learning is a traditional machine learning method for solving complex decision-making problems. With the advent of the era of artificial intelligence, deep learning has achieved remarkable success thanks to vast amounts of data and the increase in computing power brought by hardware development. Deep reinforcement learning (DRL) has received wide attention in recent years and achieved remarkable success in various fields. Because real environments usually include multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed vigorously and achieved excellent performance in a variety of complex sequential decision-making tasks. This paper summarizes the research progress of multi-agent deep reinforcement learning in three parts. First, we review several common multi-agent reinforcement learning problem representations, such as Markov games and partially observable Markov games, and their corresponding cooperative, competitive, and mixed cooperative-competitive tasks. Second, we propose a new multi-dimensional classification of current MADRL methods and introduce the methods in each category. Concretely, we divide MADRL methods into value-based and policy-based methods according to how they solve for the optimal policy. We also divide them into methods for cooperative tasks and methods for general tasks (cooperative, competitive, or mixed) according to the applicable task type. In addition, we introduce a new dimension, namely whether a communication mechanism is established between agents, which divides MADRL methods into communication and non-communication methods. Based on these three dimensions, popular MADRL methods are divided into eight categories. Among them, we focus on value function decomposition methods, communication-based MADRL methods, and graph neural network based MADRL methods. Value function decomposition methods can be divided into simple factorization, IGM (Individual-Global-Max) principle based, and other methods. Communication structures are divided into fully connected, star, tree, neighbor, and layered types. Third, we study the main applications of MADRL methods in real-world scenarios such as autonomous driving, traffic signal control, and recommendation systems. The classification in this paper is based on several common types of MADRL problem representation and on model-free MADRL methods, so many promising directions fall outside its focus; we briefly analyze these in Section 5, including extensive-form game problems, model-based MADRL methods, and safe and robust MADRL. Finally, we give a summary of this paper. With the rapid development of deep learning methods, the MARL field is undergoing rapid change, and many previously unsolvable problems are gradually becoming easier to handle with MADRL methods. MADRL is a developing field that attracts growing interest from scholars but also faces many challenges, such as non-stationarity, the curse of dimensionality, and credit assignment. Overall, DRL can improve the intelligence and efficiency of systems in various fields by learning optimal decision strategies, bringing tremendous impact and change to human society. In this paper, we provide a broad overview of the latest work in the emerging field of multi-agent deep reinforcement learning and briefly analyze promising directions such as extensive-form games, model-based MADRL, and safe and robust MADRL. We expect this paper will be helpful to new researchers entering this rapidly developing field and to existing experts who want to gain a comprehensive understanding and determine new directions based on the latest advances.
multi-agent deep reinforcement learning; value-based; policy-based; communication learning; graph neural network