
DQN-based Multi-agent Motion Planning Method with Deep Reinforcement Learning

DQN, a classical value-based deep reinforcement learning method, has been widely used in fields such as multi-agent motion planning. However, DQN faces a series of challenges: it tends to overestimate Q values, its Q-value calculation is complicated, its neural network has no memory of past observations, and its ε-greedy exploration strategy is inefficient. To address these problems, a DQN-based multi-agent deep reinforcement learning motion planning method is proposed, which helps agents learn an efficient and stable motion planning policy and reach their target points without collision. First, building on DQN, a Dueling-based Q-value calculation mechanism is proposed, which decomposes the Q value into a state value and an advantage value, and selects the optimal action using the parameters of the Q network currently being updated, making the Q-value calculation simpler and more accurate. Second, a GRU-based memory mechanism is proposed: an introduced GRU module enables the network to capture temporal information and process the agents' historical observations. Third, a noise-based exploration mechanism is proposed, which replaces DQN's exploration mode with parameterized noise, improving the agents' exploration efficiency and bringing the multi-agent system to an exploration-exploitation balance. The method is tested in six different scenarios on the PyBullet simulation platform. Experimental results show that it enables a multi-agent team to collaborate efficiently and reach their respective target points without collision, and that the policy training process is stable.
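
The Q-value decomposition and the online-network action selection described above correspond to the standard Dueling DQN and Double DQN formulations. The following is a minimal PyTorch sketch under that assumption; the layer sizes and the td_target helper are illustrative, not the paper's exact implementation.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    # Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value_head = nn.Linear(hidden, 1)        # state value V(s)
        self.adv_head = nn.Linear(hidden, n_actions)  # advantage A(s, a)

    def forward(self, obs):
        h = self.feature(obs)
        v = self.value_head(h)   # (batch, 1)
        a = self.adv_head(h)     # (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

def td_target(online_net, target_net, reward, next_obs, done, gamma=0.99):
    # The online (currently updated) network chooses the greedy action;
    # the target network evaluates it (Double-DQN-style decoupling).
    with torch.no_grad():
        best_action = online_net(next_obs).argmax(dim=1, keepdim=True)
        next_q = target_net(next_obs).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q

Letting the online network pick the greedy action while the target network scores it is the standard remedy for the Q-value overestimation mentioned in the abstract.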
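For the GRU-based memory mechanism, a common arrangement is to place a GRU between the observation encoder and the Q head so that the hidden state summarizes what the agent has seen so far. The sketch below assumes this arrangement; the class and dimension names are hypothetical.

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    # A GRU between the encoder and the Q head lets the network fold the
    # agent's observation history into a recurrent hidden state.
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: optional (1, batch, hidden)
        z = self.encoder(obs_seq)
        out, h_n = self.gru(z, h0)    # out: (batch, time, hidden)
        return self.q_head(out), h_n  # Q-values per step, final memory state

At execution time the hidden state h_n is carried from one decision step to the next, so actions can depend on observations that are no longer directly visible.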
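The parameterized-noise exploration described in the abstract matches the NoisyNet idea: replace the linear layers of the Q network with layers whose weights carry learnable Gaussian noise, then always act greedily. A minimal factorized-noise layer, assuming a PyTorch implementation, might look as follows.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    # Factorized-Gaussian noisy linear layer: the noise scales (sigma) are
    # learnable, so exploration adapts during training instead of following
    # a fixed epsilon schedule.
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        bound = 1.0 / math.sqrt(in_features)
        self.w_mu = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.b_mu = nn.Parameter(torch.empty(out_features).uniform_(-bound, bound))
        self.w_sigma = nn.Parameter(
            torch.full((out_features, in_features), sigma0 * bound))
        self.b_sigma = nn.Parameter(torch.full((out_features,), sigma0 * bound))

    @staticmethod
    def _scale(x):
        return x.sign() * x.abs().sqrt()  # f(x) = sign(x) * sqrt(|x|)

    def forward(self, x):
        if self.training:
            eps_in = self._scale(torch.randn(self.in_features, device=x.device))
            eps_out = self._scale(torch.randn(self.out_features, device=x.device))
            w = self.w_mu + self.w_sigma * torch.outer(eps_out, eps_in)
            b = self.b_mu + self.b_sigma * eps_out
        else:
            w, b = self.w_mu, self.b_mu  # deterministic (mean) weights at eval time
        return F.linear(x, w, b)

Because the sigma parameters are trained along with the rest of the network, the amount of injected noise shrinks as the value estimates improve, which is what lets the system settle into the exploration-exploitation balance the abstract describes without an ε-greedy schedule.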

Multi-agent system; Motion planning; Deep reinforcement learning; DQN

史殿习, 彭滢璇, 杨焕焕, 欧阳倩滢, 张玉晖, 郝锋


Intelligent Game and Decision Laboratory, Beijing 100091, China

Tianjin (Binhai) Artificial Intelligence Innovation Center, Tianjin 300457, China

College of Computer, National University of Defense Technology, Changsha 410073, China


Science and Technology Innovation 2030 Major Project of the Ministry of Science and Technology (2020AAA0104802); National Natural Science Foundation of China (91948303)

2024

Computer Science (计算机科学)
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(2)