
LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update

Abstract: In fully cooperative tasks, the multi-agent deep deterministic policy gradient (MADDPG) algorithm suffers from poor credit assignment and unstable training. To address these problems, an LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update is proposed. Drawing on the ideas of difference rewards and value decomposition, a long short-term memory (LSTM) network is used to extract features across trajectory sequences and to refine the division of the global reward, so that each agent is assigned a reward for its own actions. To meet the requirements of joint training, a high-quality training sample set is constructed and an asynchronous cooperative update method is designed, enabling stable joint training of the LSTM-MADDPG network. Simulation results in a cooperative capture scenario show that the proposed algorithm converges 20.51% faster than QMIX, and that, after training converges, the asynchronous cooperative update reduces the mean square error of the normalized reward by 57.59% compared with synchronous update, improving the stability of convergence.
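The abstract names two mechanisms: an LSTM that extracts features from trajectory sequences to split the global reward among agents (drawing on difference rewards and value decomposition), and an asynchronous cooperative update that trains the credit-assignment network and the MADDPG critics jointly but at different cadences. The paper's exact architecture is not given in this record, so the PyTorch sketch below is only a minimal illustration under assumed choices: an LSTM "mixer" that outputs softmax credit weights over agents, simple per-agent critics in place of the full MADDPG actor-critic machinery, a one-step regression target, and arbitrary update periods CRITIC_PERIOD and MIXER_PERIOD.

```python
# Hedged sketch, not the paper's implementation: illustrates (1) an LSTM that
# maps a joint trajectory to per-agent credit weights and (2) a staggered
# ("asynchronous cooperative") update schedule between that LSTM and the critics.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM, SEQ_LEN, BATCH = 3, 8, 2, 10, 32
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)   # concatenated obs+act of all agents


class LSTMMixer(nn.Module):
    """Reads the joint trajectory and emits per-agent credit weights."""

    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(JOINT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_AGENTS)

    def forward(self, traj):                      # traj: (B, T, JOINT_DIM)
        feat, _ = self.lstm(traj)                 # features over the sequence
        return torch.softmax(self.head(feat[:, -1]), dim=-1)  # (B, N_AGENTS)


class Critic(nn.Module):
    """Per-agent critic Q_i(o_i, a_i); a stand-in for the MADDPG critic."""

    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, oa):                        # oa: (B, OBS_DIM + ACT_DIM)
        return self.net(oa)


mixer = LSTMMixer()
critics = nn.ModuleList(Critic() for _ in range(N_AGENTS))
mixer_opt = torch.optim.Adam(mixer.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critics.parameters(), lr=1e-3)

CRITIC_PERIOD, MIXER_PERIOD = 1, 4   # assumed asynchronous update intervals

for step in range(1, 13):
    # Random stand-ins for a batch drawn from the paper's high-quality
    # training sample set of joint trajectories and team rewards.
    traj = torch.randn(BATCH, SEQ_LEN, JOINT_DIM)
    global_r = torch.randn(BATCH, 1)

    weights = mixer(traj)                                     # credit weights
    last = traj[:, -1].reshape(BATCH, N_AGENTS, OBS_DIM + ACT_DIM)
    q_each = torch.cat([critics[i](last[:, i]) for i in range(N_AGENTS)], dim=1)

    if step % CRITIC_PERIOD == 0:      # frequent critic updates, mixer frozen
        q_tot = (weights.detach() * q_each).sum(dim=1, keepdim=True)
        loss_c = nn.functional.mse_loss(q_tot, global_r)
        critic_opt.zero_grad(); loss_c.backward(); critic_opt.step()

    if step % MIXER_PERIOD == 0:       # slower mixer updates, critics frozen
        q_tot = (weights * q_each.detach()).sum(dim=1, keepdim=True)
        loss_m = nn.functional.mse_loss(q_tot, global_r)
        mixer_opt.zero_grad(); loss_m.backward(); mixer_opt.step()
        print(f"step {step:2d}  critic loss {loss_c.item():.4f}  "
              f"mixer loss {loss_m.item():.4f}")
```

The staggered `if step % ... == 0` blocks are the point of the sketch: the critics are refreshed frequently with the credit weights frozen, while the LSTM mixer is refreshed less often with the critics frozen, which is one plausible reading of "asynchronous cooperative update"; all layer sizes, the softmax weighting, and the update periods are illustrative assumptions.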

Keywords: artificial intelligence; multi-agent cooperative decision-making; deep reinforcement learning; credit assignment; asynchronous cooperative update

Authors: GAO Jingpeng, WANG Guoxuan, GAO Lu


College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China

National Key Laboratory of Test Physics and Computational Mathematics, Beijing Institute of Space Long March Vehicle, Beijing 100076, China


Funding: State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE) project (Grant No. CEMEE2021G0001)

Year: 2024

Journal: Journal of Jilin University (Engineering and Technology Edition)
Publisher: Jilin University

Indexed in: CSTPCD; Peking University Chinese Core Journals (北大核心)
Impact factor: 0.792
ISSN: 1671-5497
Year, volume (issue): 2024, 54(3)