
LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update

Abstract: In fully cooperative tasks, the multi-agent deep deterministic policy gradient (MADDPG) algorithm suffers from poor credit assignment and unstable training. To address these problems, an LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update is proposed. Drawing on the ideas of difference rewards and value decomposition, a long short-term memory (LSTM) network is used to extract features across trajectory sequences and to refine the division of the global reward, so that each agent is assigned a reward for its own actions. To meet the requirements of joint training, a high-quality training sample set is constructed and an asynchronous cooperative update method is designed, enabling stable joint training of the LSTM-MADDPG network. Simulation results in a cooperative capture scenario show that the proposed algorithm converges 20.51% faster than QMIX, and that, after training converges, the asynchronous cooperative update reduces the mean square error of the normalized reward by 57.59% compared with synchronous update, improving the stability of convergence.
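The abstract names two mechanisms: an LSTM that extracts features from trajectory sequences to split the global reward among agents (drawing on difference rewards and value decomposition), and an asynchronous cooperative update that trains the credit-assignment network and the MADDPG critics jointly but at different cadences. The paper's exact architecture is not given in this record, so the PyTorch sketch below is only a minimal illustration under assumed choices: an LSTM "mixer" that outputs softmax credit weights over agents, simple per-agent critics in place of the full MADDPG actor-critic machinery, a one-step regression target, and arbitrary update periods CRITIC_PERIOD and MIXER_PERIOD.

```python
# Hedged sketch, not the paper's implementation: illustrates (1) an LSTM that
# maps a joint trajectory to per-agent credit weights and (2) a staggered
# ("asynchronous cooperative") update schedule between that LSTM and the critics.
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM, SEQ_LEN, BATCH = 3, 8, 2, 10, 32
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)   # concatenated obs+act of all agents


class LSTMMixer(nn.Module):
    """Reads the joint trajectory and emits per-agent credit weights."""

    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(JOINT_DIM, hidden, batch_first=True)
        self.head = nn.Linear(hidden, N_AGENTS)

    def forward(self, traj):                      # traj: (B, T, JOINT_DIM)
        feat, _ = self.lstm(traj)                 # features over the sequence
        return torch.softmax(self.head(feat[:, -1]), dim=-1)  # (B, N_AGENTS)


class Critic(nn.Module):
    """Per-agent critic Q_i(o_i, a_i); a stand-in for the MADDPG critic."""

    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, hidden),
                                 nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, oa):                        # oa: (B, OBS_DIM + ACT_DIM)
        return self.net(oa)


mixer = LSTMMixer()
critics = nn.ModuleList(Critic() for _ in range(N_AGENTS))
mixer_opt = torch.optim.Adam(mixer.parameters(), lr=1e-3)
critic_opt = torch.optim.Adam(critics.parameters(), lr=1e-3)

CRITIC_PERIOD, MIXER_PERIOD = 1, 4   # assumed asynchronous update intervals

for step in range(1, 13):
    # Random stand-ins for a batch drawn from the paper's high-quality
    # training sample set of joint trajectories and team rewards.
    traj = torch.randn(BATCH, SEQ_LEN, JOINT_DIM)
    global_r = torch.randn(BATCH, 1)

    weights = mixer(traj)                                     # credit weights
    last = traj[:, -1].reshape(BATCH, N_AGENTS, OBS_DIM + ACT_DIM)
    q_each = torch.cat([critics[i](last[:, i]) for i in range(N_AGENTS)], dim=1)

    if step % CRITIC_PERIOD == 0:      # frequent critic updates, mixer frozen
        q_tot = (weights.detach() * q_each).sum(dim=1, keepdim=True)
        loss_c = nn.functional.mse_loss(q_tot, global_r)
        critic_opt.zero_grad(); loss_c.backward(); critic_opt.step()

    if step % MIXER_PERIOD == 0:       # slower mixer updates, critics frozen
        q_tot = (weights * q_each.detach()).sum(dim=1, keepdim=True)
        loss_m = nn.functional.mse_loss(q_tot, global_r)
        mixer_opt.zero_grad(); loss_m.backward(); mixer_opt.step()
        print(f"step {step:2d}  critic loss {loss_c.item():.4f}  "
              f"mixer loss {loss_m.item():.4f}")
```

The staggered `if step % ... == 0` blocks are the point of the sketch: the critics are refreshed frequently with the credit weights frozen, while the LSTM mixer is refreshed less often with the critics frozen, which is one plausible reading of "asynchronous cooperative update"; all layer sizes, the softmax weighting, and the update periods are illustrative assumptions.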

Keywords: artificial intelligence; multi-agent cooperative decision-making; deep reinforcement learning; credit assignment; asynchronous cooperative update

Authors: GAO Jingpeng, WANG Guoxuan, GAO Lu


College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China

National Key Laboratory of Test Physics and Computational Mathematics, Beijing Institute of Space Long March Vehicle, Beijing 100076, China


Funding: State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System (CEMEE) project (Grant No. CEMEE2021G0001)

Year: 2024

Journal: Journal of Jilin University (Engineering and Technology Edition)
Publisher: Jilin University

Indexed in: CSTPCD; Peking University Chinese Core Journals (北大核心)
Impact factor: 0.792
ISSN: 1671-5497
Year, volume (issue): 2024, 54(3)