Multi-unmanned-vehicle cooperative encirclement control based on bidirectional long short-term memory and a mixed reward function
For the problem of multi-unmanned-vehicle cooperative encirclement in unknown and uncertain environments, this paper proposes BM-MADDPG, a multi-agent cooperative encirclement decision-making algorithm based on bidirectional long short-term memory (Bi-LSTM) and a mixed reward function, to address encirclement strategy generation and cooperative control for unmanned vehicles. First, a Bi-LSTM network captures the temporal features of state and action sequences, enabling an assessment of the long-term effect of taking different actions in the current state and addressing the low utilization of information in cooperative encirclement. Second, to overcome the low learning efficiency caused by sparse rewards and delayed feedback in unmanned vehicle encirclement tasks, a mixed reward function combining sparse and dense rewards is proposed; it guides the pursuers' exploration, accelerates training convergence, and strengthens cooperation among multiple unmanned vehicles. Simulation and experimental results show that, in multi-unmanned-vehicle cooperative encirclement scenarios, the proposed BM-MADDPG algorithm improves the encirclement success rate by 4.5% over MADDPG, effectively enhancing cooperative encirclement capability and training efficiency.
Keywords: multiple unmanned vehicles; BM-MADDPG; encirclement strategy; Bi-LSTM; mixed reward function
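For concreteness, the sketch below illustrates the two ingredients named in the abstract: a centralized critic that encodes a short history of joint states and actions with a Bi-LSTM before producing a Q-value, and a mixed reward that adds a dense distance-shaping term to the sparse capture bonus. This is a minimal PyTorch sketch of the general technique, not the paper's implementation; the network sizes, the `capture_radius`, the `dense_weight`, and the 10.0 success bonus are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMCritic(nn.Module):
    """Centralized critic: encodes a (state, action) sequence with a
    Bi-LSTM and maps the final step's features to a scalar Q-value."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                  nn.ReLU(),
                                  nn.Linear(hidden, 1))

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, T, obs_dim); act_seq: (batch, T, act_dim)
        x = torch.cat([obs_seq, act_seq], dim=-1)
        feats, _ = self.lstm(x)           # (batch, T, 2 * hidden)
        return self.head(feats[:, -1])    # Q-value from the last time step


def mixed_reward(pursuer_pos: torch.Tensor, target_pos: torch.Tensor,
                 prev_dist: float, capture_radius: float = 0.5,
                 dense_weight: float = 0.1):
    """Sparse capture bonus plus a dense progress term (illustrative values)."""
    dist = torch.norm(pursuer_pos - target_pos).item()
    sparse = 10.0 if dist < capture_radius else 0.0   # success bonus
    dense = dense_weight * (prev_dist - dist)          # reward closing distance
    return sparse + dense, dist


# Usage: batch of 32 sequences of length 5 with 8-dim observations, 2-dim actions.
critic = BiLSTMCritic(obs_dim=8, act_dim=2)
q = critic(torch.randn(32, 5, 8), torch.randn(32, 5, 2))  # shape (32, 1)
```

The dense term rewards each step that closes the pursuer-target distance, so the agents receive a learning signal long before the sparse capture bonus fires, which is the intuition behind combining the two reward types.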