
Cooperative decision-making for heterogeneous UAV swarm confrontation based on self-play reinforcement learning

With the development of unmanned aerial vehicle (UAV) technology, UAV swarm confrontation has become a research hotspot worldwide. Existing decision-making algorithms focus mainly on confrontation between homogeneous UAV swarms, and when applied to more complex adversarial scenarios they suffer from reward functions that are hard to design and from decision-making that cannot meet real-time requirements. This paper therefore studies real-time maneuver decision-making for heterogeneous UAV swarm confrontation. First, we construct an adversarial simulation environment for a leader-follower (lead aircraft and wingman) heterogeneous UAV swarm, in which the leader and follower UAVs differ in maneuvering and attack capabilities and carry different weight in deciding victory or defeat. Second, we propose a distributed cooperative maneuver control algorithm for UAV swarms based on multi-agent reinforcement learning, and design a policy training and optimization method that combines curriculum learning with self-play. Simple sparse rewards combined with curriculum learning are sufficient to learn cooperative maneuver policies for the heterogeneous swarm. Introducing self-play makes the opponent UAVs' policies more targeted, which raises the intensity of the confrontation and further refines the maneuver policies so that they better match practical requirements. Finally, simulations verify the effectiveness and scalability of the proposed method.

Keywords: swarm confrontation; cooperative decision-making; self-play; multi-agent reinforcement learning; UAV

Yan Ruichi (严锐驰), Li Shuai (李帅), Wang Chen (王晨), Wu Qi (吴琦), Sun Jinan (孙基男), Zhang Shikun (张世琨), Xie Guangming (谢广明)


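Intelligent Biomimetic Design Laboratory, College of Engineering, Peking University, Beijing 100871

National Engineering Research Center for Software Engineering, Peking University, Beijing 100871

Third Research Institute, China Aerospace Science and Industry Corporation, Beijing 100074

Multi-Agent Research Center, Institute for Artificial Intelligence, Peking University, Beijing 100871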



National Natural Science Foundation of China (grant numbers U22A2062, 12272008, 61973007)

2024

Science in China Series F (中国科学F辑)
Chinese Academy of Sciences; National Natural Science Foundation of China


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 1.438
ISSN: 1674-5973
Year, volume (issue): 2024, 54(7)