Curriculum Learning-Based Simulation of UAV Air Combat Under Sparse Rewards

Zhu Jingyu¹, Zhang Hongli¹, Kuang Minchi², Shi Heng², Zhu Jihong², Qiao Zhi², Zhou Wenqing³

Author information

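1. School of Electrical Engineering, Xinjiang University, Urumqi 830000, Xinjiang, China
2. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
3. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China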

Abstract

To address the poor exploration and sparse rewards that conventional reinforcement learning faces in air combat environments, a curriculum learning distributed proximal policy optimization (CLDPPO) reinforcement learning algorithm is proposed. A reward function that embeds expert knowledge is introduced, a discretized action space is designed, and an actor-critic network that separates local observations from global observations is constructed. Offensive, defensive, and comprehensive curricula are designed for the UAVs, so that they learn combat skills progressively, starting from the basic curricula, and their combat capability improves stage by stage. Experimental results show that UAVs trained through curriculum learning defeat both an expert system and mainstream reinforcement learning algorithms by a certain margin, learn air combat tactics on their own, and effectively alleviate the sparse-reward problem.
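The staged curriculum described in the abstract can be illustrated with a small scheduler. This is only a minimal sketch under stated assumptions: the abstract names the three curricula (offensive, defensive, comprehensive) but does not say how the agent moves from one to the next, so the win-rate criterion, the `window` and `threshold` parameters, and the `CurriculumScheduler` class itself are hypothetical and are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass, field

# Stage names follow the abstract; their order is the one listed there.
STAGES = ("offense", "defense", "comprehensive")


@dataclass
class CurriculumScheduler:
    """Move to the next stage once the recent win rate is high enough.

    The promotion rule is an assumption made for illustration only.
    """
    window: int = 100        # assumed: number of episodes in the win-rate window
    threshold: float = 0.8   # assumed: win rate required to advance
    stage_index: int = 0
    results: deque = field(default_factory=deque)

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def record(self, won: bool) -> str:
        """Store one episode outcome and return the stage for the next episode."""
        self.results.append(1.0 if won else 0.0)
        if len(self.results) > self.window:
            self.results.popleft()  # keep only the most recent `window` outcomes
        window_full = len(self.results) == self.window
        if window_full and sum(self.results) / self.window >= self.threshold:
            if self.stage_index < len(STAGES) - 1:
                self.stage_index += 1
                self.results.clear()  # start a fresh window for the new stage
        return self.stage
```

In a training loop, one would call `record(won)` after every episode and use the returned stage name to pick the scenario and opponent for the next episode; the policy update itself (DPPO in the paper) is outside the scope of this sketch.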

Key words

UAVs / air combat / sparse reward / curriculum learning / distributed proximal policy optimization (DPPO)

Publication year

2024

Journal of System Simulation

Beijing Simulation Center; Chinese Association for System Simulation

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.551

ISSN: 1004-731X

Cited by: 1

References: 7
