Curriculum Learning-based Simulation of UAV Air Combat Under Sparse Rewards
To address the limited exploration capability of conventional reinforcement learning methods and the sparse rewards of air combat environments, a curriculum learning distributed proximal policy optimization (CLDPPO) reinforcement learning algorithm is proposed. A reward function informed by expert knowledge is integrated, a discrete action space is designed, and value and decision networks with separated global and local observations are established. A method is presented by which unmanned aerial vehicles (UAVs) acquire combat skills through a sequence of basic courses of progressively increasing difficulty, covering offensive, defensive, and comprehensive content. Experimental results show that the method outperforms the expert system and other mainstream reinforcement learning algorithms, learns air combat tactics autonomously, and alleviates the sparse reward problem.
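The course progression described in the abstract could be organized roughly as in the following minimal sketch. Only the course order (offensive, defensive, comprehensive) comes from the abstract. The class name `CurriculumScheduler`, the win-rate promotion rule, and the `promote_win_rate` and `window` parameters are hypothetical choices made for illustration and are not claimed to be the CLDPPO promotion criterion.

```python
from collections import deque
from dataclasses import dataclass, field

# Course order follows the abstract; everything else here is an assumption.
COURSES = ("offensive", "defensive", "comprehensive")


@dataclass
class CurriculumScheduler:
    """Hypothetical scheduler: promote when the recent win rate is high enough."""

    promote_win_rate: float = 0.8  # assumed promotion threshold
    window: int = 50               # assumed number of recent episodes to average
    index: int = 0                 # position of the current course in COURSES
    results: deque = field(default_factory=deque)

    @property
    def course(self) -> str:
        return COURSES[self.index]

    def record(self, won: bool) -> str:
        """Log one episode outcome and return the course for the next episode."""
        self.results.append(1.0 if won else 0.0)
        if len(self.results) > self.window:
            self.results.popleft()  # keep only the most recent `window` episodes

        window_full = len(self.results) == self.window
        has_next_course = self.index < len(COURSES) - 1
        if window_full and has_next_course:
            if sum(self.results) / self.window >= self.promote_win_rate:
                self.index += 1
                self.results.clear()  # start the next course with fresh statistics
        return self.course
```

In a training loop, the value returned by `record` would select which scenario the environment loads for the next episode. With `window=5` and `promote_win_rate=0.8`, five consecutive wins in the offensive course move the scheduler to the defensive course.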