Multi-UAV Autonomous Path Planning Based on Improved MAAC Algorithm
This paper applies deep reinforcement learning to autonomous path planning for multiple unmanned aerial vehicles (UAVs) in environments containing threat areas. To address the convergence difficulties common to reinforcement learning algorithms, an improved Actor-Attention-Critic for Multi-Agent Reinforcement Learning (MAAC) algorithm is proposed for multi-UAV autonomous path planning. A multi-UAV potential-field environment model is built to define the Markov decision process (MDP) of the reinforcement learning problem, so that reasonable collision-free paths can be planned in a dynamic environment. Simulation experiments validate the effectiveness of the proposed multi-UAV autonomous path-planning control algorithm, and comparative simulations show that it outperforms the compared methods in convergence speed and collision avoidance.
Keywords: UAV; multi-agent deep reinforcement learning; autonomous path planning; MAAC algorithm
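The abstract states that a potential-field environment model defines the MDP, but it gives no formulas. The following is a minimal sketch of one common way such a model can shape a per-UAV reward. It is not the paper's definition: the 2-D positions, circular threat areas, classic artificial-potential-field terms (attraction `k_att`, repulsion `k_rep` with an `influence` radius), terminal penalty and bonus values, and all function names are hypothetical assumptions chosen for illustration. The MAAC attention critic itself is not shown.

```python
import math
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Threat:
    """Hypothetical circular threat area (center and radius)."""
    center: Point
    radius: float


def attractive_potential(pos: Point, goal: Point, k_att: float = 1.0) -> float:
    # Quadratic pull toward the goal: U_att = 0.5 * k_att * d^2
    d = math.dist(pos, goal)
    return 0.5 * k_att * d * d


def repulsive_potential(pos: Point, center: Point, clearance: float,
                        k_rep: float = 1.0, influence: float = 2.0) -> float:
    # Classic APF repulsion, active only within `influence` of the boundary:
    # U_rep = 0.5 * k_rep * (1/d - 1/influence)^2 for d < influence, else 0
    d = max(math.dist(pos, center) - clearance, 1e-6)
    if d >= influence:
        return 0.0
    return 0.5 * k_rep * (1.0 / d - 1.0 / influence) ** 2


def total_potential(pos: Point, goal: Point, threats: list[Threat],
                    other_uavs: list[Point], uav_radius: float = 0.5) -> float:
    # Sum of goal attraction, threat-area repulsion and UAV-UAV repulsion
    u = attractive_potential(pos, goal)
    for t in threats:
        u += repulsive_potential(pos, t.center, t.radius)
    for p in other_uavs:
        u += repulsive_potential(pos, p, 2 * uav_radius)
    return u


def step_reward(pos: Point, next_pos: Point, goal: Point,
                threats: list[Threat], other_uavs: list[Point],
                gamma: float = 0.99, uav_radius: float = 0.5,
                goal_tol: float = 0.5, collision_penalty: float = -10.0,
                goal_bonus: float = 10.0) -> float:
    """Reward for one UAV moving from pos to next_pos (hypothetical design)."""
    # Terminal cases: entering a threat area or colliding with another UAV
    hit_threat = any(math.dist(next_pos, t.center) <= t.radius for t in threats)
    hit_uav = any(math.dist(next_pos, p) <= 2 * uav_radius for p in other_uavs)
    if hit_threat or hit_uav:
        return collision_penalty
    if math.dist(next_pos, goal) <= goal_tol:
        return goal_bonus
    # Potential-based shaping with Phi = -U: gamma * Phi(s') - Phi(s),
    # positive when the move lowers the total potential
    phi = -total_potential(pos, goal, threats, other_uavs, uav_radius)
    phi_next = -total_potential(next_pos, goal, threats, other_uavs, uav_radius)
    return gamma * phi_next - phi
```

For example, with no threats and `gamma=1.0`, moving from `(0.0, 0.0)` to `(1.0, 0.0)` toward a goal at `(10.0, 0.0)` yields a reward of 9.5, while the same move toward a nearby threat area yields less. Writing the dense term as a potential difference (rather than using the raw potential) is one standard way to speed up learning without changing which policies are optimal. In a multi-agent setup each UAV would receive such a reward at every step of MAAC training.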