AUV Path Planning Based on an Improved Q-Learning Algorithm

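Abstract (translated from Chinese): To address global path planning for underactuated AUVs, a lightweight improved Q-learning algorithm is proposed. A distance-based reward function is designed to speed up learning and improve the algorithm's stability; the ε-greedy and Softmax policies are combined to balance exploration and exploitation; and the action set is simplified according to the AUV's motion constraints to reduce computation time. Simulation results show that the improved algorithm solves the AUV path planning problem efficiently and improves both stability and range of applicability. Compared with the traditional Q-learning algorithm, on short-distance tasks learning efficiency improves by 90%, path length is shortened by 7.85%, and the number of turns drops by 14.29%; on long-distance tasks learning efficiency improves by 67.5%, path length is shortened by 6.10%, and the number of turns drops by 32.14%.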

English title: AUV path planning based on improved Q-learning algorithm

English abstract: A lightweight improved Q-learning algorithm is proposed for the global path planning problem of underactuated AUVs. A distance reward function is designed to speed up learning and improve algorithm stability. Combining the epsilon-greedy strategy with the Softmax strategy provides a mechanism to balance exploration and exploitation. The algorithm simplifies the action set based on AUV motion constraints to reduce computation time. Simulation results demonstrate that the proposed algorithm solves the AUV path planning problem efficiently and enhances stability and applicability. Compared with the traditional Q-learning algorithm, on short-distance tasks the learning efficiency is increased by 90%, the path length is reduced by 7.85%, and the number of turns is reduced by 14.29%. On long-distance tasks, the learning efficiency is improved by 67.5%, the path length is reduced by 6.10%, and the number of turns is reduced by 32.14%.
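
The abstract names three ingredients (a distance reward, a mix of epsilon-greedy and Softmax action selection, and a reduced action set) but gives no formulas. The sketch below is one plausible reading for illustration only, not the authors' implementation. Everything specific in it is an assumption: the three-action set in `ACTIONS`, a reward equal to the reduction in Euclidean distance to the goal, Softmax sampling used only in the exploration branch, and the function names and default parameters.

```python
import math
import random

# Hypothetical reduced action set: the abstract says the action set is
# simplified using AUV motion constraints but does not list the actions.
ACTIONS = ("turn_left", "keep_heading", "turn_right")


def distance_reward(pos, next_pos, goal, scale=1.0):
    """Assumed shaping term: positive when a move brings the AUV closer to the goal."""
    return scale * (math.dist(pos, goal) - math.dist(next_pos, goal))


def softmax_probs(q_values, temperature=1.0):
    """Numerically stable Softmax over a list of Q-values."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, epsilon, temperature=1.0, rng=random):
    """Assumed mixed policy: with probability epsilon, explore by sampling
    from the Softmax distribution; otherwise take the greedy action."""
    if rng.random() < epsilon:
        probs = softmax_probs(q_values, temperature)
        return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=q_values.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update."""
    best_next = max(q_table[next_state])
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
```

Compared with uniform random exploration, sampling from a Softmax distribution makes exploratory moves favor actions that already look promising, which is one way the two policies could be combined to balance exploration and exploitation.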

English keywords:

autonomous underwater vehicle; path planning; Q-learning; Softmax policy; distance reward mechanism

Authors:

Huang Yuzhou (黄昱舟), Hu Qingyu (胡庆玉), Xiong Huaqiao (熊华乔)

Affiliation:

No. 710 Research Institute, China State Shipbuilding Corporation Limited, Yichang 443000, Hubei, China

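Keywords (translated from Chinese):

autonomous underwater vehicle; path planning; Q-learning; Softmax policy; distance reward-and-penalty mechanism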

Publication year:

2024

DOI:

10.3404/j.issn.1672-7649.2024.24.016

Journal: Ship Science and Technology (舰船科学技术)

Sponsors: China Ship Research and Development Academy (中国舰船研究院); China Ship Information Center (中国船舶信息中心)

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.373

ISSN: 1672-7649

Year, volume (issue): 2024, 46(24)