Adaptive control of unmanned surface vehicle based on improved DDPG algorithm
[Objective] To tackle the poor navigation stability of unmanned surface vehicles (USVs) under interference conditions, an intelligent control parameter adjustment strategy based on deep reinforcement learning (DRL) is proposed. [Method] First, a dynamic model of a USV is established, and the line-of-sight (LOS) method is combined with a PID controller to perform heading control. In view of the time-varying characteristics of the PID parameters for course control under interference conditions, DRL theory is introduced: the environmental state, action and reward functions of the agent are designed to adjust the PID parameters online. Then, to increase convergence speed and address the occurrence of local optima during training, an improved deep deterministic policy gradient (DDPG) algorithm is proposed in which the original experience pool is separated into success and failure experience pools. Finally, an adaptive batch sampling function is designed to optimize the experience replay structure. [Results] Simulation results show that the improved algorithm converges rapidly, with a slightly improved average return in the later stages of training. Under interference conditions, the lateral error and heading angle deviation of the controller based on the improved DDPG algorithm are significantly reduced; the USV fits the desired path faster and then maintains more stable path tracking. [Conclusion] The improved DDPG algorithm greatly reduces the cost of training time, enhances the steady-state performance of the agent in the later stages of training, and achieves more accurate path tracking.
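The LOS-plus-PID heading control loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cross-track sign convention, the fixed lookahead distance, and the `PID` class interface are all assumptions.

```python
import math

def los_heading(x, y, wx0, wy0, wx1, wy1, lookahead=5.0):
    """Line-of-sight guidance: desired heading formed from the path
    tangent plus a correction angle toward a point a fixed lookahead
    distance ahead along the path."""
    # Path direction between the two waypoints
    path_ang = math.atan2(wy1 - wy0, wx1 - wx0)
    # Cross-track error of the vehicle position (positive = left of path)
    e = -(x - wx0) * math.sin(path_ang) + (y - wy0) * math.cos(path_ang)
    # Steer back toward the path: LOS correction shrinks as e -> 0
    return path_ang + math.atan2(-e, lookahead)

class PID:
    """Textbook PID on the heading error; gains are what the DRL agent
    would retune online in the paper's scheme."""
    def __init__(self, kp, ki, kd, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = None

    def step(self, err):
        self.integral += err * self.dt
        deriv = 0.0 if self.prev_err is None else (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

At each control step the desired heading from `los_heading` is compared with the measured heading, and the error is fed to `PID.step` to produce a rudder command.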
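The agent's state/action/reward interface for online PID tuning might look like the sketch below. The observation contents, the bounded gain increments, and the reward weights are illustrative assumptions; the paper's exact definitions are not given in the abstract.

```python
def make_state(cross_track_err, heading_err, yaw_rate):
    """Hypothetical agent observation: tracking errors plus yaw rate."""
    return (cross_track_err, heading_err, yaw_rate)

def apply_action(gains, action, lo=0.0, hi=10.0):
    """Interpret the action as bounded increments to (Kp, Ki, Kd),
    clipped so the gains stay in a sane range."""
    return tuple(min(max(g + a, lo), hi) for g, a in zip(gains, action))

def reward(cross_track_err, heading_err, w_e=1.0, w_psi=0.5):
    """Negative weighted tracking error: smaller lateral error and
    heading deviation yield a higher reward."""
    return -(w_e * abs(cross_track_err) + w_psi * abs(heading_err))
```

Driving the reward toward zero corresponds to the reduced lateral error and heading deviation reported in the results.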
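The paper's key modification, separating the replay memory into success and failure pools with an adaptive batch split, can be sketched roughly as follows. The linear schedule that shifts sampling toward successful experiences as training progresses is an illustrative assumption, not the paper's exact sampling function.

```python
import random
from collections import deque

class SplitReplayBuffer:
    """Replay memory split into success and failure pools. Batches mix
    both pools, drifting toward success-heavy batches late in training."""
    def __init__(self, capacity=10000):
        self.success = deque(maxlen=capacity)
        self.failure = deque(maxlen=capacity)

    def add(self, transition, succeeded):
        # Route each transition to the pool matching its episode outcome
        (self.success if succeeded else self.failure).append(transition)

    def sample(self, batch_size, progress):
        """progress in [0, 1]: fraction of training completed.
        Start half/half; end with ~90% successes per batch (assumed)."""
        ratio = 0.5 + 0.4 * min(max(progress, 0.0), 1.0)
        n_succ = min(int(batch_size * ratio), len(self.success))
        n_fail = min(batch_size - n_succ, len(self.failure))
        return (random.sample(self.success, n_succ)
                + random.sample(self.failure, n_fail))
```

Keeping failures in every batch preserves exploration of bad regions, while weighting successes more heavily later is one plausible way to obtain the faster convergence and steadier late-stage behavior the results describe.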

USV; deep reinforcement learning; intelligent control; trajectory tracking; parameter setting

Song Lifei, Xu Chuanyi, Hao Le, Guo Rong, Chai Wei


Key Laboratory of High Performance Ship Technology of the Ministry of Education, Wuhan University of Technology, Wuhan 430063, Hubei, China

School of Naval Architecture, Ocean and Energy Power Engineering, Wuhan University of Technology, Wuhan 430063, Hubei, China


Supported by the National Natural Science Foundation of China and the Fundamental Research Funds for the Central Universities

52201379; 3120622898

2024

Chinese Journal of Ship Research (中国舰船研究)
China Ship Development and Design Center


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.496
ISSN: 1673-3185
Year, volume (issue): 2024, 19(1)