Path following control for under-actuated unmanned surface vehicles based on improved TD3
This paper addresses path following for under-actuated unmanned surface vehicles (USVs) with unknown model parameters and marine environmental disturbances, and proposes a control method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm. At the kinematic level, a joint speed-heading guidance law based on line-of-sight guidance is designed to steer the USV accurately along the desired path. At the dynamic level, a reinforcement learning dynamics controller based on the improved TD3 is designed. Using prioritized experience replay driven by the temporal-difference error, dual experience pools holding samples from successful and failed path following are built, and an adaptive proportion coefficient adjusts the composition of each replayed batch. The critic and actor networks both incorporate long short-term memory (LSTM) networks, so that historical state sequences can be used to improve the training efficiency of the path following controller. Simulation results show that the improved TD3 control method effectively improves the tracking accuracy of under-actuated USVs. The method does not depend on a USV model and can serve as a reference for USV path following control.
Keywords: unmanned surface vehicle; path following control; twin delayed deep deterministic policy gradient; prioritized experience replay; long short-term memory network
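The dual experience pool idea in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption: the class names `PrioritizedPool` and `DualPoolReplay`, the priority exponent `alpha`, the clamping bounds, and in particular the rule in `success_ratio`, which is a hypothetical stand-in for the paper's adaptive proportion coefficient. Importance-sampling weights, which prioritized replay normally uses to correct sampling bias, are omitted for brevity.

```python
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")


class PrioritizedPool:
    """Fixed-capacity pool sampled in proportion to |TD error| ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # assumed priority exponent
        self.eps = eps          # keeps zero-error samples reachable
        self.items = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once the pool is full

    def __len__(self):
        return len(self.items)

    def add(self, transition, td_error):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Overwrite the oldest entry (ring buffer).
            self.items[self.pos] = transition
            self.priorities[self.pos] = priority
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, k, rng):
        if k <= 0 or not self.items:
            return []
        # Weighted sampling with replacement.
        return rng.choices(self.items, weights=self.priorities, k=k)


class DualPoolReplay:
    """Two pools: transitions from successful and from failed episodes."""

    def __init__(self, capacity, rng=None):
        self.success = PrioritizedPool(capacity)
        self.failure = PrioritizedPool(capacity)
        self.rng = rng or random.Random()

    def add(self, transition, td_error, succeeded):
        pool = self.success if succeeded else self.failure
        pool.add(transition, td_error)

    def success_ratio(self, recent_success_rate, lo=0.2, hi=0.8):
        # Hypothetical adaptive rule (not from the paper): replay more
        # successful samples while the agent still fails often.
        return min(hi, max(lo, 1.0 - recent_success_rate))

    def sample(self, batch_size, recent_success_rate):
        n_success = round(batch_size * self.success_ratio(recent_success_rate))
        # Fall back to a single pool if the other one is still empty.
        if len(self.success) == 0:
            n_success = 0
        elif len(self.failure) == 0:
            n_success = batch_size
        batch = (self.success.sample(n_success, self.rng)
                 + self.failure.sample(batch_size - n_success, self.rng))
        self.rng.shuffle(batch)
        return batch
```

In a training loop, each transition would be stored with its current TD error and an episode outcome flag, and `sample(batch_size, recent_success_rate)` would supply the mixed batch for the TD3 critic and actor updates. How the paper actually defines success, failure, and the proportion coefficient is not stated in the abstract.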