Path following control for under-actuated unmanned surface vehicles based on improved TD3

Qu Xingru (曲星儒)¹, Jiang Yuze (江雨泽)¹, Li Chu (李初)¹, Long Feifei (龙飞飞)¹, Zhang Rubo (张汝波)¹

Author information

1. College of Mechanical and Electronic Engineering, Dalian Minzu University, Dalian 116600, Liaoning, China
Abstract

This paper addresses the path following problem of under-actuated unmanned surface vehicles (USVs) with unknown model parameters and marine environment disturbances, and proposes a control method based on an improved twin delayed deep deterministic policy gradient (TD3) algorithm. At the kinematic level, a joint speed-heading guidance law based on line-of-sight guidance is designed to guide the USV along the desired path accurately. At the dynamic level, a reinforcement learning dynamics controller based on the improved TD3 is designed. Using prioritized experience replay based on the temporal difference error, dual experience pools containing samples from successful and failed path following are constructed, and an adaptive proportion coefficient adjusts the composition of each replayed batch. The critic network and the actor network both incorporate a long short-term memory network, so that historical state sequences are used to improve the training efficiency of the path following controller. Simulation results show that the control method based on the improved TD3 effectively improves the tracking accuracy of under-actuated USVs. The proposed method does not depend on a USV model and can serve as a reference for USV path following control.
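The dual-pool replay idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the adaptive proportion coefficient is computed, so the rule used here (share of success samples follows pool occupancy, clamped to a range) is an assumption, as are the use of two physically separate pools and all names (`PrioritizedPool`, `DualReplay`, `ratio_min`, `ratio_max`, `alpha`, `eps`). Priorities follow the usual proportional scheme, with sampling weight proportional to |TD error| raised to `alpha`.

```python
import random
from collections import deque


class PrioritizedPool:
    """Fixed-capacity pool sampled in proportion to |TD error| ** alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.data = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha
        self.eps = eps  # keeps zero-error transitions sampleable

    def __len__(self):
        return len(self.data)

    def add(self, transition, td_error):
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, k, rng):
        if k <= 0 or not self.data:
            return []
        # Weighted sampling with replacement.
        return rng.choices(list(self.data), weights=list(self.priorities), k=k)


class DualReplay:
    """Two pools: transitions from successful and from failed path following."""

    def __init__(self, capacity=10000, ratio_min=0.2, ratio_max=0.8, seed=0):
        self.success = PrioritizedPool(capacity)
        self.failure = PrioritizedPool(capacity)
        self.ratio_min = ratio_min
        self.ratio_max = ratio_max
        self.rng = random.Random(seed)

    def add(self, transition, td_error, succeeded):
        pool = self.success if succeeded else self.failure
        pool.add(transition, td_error)

    def success_ratio(self):
        # ASSUMED rule (not from the paper): follow pool occupancy, clamped.
        total = len(self.success) + len(self.failure)
        if total == 0:
            return self.ratio_min
        raw = len(self.success) / total
        return min(self.ratio_max, max(self.ratio_min, raw))

    def sample(self, batch_size):
        n_success = round(batch_size * self.success_ratio())
        # If one pool is empty, draw the whole batch from the other.
        if len(self.success) == 0:
            n_success = 0
        if len(self.failure) == 0:
            n_success = batch_size
        batch = self.success.sample(n_success, self.rng)
        batch += self.failure.sample(batch_size - n_success, self.rng)
        self.rng.shuffle(batch)
        return batch
```

In a TD3 training loop, a buffer like this would be filled after each environment step and queried once per critic update; a full prioritized replay setup would also refresh priorities after each update and apply importance-sampling weights, both omitted here for brevity.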

Key words

unmanned surface vehicle / path following control / twin delayed deep deterministic policy gradient / prioritized experience replay / long short-term memory network

Funding

National Natural Science Foundation of China (61673084)

Fundamental Research Funds for the Central Universities (04442024046)

Publication year

2024

Journal of Shanghai Maritime University

Shanghai Maritime University

CSTPCD / Peking University Core Journals (北大核心)

Impact factor: 0.578

ISSN: 1672-9498

Number of references: 11
