China Journal of Highway and Transport, 2024, Vol. 37, Issue 3: 157-169. DOI: 10.19721/j.cnki.1001-7372.2024.03.007

Variable-parameter MPC Multi-objective Control for Intelligent Vehicle Path Tracking Based on Reinforcement Learning

汪洪波 王春阳 赵林峰 胡延平

Author information

  • 1. School of Automotive and Transportation Engineering, Hefei University of Technology, Hefei 230009, Anhui, China

Abstract

To address the loss of tracking accuracy and stability that intelligent vehicles suffer when driving conditions change, this study proposed a multi-objective control strategy built on a variable-parameter model predictive control (MPC) algorithm tuned by reinforcement learning, which adaptively tunes the parameters of the path tracking control system. A linear time-varying MPC controller was designed from a vehicle dynamics model to obtain the optimal front-wheel steering angle and additional yaw moment. Based on the Actor-Critic reinforcement learning architecture, Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) agents were designed to tune the control parameters, with a reward function constructed around tracking accuracy and stability. Two typical simulation scenarios, a docking road and a variable-curvature road, were built to verify the algorithm. On the docking road, the prediction horizon and weight matrix of the controller were adjusted promptly as the road adhesion coefficient changed; on the variable-curvature road, they were adjusted promptly as the road curvature changed. In co-simulation with MATLAB/Simulink, CarSim, and Python, the reinforcement learning-tuned MPC was compared with a fixed-parameter MPC and a fuzzy-tuned MPC. The results showed that, of the three, the reinforcement learning method best improved the path tracking accuracy of intelligent vehicles under different road conditions while still ensuring vehicle safety. On the docking road, compared with the fixed-parameter MPC and the fuzzy-tuned MPC, the reinforcement learning-tuned MPC reduced the average lateral deviation by 99.8% and 97.6%, respectively, and the average front-wheel angle change rate by 99.7% and 77.0%, respectively. On the variable-curvature road, the average lateral deviation was reduced by 79.6% and 90.8%, respectively, and the average front-wheel angle change rate by 40.6% and 2.6%, respectively. These results indicate that the proposed reinforcement learning-based variable-parameter MPC multi-objective control can handle both the stability and the tracking accuracy of path tracking under changing conditions, and offers one approach to path tracking control in complex scenarios.
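The tuning loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the reward terms, parameter ranges, or action encoding, so everything here is an assumption made for illustration. The names `MpcParams`, `action_to_params`, and `reward`, the bounds on the horizon and weights, and the use of steering rate as a stand-in for the stability objective are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MpcParams:
    horizon: int      # prediction horizon, in steps (assumed range)
    q_lateral: float  # cost weight on lateral deviation (assumed range)
    r_steer: float    # cost weight on steering effort (assumed range)


def action_to_params(action, n_min=10, n_max=40,
                     q_min=1.0, q_max=100.0, r_min=0.1, r_max=10.0):
    """Map a 3-element agent action in [-1, 1] to bounded MPC parameters."""
    clipped = [max(-1.0, min(1.0, a)) for a in action]  # keep inside actor range
    unit = [(a + 1.0) / 2.0 for a in clipped]           # rescale to [0, 1]
    horizon = int(round(n_min + unit[0] * (n_max - n_min)))
    q_lateral = q_min + unit[1] * (q_max - q_min)
    r_steer = r_min + unit[2] * (r_max - r_min)
    return MpcParams(horizon, q_lateral, r_steer)


def reward(lateral_error, steer_rate, w_track=1.0, w_stab=0.1):
    """Hypothetical reward: penalize squared tracking error and steering rate."""
    return -(w_track * lateral_error ** 2 + w_stab * steer_rate ** 2)
```

In such a setup, a DDPG or TD3 agent would output `action` at each decision step, the controller would be rebuilt with the resulting `MpcParams`, and `reward` would be evaluated from the simulated vehicle response. The weights `w_track` and `w_stab` set the trade-off between the two objectives and would need tuning for any real study.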

Key words

automotive engineering/path tracking/model predictive control/reinforcement learning/control parameter tuning/additional yaw moment control


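Funding

National Natural Science Foundation of China (U22A20246)

National Natural Science Foundation of China (52372382)

Hefei Municipal Natural Science Foundation (2022008)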

Publication year

2024
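China Journal of Highway and Transport
China Highway and Transportation Society

Indexed in: CSTPCD, CSCD, Peking University Core Journals
Impact factor: 1.607
ISSN: 1001-7372
Number of references: 26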