An end-to-end lane keeping algorithm based on the Proximal Policy Optimization algorithm
To improve the success rate of unmanned driving and enhance the navigation ability of unmanned vehicles, this paper proposes an end-to-end lane keeping algorithm based on an improved Proximal Policy Optimization (PPO) algorithm. An end-to-end unmanned driving framework is built by replacing a hidden layer of the PPO network with an LSTM network and redesigning the reward function; the framework trains the algorithm's policy in combination with a simulator. The framework takes the RGB images and depth images from the camera in front of the vehicle, the vehicle speed, the lane departure value, and the collision coefficient as inputs, and outputs control actions such as throttle, brake, and steering wheel angle. Training and testing are carried out on different maps of the AirSim simulation platform, together with comparative experiments against the original algorithm. The experimental results demonstrate that the improved LSTM-PPO algorithm can train an effective autonomous driving policy, significantly reduce training time, and increase the robustness of the algorithm.
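The abstract describes replacing one hidden layer of the PPO network with an LSTM and mapping fused vehicle observations to throttle, brake, and steering commands. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of such an LSTM-augmented PPO actor-critic, assuming the observations have already been fused into a fixed-length feature vector and the three control outputs are continuous values in [-1, 1].

import torch
import torch.nn as nn


class LSTMPPOActorCritic(nn.Module):
    """Actor-critic for PPO with an LSTM in place of one hidden dense layer."""

    def __init__(self, obs_dim: int, act_dim: int = 3, hidden_dim: int = 128):
        super().__init__()
        # Shared encoder: a dense layer followed by the LSTM that replaces
        # the second hidden layer of a plain feed-forward PPO network.
        self.fc = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Policy head: mean of a Gaussian over throttle, brake, steering.
        self.policy_mean = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))
        # Value head for the PPO critic.
        self.value = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of fused observations.
        x = torch.relu(self.fc(obs_seq))
        x, hidden = self.lstm(x, hidden)
        mean = torch.tanh(self.policy_mean(x))      # actions bounded to [-1, 1]
        std = self.log_std.exp().expand_as(mean)
        value = self.value(x).squeeze(-1)
        return torch.distributions.Normal(mean, std), value, hidden


if __name__ == "__main__":
    # Hypothetical usage: 4 rollouts of 10 time steps with 64-dim observations.
    net = LSTMPPOActorCritic(obs_dim=64)
    dist, value, hidden = net(torch.randn(4, 10, 64))
    actions = dist.sample()                         # shape (4, 10, 3)
    print(actions.shape, value.shape)

Keeping the LSTM hidden state across time steps within an episode is what lets the policy condition on recent observations rather than a single frame; the PPO clipped-surrogate update itself is unchanged.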
Autonomous driving; Reinforcement learning; Proximal Policy Optimization; Long Short-Term Memory network