
Research on Autonomous Driving Algorithm Based on Meta-Reinforcement Learning

With the development of deep learning and reinforcement learning, end-to-end autonomous driving models based on deep reinforcement learning have become a popular research topic. Such models have poor "learning to learn" ability: when facing a new driving task they must be trained from scratch, training is slow, and generalization is weak. To address these problems, this paper proposes MPPO (Meta-PPO), an autonomous driving model based on meta-reinforcement learning. MPPO combines meta-learning with reinforcement learning, using a meta-learning algorithm to train a good set of initial parameters for the autonomous driving model in the meta-training stage; starting from these parameters, the model can quickly reach convergence on a new driving task after fine-tuning on only a small number of samples. Experimental results show that, in navigation-scenario tasks, compared with the baseline autonomous driving model based on reinforcement learning, the MPPO model converges 2.52 times faster, improves the reward value by 7.50%, reduces the offset by 7.27%, and also improves generalization to some extent, making it applicable to multi-task scenarios.
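The meta-training scheme described above — learning an initialization from which a few gradient steps suffice on a new task — can be sketched with a first-order, Reptile-style meta-learning loop on toy scalar tasks. This is an illustrative stand-in, not the paper's MPPO: the task distribution, loss, and learning rates below are invented for the example.

```python
import random

def inner_adapt(theta, a, lr=0.1, steps=5):
    """Few-step gradient descent on one task's loss L_a(theta) = (theta - a)^2."""
    for _ in range(steps):
        theta -= lr * 2.0 * (theta - a)  # gradient of (theta - a)^2
    return theta

def meta_train(theta=0.0, meta_lr=0.5, iterations=200, seed=0):
    """Reptile-style outer loop: move the shared init toward each task's adapted params."""
    rng = random.Random(seed)
    for _ in range(iterations):
        a = rng.uniform(-1.0, 1.0)        # sample a "task" (here just a scalar target)
        adapted = inner_adapt(theta, a)   # fast adaptation from the current init
        theta += meta_lr * (adapted - theta)
    return theta
```

Starting from the meta-learned initialization, the same small inner-loop budget reaches a much lower loss on an unseen task than it does from an arbitrary initialization — the effect the abstract quantifies as faster convergence for MPPO.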

Keywords: autonomous driving; meta-learning; reinforcement learning; proximal policy optimization
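The "proximal policy optimization" among the keywords refers to PPO's clipped surrogate objective, which bounds how much a single update can move the policy away from the data-collecting policy. A minimal sketch of that objective (the function name and toy inputs are this example's own, not from the paper):

```python
def ppo_clip_loss(ratios, advantages, eps=0.2):
    """Negative clipped surrogate: -mean(min(r*A, clip(r, 1-eps, 1+eps)*A)).

    ratios:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantages: advantage estimates for the same actions
    """
    def clip(r):
        return max(1.0 - eps, min(1.0 + eps, r))
    terms = [min(r * adv, clip(r) * adv) for r, adv in zip(ratios, advantages)]
    return -sum(terms) / len(terms)
```

With eps = 0.2, a probability ratio of 2.0 on a positive advantage is credited as at most 1.2, so the optimizer gains nothing from pushing the policy further than the clipping range allows.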

金彦亮、范宝荣、高塬


School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China


Funding: National Natural Science Foundation of China

22ZR1422200

2024

工业控制计算机 (Industrial Control Computer)
Industrial Control Computer Technical Committee, China Computer Federation; Jiangsu Institute of Computing Technology Co., Ltd.


Impact factor: 0.258
ISSN: 1001-182X
Year, Volume (Issue): 2024, 37(3)