Autonomous Driving Algorithm Based on Meta-Learning and Reinforcement Learning
To address the problems of convergence difficulty, unsatisfactory training performance, and poor generalization of reinforcement-learning-based autonomous driving algorithms, an autonomous driving system based on meta-learning and reinforcement learning is proposed in this paper. The system first combines a variational auto-encoder (VAE) with a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to form the VWG (VAE-WGAN-GP) model, improving the quality of the extracted features. Then, the meta-learning algorithm Reptile is used to train the VWG feature extraction model, yielding the MVWG (Meta-VWG) feature extraction model and accelerating training. Finally, the feature extraction model is combined with the proximal policy optimization (PPO) decision algorithm, and the reward function in PPO is refined to speed up the convergence of the decision model, resulting in the MVWG-PPO autonomous driving model. Experimental results show that, compared with the VAE, VW (VAE-WGAN), and VWG baseline models, the proposed MVWG feature extraction model reduces reconstruction loss by 60.82%, 44.73%, and 29.09%, respectively, and converges approximately five times faster, producing clearer reconstructed images and performing better in autonomous driving tasks; it can therefore provide higher-quality feature information for autonomous vehicles. Meanwhile, compared with the baseline decision model, the decision model with the improved reward function converges 11.33% faster, fully demonstrating the superiority of the proposed method.
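The pipeline summarized above combines standard building blocks. The sketch below (Python/PyTorch; not the authors' released code) illustrates two of them: the VWG objective, i.e. a VAE reconstruction/KL loss combined with a WGAN-GP adversarial term, and a single Reptile outer-loop update over driving scenarios treated as meta-learning tasks. All module interfaces (vae, critic, task.compute_loss) and hyperparameters here are hypothetical placeholders.

```python
# Minimal sketch of (1) the VWG objective and (2) one Reptile meta-update,
# under assumed interfaces; the real model/training details are in the paper.

import copy
import torch
import torch.nn.functional as F

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: penalize the critic when its gradient norm at random
    interpolations between real and generated images deviates from 1."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    mixed = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    score = critic(mixed)
    grad, = torch.autograd.grad(score.sum(), mixed, create_graph=True)
    return ((grad.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def vwg_generator_loss(vae, critic, x):
    """Generator-side VWG loss: VAE reconstruction + KL divergence, plus a
    Wasserstein term asking the critic to score reconstructions highly.
    (The critic itself is trained separately, using gradient_penalty.)"""
    recon, mu, logvar = vae(x)                      # assumed VAE interface
    rec = F.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1.0 + logvar - mu.pow(2) - logvar.exp())
    adv = -critic(recon).mean()
    return rec + kl + adv

def reptile_step(model, tasks, inner_steps=5, inner_lr=1e-4, meta_lr=0.1):
    """One Reptile outer-loop update: fine-tune a copy of the model on each
    task, then move the shared initialization toward the adapted weights:
    theta <- theta + meta_lr * (theta_task - theta)."""
    init = copy.deepcopy(model.state_dict())
    for task in tasks:
        model.load_state_dict(init)
        opt = torch.optim.Adam(model.parameters(), lr=inner_lr)
        for _ in range(inner_steps):
            loss = task.compute_loss(model)         # e.g. vwg_generator_loss on a batch
            opt.zero_grad()
            loss.backward()
            opt.step()
        adapted = model.state_dict()
        # Assumes all state entries are float tensors.
        init = {k: init[k] + meta_lr * (adapted[k] - init[k]) for k in init}
    model.load_state_dict(init)
```

In the full system the abstract describes, the meta-trained encoder's latent features would then serve as the state input to a PPO agent with the refined reward function; that decision stage is omitted from this sketch.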

autonomous driving; feature extraction; reinforcement learning; meta-learning

Jin Yanliang, Fan Baorong, Gao Yuan, Wang Xiaoyong, Gu Chenjie


School of Communication and Information Engineering, Shanghai University, Shanghai 200444, China

Shanghai Institute for Advanced Communication and Data Science, Shanghai University, Shanghai 200444, China

CASCO Signal Ltd., Shanghai 200070, China

Shanghai Engineering Research Center of Driverless Train Control System for Rail Transit, Shanghai 200434, China



National Natural Science Foundation of China

22ZR1422200

2024

Journal of Applied Sciences (应用科学学报)
Shanghai University; Shanghai Institute of Technical Physics, Chinese Academy of Sciences


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.594
ISSN: 0255-8297
Year, Volume (Issue): 2024, 42(5)