Decision making based on continual reinforcement learning for autonomous racing

The variety of road shapes and road-surface materials poses a serious decision-making challenge for high-speed autonomous racing. To address the dynamics gap between different roads, this paper proposes a decision-making algorithm for high-speed racing based on continual reinforcement learning (CRL), in which each road is treated as a separate task. The first training stage extracts low-dimensional task features that characterize the vehicle dynamics on different roads, and these features are used to compute the similarity between tasks. The second training stage imposes two CRL constraints on policy learning. The first is a weight regularization constraint: policy-network weights that are important for old tasks are restricted from updating while a new task is being learned, with the strength of the restriction adaptively regulated by task similarity. The second is a reward constraint, which encourages the policy not to degrade on old tasks while it learns a new one. Racing experiments with different task orderings and CRL evaluation metrics are designed to assess the algorithm. The results show that the proposed algorithm achieves better driving performance than baseline methods without storing old-task data or expanding the policy network.
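As a rough illustration of the weight regularization constraint described above, the sketch below shows how an EWC-style quadratic penalty on policy weights that were important for an old task could be modulated by a task-similarity score. The function name, the diagonal importance estimate, and the convention that lower similarity tightens the restriction are assumptions made for illustration; they are not taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): an EWC-style
# weight regularization penalty whose strength is modulated by task similarity.
import torch


def similarity_weighted_penalty(policy, old_params, importance, task_similarity):
    """Quadratic penalty on drifting away from weights important to old tasks.

    policy          -- torch.nn.Module currently being trained on the new task
    old_params      -- {name: tensor} snapshot of weights after the old task
    importance      -- {name: tensor} per-weight importance on the old task
                       (e.g. a diagonal Fisher estimate), assumed precomputed
    task_similarity -- scalar in [0, 1]; here, lower similarity means a
                       stronger restriction (one possible convention)
    """
    penalty = torch.zeros(())
    for name, param in policy.named_parameters():
        if name in old_params:
            drift = (param - old_params[name]) ** 2
            penalty = penalty + (importance[name] * drift).sum()
    return (1.0 - task_similarity) * penalty


# Hypothetical use inside a policy update on the new task:
#   loss = new_task_loss + reg_coeff * similarity_weighted_penalty(
#       policy, old_params, importance, task_similarity)
```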

reinforcement learning (RL); continual learning; decision making; autonomous racing; dynamics feature extraction

Niu Jingyu, Hu Yu, Li Wei, Han Yinhe


Intelligent Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190

University of Chinese Academy of Sciences, Beijing 100049


National Natural Science Foundation of China (62176250, 62003323); Innovation Project of the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences (CARCH5203, CARCH5406)

2024

High Technology Letters (高技术通讯)
Institute of Scientific and Technical Information of China

CSTPCD; Peking University Core Journal (北大核心)
Impact factor: 0.19
ISSN: 1002-0470
Year, Volume (Issue): 2024, 34(1)