A Collision-Avoidance Decision-Making Method for Autonomous Driving Based on CQL-SAC

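Abstract: Deep reinforcement learning applied to autonomous driving suffers from value-function overestimation, low learning efficiency, and poor safety. To address these problems, a collision-avoidance decision-making method for autonomous driving is proposed. First, the conservative Q-learning (CQL) algorithm is combined with the soft actor-critic (SAC) algorithm to form the CQL-SAC algorithm, which mitigates value overestimation. Second, expert experience is introduced during training so that the algorithm converges quickly, which addresses the low learning efficiency. Finally, a collision-avoidance module checks the actions output by the CQL-SAC algorithm for safety and corrects them to prevent vehicle collisions. The method is validated in a highway-based simulation scenario. The simulation results show that, in the training phase, CQL-SAC converges 12.5% and 5.4% faster than the SAC algorithm and the in-sample actor-critic (InAC) algorithm, respectively, and introducing expert experience raises the convergence speed by a further 14.3%. In the testing phase, compared with SAC and InAC, the proposed method raises the success rate by 17 and 12 percentage points and the average episode reward by 23.1% and 10.7%, respectively.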
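The abstract does not give the CQL-SAC loss, so the following is only a minimal sketch of how a CQL penalty is commonly attached to a SAC critic, not the authors' implementation. It assumes the standard CQL regularizer (log-sum-exp of Q over sampled actions minus Q on dataset actions) added to the soft Bellman error. The function name `cql_sac_critic_loss`, its arguments, and the default coefficients are all hypothetical, and the log-sum-exp over sampled actions omits the importance weights a full implementation would normally use.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_sac_critic_loss(q_data, q_sampled, reward, done, q_next, logp_next,
                        gamma=0.99, ent_coef=0.2, cql_weight=1.0):
    """Soft Bellman error plus a CQL penalty (illustrative, hypothetical API).

    q_data    : Q(s, a) for actions stored in the batch, shape (B,)
    q_sampled : Q(s, a_i) for N actions sampled per state, shape (B, N)
    q_next    : target Q(s', a') with a' drawn from the policy, shape (B,)
    logp_next : log pi(a' | s'), shape (B,)
    """
    # SAC target: reward plus discounted soft value of the next state.
    target = reward + gamma * (1.0 - done) * (q_next - ent_coef * logp_next)
    bellman = 0.5 * np.mean((q_data - target) ** 2)
    # CQL penalty: push Q down on sampled actions and up on batch actions,
    # which counteracts overestimation on actions not covered by the data.
    penalty = np.mean(logsumexp(q_sampled, axis=1) - q_data)
    return bellman + cql_weight * penalty, bellman, penalty


# Tiny example with terminal transitions, so the target is just the reward.
total, bellman, penalty = cql_sac_critic_loss(
    q_data=np.array([1.0, 2.0]),
    q_sampled=np.array([[1.0, 1.0], [2.0, 2.0]]),
    reward=np.zeros(2), done=np.ones(2),
    q_next=np.zeros(2), logp_next=np.zeros(2),
)
```

In this sketch `cql_weight` trades off conservatism against fitting the Bellman target; setting it to zero recovers a plain SAC critic loss.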

English title: Collision prevention decision-making method for autonomous driving based on CQL-SAC algorithm

English abstract: In response to the problems of value function overestimation, low learning efficiency, and poor safety in deep reinforcement learning for autonomous driving tasks, a collision avoidance decision-making method was proposed. Firstly, by integrating the conservative Q-learning (CQL) algorithm with the soft actor-critic (SAC) algorithm, the CQL-SAC algorithm was proposed to alleviate the problem of value overestimation. Then, expert experience was introduced during the training process to achieve fast convergence and solve the problem of low learning efficiency. Finally, the collision prevention module was used to perform safety checks and corrections on the actions output by the CQL-SAC algorithm, in order to avoid vehicle collisions. The effectiveness of this scheme was verified in a simulation scenario based on highways. The simulation results show that during the training phase, the CQL-SAC algorithm improves the convergence speed by 12.5% and 5.4% compared with the SAC algorithm and the in-sample actor-critic (InAC) algorithm, respectively, and the convergence speed is further improved by 14.3% after introducing expert experience. During the testing phase, the proposed scheme increases the success rate by 17 and 12 percentage points and the average episode reward by 23.1% and 10.7% compared with the SAC and InAC algorithms, respectively.
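The abstract does not describe how the collision prevention module checks or corrects actions. As an illustration only, the sketch below assumes a simple time-to-collision (TTC) rule for a single lead vehicle in the same lane: if the proposed acceleration would leave the TTC below a threshold, the action is replaced by a braking command. The function `safe_action`, its parameters, and the threshold values are hypothetical and are not taken from the paper.

```python
def safe_action(accel, gap, ego_speed, lead_speed,
                min_ttc=2.0, max_brake=-5.0, dt=0.1):
    """Check a longitudinal action and correct it if it looks unsafe.

    accel      : acceleration proposed by the policy, m/s^2
    gap        : distance to the lead vehicle, m
    ego_speed  : current speed of the ego vehicle, m/s
    lead_speed : current speed of the lead vehicle, m/s
    Returns (action, corrected), where corrected is True if the
    proposed action was overridden.
    """
    # Speed of the ego vehicle after applying the action for one step.
    next_speed = max(0.0, ego_speed + accel * dt)
    closing_speed = next_speed - lead_speed
    if closing_speed <= 0.0:
        # Not closing in on the lead vehicle: accept the action.
        return accel, False
    ttc = gap / closing_speed
    if ttc >= min_ttc:
        return accel, False
    # Unsafe: override with a braking command (assumed correction rule).
    return max_brake, True


# The policy asks to accelerate while closing fast on a nearby vehicle.
action, corrected = safe_action(accel=1.0, gap=10.0,
                                ego_speed=30.0, lead_speed=20.0)
```

A filter of this kind sits between the policy and the simulator, so the learned policy can stay unchanged while unsafe outputs are overridden at execution time.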

English keywords:

smart transportation; autonomous driving decision-making; conservative Q-learning (CQL) algorithm; soft actor-critic (SAC) algorithm; expert experience; collision prevention strategy

Authors:

刘玉辉, 于镝

Affiliation:

School of Automation, Beijing Information Science and Technology University, Beijing 100192

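Keywords:

smart transportation; autonomous driving decision-making; conservative Q-learning algorithm; soft actor-critic algorithm; expert experience; collision-avoidance strategy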

Publication year:

2024

DOI:

10.16508/j.cnki.11-5866/n.2024.03.003

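Journal of Beijing Information Science and Technology University (Natural Science Edition)

Beijing Information Science and Technology University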

Impact factor: 0.363

ISSN: 1674-6864

Year, volume (issue): 2024, 39(3)