
Collision prevention decision-making method for autonomous driving based on the CQL-SAC algorithm

To address value function overestimation, low learning efficiency, and poor safety in deep reinforcement learning for autonomous driving tasks, a collision prevention decision-making method is proposed. First, the conservative Q-learning (CQL) algorithm is integrated with the soft actor-critic (SAC) algorithm to form the CQL-SAC algorithm, which alleviates value overestimation. Then, expert experience is introduced during training to speed up convergence and address low learning efficiency. Finally, a collision prevention module performs safety checks on the actions output by the CQL-SAC algorithm and corrects them to avoid vehicle collisions. The effectiveness of the method is verified in a highway-based simulation scenario. The simulation results show that, in the training phase, the CQL-SAC algorithm converges 12.5% and 5.4% faster than the SAC algorithm and the in-sample actor-critic (InAC) algorithm, respectively, and introducing expert experience further improves the convergence speed by 14.3%. In the testing phase, compared with the SAC and InAC algorithms, the proposed method raises the success rate by 17 and 12 percentage points and the average episode reward by 23.1% and 10.7%, respectively.
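As a rough illustration of how a conservative Q-learning term can be combined with a SAC-style critic objective, the sketch below computes the commonly used CQL(H) regularizer and adds it to a squared Bellman error. It is not the authors' implementation: the names `cql_penalty` and `critic_loss`, the weight `alpha_cql`, and the array shapes are assumptions made for illustration, and the Q-values are passed in as plain arrays rather than produced by a network.

```python
import numpy as np


def logsumexp(x, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def cql_penalty(q_sampled, q_data, alpha_cql=1.0):
    """CQL(H)-style regularizer (hypothetical helper).

    q_sampled: shape (batch, n_actions), Q(s, a_i) for actions sampled
               at each state, e.g. from the current policy.
    q_data:    shape (batch,), Q(s, a) for the actions stored in the data.
    The term pushes down Q-values of sampled actions and pushes up
    Q-values of actions that actually appear in the data.
    """
    q_sampled = np.asarray(q_sampled, dtype=float)
    q_data = np.asarray(q_data, dtype=float)
    return alpha_cql * np.mean(logsumexp(q_sampled, axis=1) - q_data)


def critic_loss(q_pred, td_target, q_sampled, alpha_cql=1.0):
    # Squared Bellman error (as in a SAC critic) plus the conservative penalty.
    q_pred = np.asarray(q_pred, dtype=float)
    td_target = np.asarray(td_target, dtype=float)
    bellman = 0.5 * np.mean((q_pred - td_target) ** 2)
    return bellman + cql_penalty(q_sampled, q_pred, alpha_cql)
```

In this sketch, a larger `alpha_cql` makes the critic more conservative about actions not seen in the data, which is the mechanism CQL uses to counter value overestimation.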

Keywords: smart transportation; autonomous driving decision-making; conservative Q-learning (CQL) algorithm; soft actor-critic (SAC) algorithm; expert experience; collision prevention strategy

Liu Yuhui (刘玉辉), Yu Di (于镝)


School of Automation, Beijing Information Science and Technology University, Beijing 100192, China


2024

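Journal of Beijing Information Science and Technology University (Natural Science Edition)
Beijing Information Science and Technology University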


Impact factor: 0.363
ISSN: 1674-6864
Year, volume (issue): 2024, 39(3)