Eco-driving Under Mixed Autonomy at Signalized Intersection: A Deep Reinforcement Learning Model

XIN Qi (辛琪)¹, WANG Jiaqi (王嘉琪)¹, YANG Wenke (杨文科)¹, XU Meng (徐猛)², YUAN Wei (袁伟)¹

Author information

1. School of Automobile, Chang'an University, Xi'an 710064, China
2. School of Systems Science, Beijing Jiaotong University, Beijing 100044, China

Abstract

Dynamic programming models that impose a pass-through constraint and a safety constraint become computationally complex under mixed-autonomy and heavy-traffic conditions, and they may have no feasible solution at all. This paper proposes a deep reinforcement learning model for eco-driving trajectory optimization of connected and autonomous vehicles (CAVs) on signalized road segments with mixed traffic. The model combines graded reward and penalty mechanisms with the twin delayed deep deterministic policy gradient (TD3) algorithm. It optimizes the trajectories of CAVs in mixed traffic flow as they approach a signalized intersection.

First, the inter-vehicle gap, speed difference, speed, distance to the intersection, queue length, and signal phase and timing are selected as the agent state to characterize driving safety and traffic efficiency. In particular, the queue length at the intersection is added to the state. This reduces the chance that a CAV must stop temporarily behind a queue of human-driven vehicles. Second, a multi-objective reward function is constructed from the agent state and the anticipated arrival time at the intersection. It jointly optimizes the efficiency, energy consumption, comfort, and safety of CAVs in mixed traffic. This design removes the strong coupling between model constraints and solution complexity that limits the dynamic programming model.

Simulation-based training and testing show that vehicle waiting time at the intersection decreases significantly as the CAV penetration rate increases. Energy consumption falls by about 5.47% relative to the uncontrolled scenario. It also falls by about 4.42% relative to a dynamic programming-based trajectory optimization model, and by about 2.91% relative to a deep deterministic policy gradient (DDPG)-based trajectory planning model. In addition, the proposed model lets CAVs pass through the signalized intersection without stopping even when traffic demand and signal cycle length fluctuate, which demonstrates its robustness to both.
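A minimal Python sketch can illustrate the state features and reward objectives listed in the abstract. It is not the authors' formulation. The following are all hypothetical:

- the names `CAVState`, `to_vector`, and `reward`;
- the normalization scales (a 300 m approach, a 15 m/s speed cap, a 90 s cycle);
- the weights, the 1.5 s minimum time headway, and the penalty magnitudes.

Each objective is scored as follows:

- Efficiency: how closely the projected arrival time (distance divided by speed) matches an anticipated arrival time.
- Energy: positive specific tractive power a·v, used as a crude proxy.
- Comfort: jerk.
- Safety: a graded time-headway penalty plus a much larger collision penalty.

```python
from dataclasses import dataclass


@dataclass
class CAVState:
    """Hypothetical agent state for one CAV on the approach (SI units)."""
    gap: float           # distance to the preceding vehicle [m]
    speed_diff: float    # leader speed minus ego speed [m/s]
    speed: float         # ego speed [m/s]
    dist_to_stop: float  # distance to the stop line [m]
    queue_length: float  # queue length at the stop line [m]
    is_green: bool       # current signal phase for this approach
    time_left: float     # time remaining in the current phase [s]

    def to_vector(self, approach_len=300.0, v_max=15.0, cycle=90.0):
        """Scale features to roughly [-1, 1]; the scales are assumptions."""
        return [
            min(self.gap / approach_len, 1.0),
            self.speed_diff / v_max,
            self.speed / v_max,
            self.dist_to_stop / approach_len,
            self.queue_length / approach_len,
            1.0 if self.is_green else 0.0,
            self.time_left / cycle,
        ]


def reward(s, accel, prev_accel, t_now, t_expected, dt=0.1,
           w_eff=1.0, w_energy=0.05, w_comfort=0.01, min_headway=1.5):
    """Hypothetical multi-objective reward; terms and weights are illustrative."""
    # Efficiency: projected arrival time vs. anticipated arrival time,
    # normalized by an assumed 10 s scale.
    if s.speed > 0.1:
        t_arrive = t_now + s.dist_to_stop / s.speed
        r_eff = -w_eff * abs(t_arrive - t_expected) / 10.0
    else:
        r_eff = -w_eff  # (nearly) stopped: flat efficiency penalty
    # Energy proxy: positive specific tractive power a*v [W/kg].
    r_energy = -w_energy * max(accel, 0.0) * s.speed
    # Comfort: penalize squared jerk.
    jerk = (accel - prev_accel) / dt
    r_comfort = -w_comfort * jerk ** 2
    # Safety: graded time-headway penalty, large penalty on collision.
    if s.gap <= 0.0:
        r_safe = -100.0
    else:
        headway = s.gap / max(s.speed, 0.1)
        r_safe = -5.0 * (min_headway - headway) if headway < min_headway else 0.0
    return r_eff + r_energy + r_comfort + r_safe
```

Under these assumptions, a cruising CAV receives a reward of 0 when its projected arrival matches the anticipated time and it has zero acceleration and an adequate headway. A collision, by contrast, dominates every other term. Grading the safety penalty, rather than imposing a single hard constraint, is one way to express penalties of different severity. The paper's actual terms and weights may differ.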
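TD3 is a published, general-purpose actor-critic algorithm. Its distinguishing pieces are:

- clipped double-Q targets;
- target policy smoothing;
- delayed actor updates;
- soft (Polyak) target updates.

The NumPy sketch below shows these pieces, with the networks passed in as plain callables. The following are assumptions, not values from the paper:

- the acceleration bounds of -3 to 2 m/s²;
- the noise, discount, and delay settings;
- the function names `td3_target`, `soft_update`, and `actor_update_due`.

```python
import numpy as np


def td3_target(reward, next_state, done, actor_target, critic1_target,
               critic2_target, rng, gamma=0.99, policy_noise=0.2,
               noise_clip=0.5, a_low=-3.0, a_high=2.0):
    """Critic regression target y for a batch of transitions (TD3)."""
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    a_next = actor_target(next_state)
    noise = np.clip(rng.normal(0.0, policy_noise, size=a_next.shape),
                    -noise_clip, noise_clip)
    # Keep the smoothed action inside the (assumed) acceleration bounds [m/s^2].
    a_next = np.clip(a_next + noise, a_low, a_high)
    # Clipped double-Q: use the smaller of the two target critics.
    q_next = np.minimum(critic1_target(next_state, a_next),
                        critic2_target(next_state, a_next))
    # Do not bootstrap past terminal transitions.
    return reward + gamma * (1.0 - done) * q_next


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters."""
    return {k: (1.0 - tau) * target_params[k] + tau * online_params[k]
            for k in target_params}


def actor_update_due(step, policy_delay=2):
    """Delayed policy updates: update actor and targets every few critic steps."""
    return step % policy_delay == 0


rng = np.random.default_rng(0)
y = td3_target(
    reward=np.ones(4),
    next_state=np.zeros((4, 7)),
    done=np.zeros(4),
    actor_target=lambda s: np.zeros(len(s)),
    critic1_target=lambda s, a: np.full(len(s), 5.0),
    critic2_target=lambda s, a: np.full(len(s), 3.0),
    rng=rng,
)
# Each entry is 1 + 0.99 * min(5, 3), i.e. approximately 3.97.
```

The sketch omits the network definitions and the actor loss, which maximizes the first critic's value. In a full agent, both critics regress toward `y` at every step. The actor and the target networks change only when `actor_update_due(step)` is true.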

Key words

intelligent transportation/trajectory optimization/twin delayed deep deterministic policy gradient/signalized intersection/connected and autonomous vehicle

Funding

National Natural Science Foundation of China (52002035)

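Fundamental Research Funds for the Central Universities, Chang'an University (300102223501)

Fundamental Research Funds for the Central Universities, Chang'an University (300102223205)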

Publication year

2024

Journal of Transportation Systems Engineering and Information Technology (交通运输系统工程与信息)

Systems Engineering Society of China

Indexed in: CSTPCD, CSCD, Peking University Core Journals (北大核心)

Impact factor: 0.664

ISSN: 1009-6744

Number of references: 5
