
Traffic Signal Control with Deep Reinforcement Learning and Self-attention Mechanism

Traffic signal control (TSC) remains one of the most important research topics in transportation. Existing TSC methods based on deep reinforcement learning (DRL) rely on manually designed states, which makes traffic state information hard to extract and leaves the traffic state incompletely represented. To mine latent traffic state information from limited features, and thereby reduce the difficulty of state design, this paper proposes a DRL algorithm that incorporates a self-attention network. First, only the vehicle positions on each entry lane of the intersection are collected and preprocessed with non-uniform quantization and one-hot encoding to obtain a vehicle position distribution matrix. Second, a self-attention network mines the spatial correlations and latent information in this matrix, and its output serves as the input to the DRL algorithm. Finally, an adaptive traffic signal control policy is learned at a single intersection, and the adaptability and robustness of the proposed algorithm are verified on a multi-intersection road network. Simulation results show that, in the single-intersection environment, the proposed algorithm outperforms three baseline algorithms on average vehicle waiting time and other metrics; in the multi-intersection road network, it still adapts well.
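
The first two steps of the abstract (building the vehicle position distribution matrix, then passing it through self-attention) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The cell boundaries in `CELL_EDGES`, the reading of "one-hot encoding" as binary cell occupancy, the single attention head, the feature width `d_model`, and the random projection weights are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np

# Hypothetical cell boundaries in metres from the stop line: short cells near
# the intersection, longer cells farther away (non-uniform quantization).
CELL_EDGES = np.array([0, 7, 14, 21, 28, 40, 60, 100, 160, 250], dtype=float)


def position_matrix(lane_positions, edges=CELL_EDGES):
    """Return a (lanes x cells) 0/1 matrix; 1 marks a cell that holds a vehicle."""
    n_cells = len(edges) - 1
    matrix = np.zeros((len(lane_positions), n_cells))
    for lane, distances in enumerate(lane_positions):
        d = np.asarray(distances, dtype=float)
        d = d[(d >= edges[0]) & (d < edges[-1])]  # ignore vehicles outside the range
        cells = np.digitize(d, edges) - 1         # cell index of each vehicle
        matrix[lane, cells] = 1.0
    return matrix


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows (lanes) of x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # lane-to-lane similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


# Made-up example: distances (m) of vehicles from the stop line on four entry lanes.
lanes = [[3.0, 10.5, 55.0], [], [1.0, 2.0, 120.0], [30.0]]
state = position_matrix(lanes)

# Random projections stand in for weights that would be learned during training.
rng = np.random.default_rng(0)
d_model = 8  # hypothetical feature width
w_q, w_k, w_v = (rng.normal(size=(state.shape[1], d_model)) for _ in range(3))
features, weights = self_attention(state, w_q, w_k, w_v)
```

In this sketch, `features` (one row per lane) plays the role of the representation that would be fed to the DRL agent, and `weights` shows how strongly each lane attends to every other lane.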

Keywords: intelligent transportation; adaptive control; deep reinforcement learning; self-attention network; proximal policy optimization

ZHANG Xijun (张玺君), NIE Shengyuan (聂生元), LI Zhe (李喆), ZHANG Hong (张红)


School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China


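Funding: National Natural Science Foundation of China (62162040); Key Project of the Natural Science Foundation of Gansu Province (22JR5RA226); Innovation Fund Project of Higher Education Institutions of Gansu Province (2021A-028)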

2024

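Journal of Transportation Systems Engineering and Information Technology (交通运输系统工程与信息)
Systems Engineering Society of China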


Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.664
ISSN: 1009-6744
Year, volume (issue): 2024, 24(2)