Traffic Signal Control with Deep Reinforcement Learning and Self-attention Mechanism

张玺君¹, 聂生元¹, 李喆¹, 张红¹

Author information

1. School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China

Abstract

Traffic signal control (TSC) remains one of the most important research topics in transportation. In existing TSC methods based on deep reinforcement learning (DRL), the state must be designed by hand, which makes traffic state information hard to extract and leaves it incompletely represented. To mine latent traffic state information from limited features, and thereby reduce the difficulty of state design, this paper proposes a DRL algorithm that incorporates a self-attention network. First, only the vehicle positions on each entry lane of the intersection are collected, and they are preprocessed with non-uniform quantization and one-hot encoding to obtain a vehicle position distribution matrix. Second, a self-attention network extracts the spatial correlations and latent information in this matrix, and its output serves as the input to the DRL algorithm. Finally, an adaptive traffic signal control policy is learned at a single intersection, and the adaptability and robustness of the proposed algorithm are verified in a multi-intersection road network. Simulation results show that, at a single intersection, the proposed algorithm outperforms three baseline algorithms on average vehicle waiting time and other metrics; in the multi-intersection road network, it still shows good adaptability.
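The preprocessing and attention steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The cell boundaries in `CELL_EDGES`, the choice to treat each entry lane as one token, the random projection matrices, and the function names `position_matrix` and `self_attention` are all assumptions made for illustration; the abstract does not give these details.

```python
import numpy as np

# Hypothetical cell boundaries in metres from the stop line.
# Cells are short near the intersection and widen with distance
# (non-uniform quantization). The real boundaries are not given in the abstract.
CELL_EDGES = np.array([0, 7, 14, 21, 28, 40, 60, 100, 160, 250], dtype=float)


def position_matrix(lane_positions, edges=CELL_EDGES):
    """Build a (num_lanes, num_cells) 0/1 matrix of occupied cells.

    Each vehicle's cell index is one-hot encoded; a lane's row is the
    element-wise OR of the one-hot vectors of its vehicles (an assumption).
    """
    num_cells = len(edges) - 1
    matrix = np.zeros((len(lane_positions), num_cells), dtype=np.float32)
    for lane, distances in enumerate(lane_positions):
        d = np.asarray(distances, dtype=float)
        d = d[(d >= edges[0]) & (d < edges[-1])]  # drop vehicles out of range
        cells = np.searchsorted(edges, d, side="right") - 1
        matrix[lane, cells] = 1.0
    return matrix


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Four entry lanes with made-up vehicle distances (metres from the stop line).
lanes = [[3.0, 9.5, 55.0], [], [12.0, 130.0], [1.0]]
state = position_matrix(lanes)

# Random projections stand in for weights that would be learned during training.
rng = np.random.default_rng(0)
d_model = 8
w_q, w_k, w_v = (rng.normal(size=(state.shape[1], d_model)) for _ in range(3))
features, weights = self_attention(state, w_q, w_k, w_v)
```

In this sketch `features` plays the role of the representation passed on to the DRL agent, and each row of `weights` shows how strongly one lane attends to the others.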

Key words

intelligent transportation/adaptive control/deep reinforcement learning/self-attention network/proximal policy optimization

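Funding

National Natural Science Foundation of China (62162040)

Key Project of the Natural Science Foundation of Gansu Province (22JR5RA226)

Innovation Fund Project of Higher Education Institutions of Gansu Province (2021A-028)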

Publication year

2024

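Journal of Transportation Systems Engineering and Information Technology (交通运输系统工程与信息)

Systems Engineering Society of China

Indexed in: CSTPCD, Peking University Core Journals (北大核心)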

Impact factor: 0.664

ISSN: 1009-6744

Number of references: 21
