
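Coordinated Sequential Optimization Method for Road Network Signals Based on a Heterogeneous Multi-agent Self-attention Network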

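To address the complexity of network-wide traffic signal control, this paper proposes a coordinated sequential optimization method for road network signals based on a heterogeneous multi-agent self-attention network, improving the performance of signal control policies across multiple intersections in the network. First, the model accounts for the spatial correlation of traffic flow across intersections and uses a value encoder based on the self-attention mechanism to learn representations of traffic observations, enabling network-level communication. Second, to handle the non-stationary environment created by multi-agent policy updates, a policy decoder based on multi-agent advantage decomposition decides each agent's best-response action sequentially, conditioned on the joint action of the preceding agents. Finally, an action-masking mechanism based on effective driving vehicles adaptively adjusts the decision frequency within the time-adequate interval, and a spatio-temporal pressure reward function that accounts for waiting fairness is proposed, further improving policy performance and practicality. The model is validated on Hangzhou road network datasets. The results show that the proposed model outperforms the baseline models on 2 datasets and 5 performance metrics; compared with the best baseline, it reduces average travel time by 10.89%, average queue length by 18.84%, and average waiting time by 22.21%. In addition, the proposed model generalizes better and significantly reduces cases of excessively long vehicle waiting times.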
Coordinated Sequential Optimization for Network-wide Traffic Signal Control Based on Heterogeneous Multi-agent Transformer
Focusing on the complex task of traffic signal control in an urban network, this study proposes a coordinated sequential optimization method based on a Heterogeneous Multi-Agent Transformer (HMATLight) to optimize network-wide traffic signals and improve the performance of the signal control policy at intersections within the urban network. Specifically, considering the spatial correlation of multi-intersection traffic flow, a value encoder based on a self-attention mechanism is first designed to learn traffic observation representations and realize network-level communication. Second, in response to the non-stationary environment for multi-agent policy updates, a policy decoder based on multi-agent advantage decomposition is constructed, which sequentially outputs the optimal responsive action on the basis of the joint actions of preceding agents. In addition, an action-masking mechanism based on effective driving vehicles, which adapts the decision frequency within the time-adequate interval, and a spatio-temporal pressure reward function that considers waiting fairness are constructed, further enhancing policy performance and practicality. A series of experiments is carried out on Hangzhou network datasets to validate the effectiveness of the proposed method. Experimental results show that the proposed HMATLight outperforms all baselines on two datasets across five metrics. Compared with the best-performing baseline, HMATLight decreases the average travel time by 10.89%, the average queue length by 18.84%, and the average waiting time by 22.21%. Furthermore, HMATLight generalizes better and significantly reduces instances of long vehicle waiting times.
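The two decoding ideas named in the abstract, action masking and sequential action selection conditioned on preceding agents, can be illustrated with a minimal Python sketch. This is not the HMATLight implementation: the abstract does not specify how the mask is derived from effective driving vehicles or how the learned decoder scores actions, so `toy_logits`, the mask values, and the four-phase action space below are all hypothetical placeholders.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over logits; actions with mask[i] == False get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    peak = max(l for l, m in zip(logits, mask) if m)  # for numerical stability
    exps = [math.exp(l - peak) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sequential_greedy(logit_fn, masks):
    """Agents act one at a time; each one sees the actions already chosen."""
    actions = []
    for agent, mask in enumerate(masks):
        probs = masked_softmax(logit_fn(agent, tuple(actions)), mask)
        actions.append(max(range(len(probs)), key=probs.__getitem__))
    return actions


def toy_logits(agent, previous_actions):
    # Hypothetical scorer (stand-in for a learned policy decoder):
    # favour the phase picked by the preceding agent, else phase 2.
    logits = [0.0, 0.0, 0.0, 0.0]
    preferred = previous_actions[-1] if previous_actions else 2
    logits[preferred] += 1.0
    return logits


# Hypothetical masks: phase 2 is unavailable to the second intersection.
masks = [
    [True, True, True, True],
    [True, True, False, True],
    [True, True, True, True],
]
print(sequential_greedy(toy_logits, masks))
```

In this toy run the second agent cannot follow the first agent's choice because that phase is masked out, so it falls back to another allowed phase, and the third agent then conditions on that fallback.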

intelligent transportation; deep reinforcement learning; network-wide traffic signal control; heterogeneous multi-agent; spatio-temporal pressure reward

陈喜群、朱奕璋、谢宁珂、耿茂思、吕朝锋


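Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058

Institute of Intelligent Transportation Systems, Polytechnic Institute, Zhejiang University, Hangzhou 310058

College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058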

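intelligent transportation; deep reinforcement learning; road network signal control; heterogeneous multi-agent; spatio-temporal pressure reward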

National Natural Science Foundation of China

72171210

2024

Journal of Transportation Systems Engineering and Information Technology
Systems Engineering Society of China


CSTPCD; Peking University Core Journals
Impact factor: 0.664
ISSN:1009-6744
Year, volume (issue): 2024, 24(3)