Traffic signal control algorithm based on overall state prediction and fair experience replay

Efficient traffic signal control algorithms designed to cope with traffic congestion can significantly improve vehicle throughput on an existing road network. Although deep reinforcement learning algorithms have shown excellent performance on single-intersection traffic signal control, their application in multi-intersection environments still faces a major challenge: the non-stationarity caused by the temporal and spatial partial observability inherent in Multi-Agent Reinforcement Learning (MARL) prevents these algorithms from converging stably. To this end, IS-DQN, a multi-intersection traffic signal control algorithm based on overall state prediction and fair experience replay, was proposed. On the one hand, the overall state of multiple intersections was predicted from the historical traffic flow of individual lanes, expanding the state space of IS-DQN and thereby avoiding the non-stationarity induced by spatial partial observability. On the other hand, to address the temporal partial observability of conventional experience replay strategies, a reservoir sampling algorithm was adopted to keep the experience replay pool fair and thus avoid the non-stationarity it introduces. Simulation results under three levels of traffic pressure in a complex multi-intersection environment show that, across different traffic flows and especially under low and medium traffic volume, IS-DQN achieves shorter average vehicle travel time, better convergence performance, and better convergence stability than independent deep reinforcement learning algorithms.
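The fair-replay idea described in the abstract rests on reservoir sampling (Algorithm R), which keeps every transition ever offered to the buffer with equal probability, so recent experience cannot crowd out older experience. The following is a minimal Python sketch of such a replay pool under that assumption; the class and method names are illustrative, not the authors' implementation.

```python
import random


class ReservoirReplayBuffer:
    """Experience replay pool kept fair via reservoir sampling (Algorithm R).

    After n transitions have been offered, each of them is retained in the
    buffer with equal probability capacity / n, regardless of when it arrived.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.buffer = []
        self.n_seen = 0                      # total transitions offered so far
        self.rng = random.Random(seed)

    def add(self, transition):
        self.n_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)   # reservoir not yet full: always keep
        else:
            # keep the new transition with probability capacity / n_seen,
            # evicting a uniformly chosen slot
            j = self.rng.randrange(self.n_seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, batch_size):
        return self.rng.sample(self.buffer, min(batch_size, len(self.buffer)))
```

Unlike a FIFO replay buffer, whose contents drift with the most recent (and possibly non-stationary) policy, this pool remains an unbiased sample of the whole training history, which is the fairness property the abstract attributes to reservoir sampling.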

Keywords: deep reinforcement learning; traffic signal control; time series prediction; reservoir sampling algorithm; Long Short-Term Memory (LSTM)

缪孜珺、罗飞、丁炜超、董文波

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

Journal of Computer Applications (计算机应用)
Chengdu Institute of Computer Application, Chinese Academy of Sciences

PKU Core Journal (北大核心)
Impact factor: 0.892
ISSN: 1001-9081
Year, volume (issue): 2025, 45(1)