
Multi-vehicle cooperative control in ramp merging area based on MADDPG algorithm

A multi-vehicle cooperative control method based on multi-agent reinforcement learning is proposed to ensure safe and efficient traffic in ramp merging areas. To improve the computational efficiency of the system, a distributed training framework based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm is designed. Because the agent model struggles to handle continuous traffic flow, a relatively stationary environment is constructed and the policy update gradient is modified, which keeps the agents stable in continuous traffic flow environments. The ramp merging scenario is split into a preparation area and an entry area, and the state space, action space, and reward function are designed separately according to the control objective of each area. The results show that, under different traffic volumes, the proposed method shortens the total delay in the merging area by an average of 25.46% compared with a rule-based method. Compared with a global optimization method, the delay differs by 8.47%, but the control time does not grow with the number of vehicles. The proposed method therefore better balances traffic efficiency and real-time performance of the system.
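
For orientation, the abstract mentions modifying the MADDPG policy (strategy) update gradient but does not give the modified form. The block below is only the standard MADDPG update as commonly stated in the literature, not the authors' modified version. The symbols are assumptions introduced here for illustration: $\mu_i$ is the deterministic policy of agent $i$ with parameters $\theta_i$, $o_i$ its local observation, $x$ the joint state information available to the centralized critic $Q_i^{\mu}$, $\mathcal{D}$ the replay buffer, $\gamma$ the discount factor, and primes denote target networks and next-step quantities.

```latex
% Standard MADDPG actor update for agent i (centralized critic, decentralized actor)
\nabla_{\theta_i} J(\mu_i)
  = \mathbb{E}_{x,\,a \sim \mathcal{D}}
    \left[
      \nabla_{\theta_i} \mu_i(o_i)\,
      \nabla_{a_i} Q_i^{\mu}(x, a_1, \dots, a_N)
      \Big|_{a_i = \mu_i(o_i)}
    \right]

% Critic loss with a target value computed from the target policies
\mathcal{L}(\theta_i)
  = \mathbb{E}_{x,\,a,\,r,\,x'}
    \left[ \left( Q_i^{\mu}(x, a_1, \dots, a_N) - y \right)^2 \right],
\qquad
y = r_i + \gamma\, Q_i^{\mu'}(x', a_1', \dots, a_N')
    \Big|_{a_j' = \mu_j'(o_j')}
```

Each critic conditions on all agents' actions during training, while each actor needs only its own observation at execution time. That matches the abstract's statement that control time does not grow with the number of vehicles.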

Keywords: multi-agent deep deterministic policy gradient (MADDPG); multi-agent reinforcement learning; multi-vehicle cooperative control; ramp merging

Cai Tianmao (蔡田茂), Kong Weiwei (孔伟伟), Luo Yugong (罗禹贡), Shi Jia (石佳), Ji Pengxiao (姬鹏霄), Li Congmin (李聪民)


College of Engineering, China Agricultural University, Beijing 100083, China

School of Vehicle and Mobility, Tsinghua University, Beijing 100083, China


2024

Journal of Automotive Safety and Energy
Tsinghua University


Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.748
ISSN:1676-8484
Year, volume (issue): 2024, 15(6)