A Resistance Strategy for Bus Service Disruption in Depot Based on Reinforcement Learning
To alleviate bus service disruptions at depots, this paper proposes a dynamic dispatching control strategy based on reinforcement learning. The strategy uses a long short-term memory (LSTM) model to predict bus travel times, so that the agent can perceive the headways between buses waiting at the depot and buses already in service, and can therefore better evaluate the long-term impact of its decisions. For the scenario in which no bus is available for dispatch at the depot, a state-dependent differentiable function is applied to mask invalid actions when computing the action probability distribution, which prevents the agent from issuing invalid commands. A reward function penalizes large departure intervals, and the model is trained with proximal policy optimization (PPO). Simulation results show that, compared with traditional methods, the proposed method not only effectively avoids service disruptions at the depot but also yields more balanced passenger load ratios, shorter passenger waiting times, and higher vehicle utilization efficiency.
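The abstract names three mechanisms without giving their exact form: masking invalid actions inside the action distribution, penalizing large departure intervals in the reward, and PPO training. The sketch below illustrates one common way to realize each, under stated assumptions. The two-action space (`HOLD`, `DISPATCH`), the large-negative-logit masking, the threshold penalty `headway_penalty` with its `max_headway_min` and `weight` parameters, and all function names are hypothetical illustrations, not the paper's definitions. `ppo_clip_objective` is the standard PPO clipped surrogate, not necessarily the paper's exact training configuration.

```python
import numpy as np

# Hypothetical action space (assumption, not from the paper).
HOLD, DISPATCH = 0, 1


def action_mask(buses_at_depot: int) -> np.ndarray:
    """State-dependent mask: DISPATCH is invalid when no bus waits at the depot."""
    return np.array([True, buses_at_depot > 0])


def masked_policy(logits: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Softmax with invalid actions' logits replaced by a large negative value.

    In an autodiff framework, this substitution keeps the distribution
    differentiable with respect to the valid logits while invalid actions
    get (numerically) zero probability. NumPy is used here only to show
    the arithmetic.
    """
    masked = np.where(mask, logits, -1e9)
    z = masked - masked.max()  # subtract max for numerical stability
    exp = np.exp(z)
    return exp / exp.sum()


def headway_penalty(headway_min: float,
                    max_headway_min: float = 15.0,
                    weight: float = 1.0) -> float:
    """Hypothetical reward term: linear penalty for departure intervals above a threshold."""
    return -weight * max(0.0, headway_min - max_headway_min)


def ppo_clip_objective(ratio: np.ndarray, advantage: np.ndarray,
                       eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate (to be maximized), averaged over samples."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))


if __name__ == "__main__":
    logits = np.array([0.2, 1.5])
    # Empty depot: all probability mass goes to HOLD.
    print(masked_policy(logits, action_mask(0)))  # [1. 0.]
    # Two buses waiting: both actions are valid.
    print(masked_policy(logits, action_mask(2)))
    print(headway_penalty(20.0))  # -5.0
```

Masking the logits before the softmax, rather than zeroing probabilities after sampling, means the agent never samples an infeasible dispatch. Invalid actions also contribute nothing to the policy gradient.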

Keywords: Bus service disruption; Real-time control; Reinforcement learning; Proximal policy optimization; Invalid action masking

Lun Jiaming, Jiang Haiming, Xie Kang

School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China

Funding: National Natural Science Foundation of China; Guangdong Province "Leading Talent" Program

Grant No.: 11874126400180001

2024

Computer Simulation
The 17th Research Institute of China Aerospace Science and Industry Corporation

CSTPCD
Impact factor: 0.518
ISSN: 1006-9348
Year, Vol. (Issue): 2024, 41(4)