Prevention Strategy for Bus Depot Service Disruption Based on Reinforcement Learning (基于强化学习的公交站场服务中断防治策略)

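Chinese abstract (translated): To mitigate service disruptions at bus depots, a dynamic departure control strategy based on reinforcement learning is proposed. The strategy uses a long short-term memory (LSTM) model to predict bus travel times, allowing the agent to perceive the headway state of vehicles at the depot and vehicles in operation and thus better assess the long-term impact of its decisions. For the scenario in which the depot has no vehicle available to dispatch, a state-dependent differentiable function is applied when computing the action probability distribution to mask invalid actions, preventing the agent from issuing invalid commands. Large departure intervals are penalized through the reward function, and the model is trained with proximal policy optimization (PPO). Simulation results show that, compared with traditional methods, the proposed method not only effectively avoids service disruptions at the bus depot but also yields more balanced vehicle load factors, shorter passenger waiting times, and higher vehicle utilization.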

English title: A Resistance Strategy for Bus Service Disruption in Depot Based on Reinforcement Learning

English abstract: To alleviate bus service disruption at depots, this paper proposes a dynamic departure control strategy based on reinforcement learning. The strategy uses a long short-term memory (LSTM) model to predict bus travel time, so that the agent can perceive the headway status of both depot vehicles and running vehicles and better evaluate the long-term impact of its decisions. For the scenario in which no bus is available for dispatch at the depot, a state-dependent differentiable function masks invalid actions when the action probability distribution is calculated, which prevents the agent from issuing invalid commands. Large departure intervals are penalized through the reward function, and the model is trained using proximal policy optimization (PPO). The experimental results show that, compared with traditional methods, the proposed method not only effectively avoids bus service disruption at the depot but also makes the passenger load ratio more balanced, the passenger waiting time shorter, and the vehicle utilization efficiency higher.
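
The invalid action masking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the action space or the exact masking function, so everything here is an assumption: a hypothetical two-action space (hold, dispatch), a hypothetical helper `masked_action_probs`, and the common approach of replacing the logits of invalid actions with a large negative constant before the softmax.

```python
import math


def masked_action_probs(logits, valid_mask, mask_value=-1e8):
    """Softmax over logits after masking invalid actions (illustrative only).

    Invalid actions have their logit replaced by a large negative constant,
    so their probability underflows to zero after the softmax.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    masked = [l if ok else mask_value for l, ok in zip(logits, valid_mask)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical action space: index 0 = hold, index 1 = dispatch a bus.
logits = [0.2, 1.5]          # assumed raw policy outputs
buses_in_depot = 0           # assumed state: no bus available to dispatch
valid_mask = [True, buses_in_depot > 0]

probs = masked_action_probs(logits, valid_mask)
print(probs)  # [1.0, 0.0]
```

With no bus in the depot, the dispatch action receives zero probability even though its raw logit is the larger one, so a sampled action can never be an invalid dispatch command. When a bus is available, the mask leaves the logits untouched and the result is an ordinary softmax.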

English keywords:

Bus service disruption; Real-time control; Reinforcement learning; Proximal policy optimization; Invalid action masking

Authors:

伦嘉铭, 姜海明, 谢康

Affiliation:

School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou, Guangdong 510006, China

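Keywords (translated from Chinese):

bus service disruption; real-time control; reinforcement learning; proximal policy optimization; invalid action masking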

Funding:

National Natural Science Foundation of China; Guangdong Province "Leading Talent" Program

Project numbers:

11874126; 400180001

Publication year:

2024

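Computer Simulation (计算机仿真)

The 17th Research Institute of China Aerospace Science and Industry Corporation (中国航天科工集团公司第十七研究所)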

CSTPCD

Impact factor: 0.518

ISSN: 1006-9348

Year, volume (issue): 2024, 41(4)

Number of references: 21