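Research on an Optimization Method for the Train Working Diagram Compilation Model of High-Speed Railways Based on Reinforcement Learning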

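Chinese abstract: High-speed railway train working diagrams may contain four types of conflict: dwell time out of range, running time out of range, overtaking, and insufficient interval time. To address them, an agent for resolving train working diagram conflicts was implemented based on reinforcement learning theory. A train working diagram compilation environment was built, a set of operators for resolving the different conflict types was designed, and the agent was trained in this environment with the proximal policy optimization (PPO) algorithm. To improve algorithm performance, samples collected by a heuristic greedy algorithm were used for supervised learning of the network as pre-training. An entropy term was used to strengthen the algorithm's exploration, multi-policy decision making was used to make the final resolution plan more effective, and model warm-up was used to fine-tune the network parameters in each test environment so that the algorithm adapts to new environments. The results show that, from the same initial environment, the proposed method needs significantly fewer steps than the heuristic greedy algorithm to resolve all conflicts, and its probability of resolving 100% of the conflicts is far higher than that of the heuristic greedy algorithm. The method provides a new reference for train working diagram compilation models.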
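To make the four conflict types concrete, here is a minimal Python sketch of how they might be detected on a timetable. It is not the authors' environment: the `Stop` record, the `find_conflicts` function, per-station dwell bounds, per-section running-time bounds and a single departure headway `min_headway` are hypothetical simplifications, and the sketch assumes every train serves the same ordered list of stations with times in minutes. Overtaking is taken to mean two trains swapping order within a section.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Stop:
    arr: int  # arrival time at the station (minutes)
    dep: int  # departure time from the station (minutes)


def find_conflicts(timetable, dwell_bounds, run_bounds, min_headway):
    """Return conflict tuples for a hypothetical single-line timetable.

    timetable:    dict train_id -> list[Stop], one Stop per station in line order
    dwell_bounds: list of (min, max) dwell time per station
    run_bounds:   list of (min, max) running time per section (station s to s+1)
    min_headway:  minimum gap between two trains' departures at a station
    """
    conflicts = []
    for tid, stops in timetable.items():
        # 1) dwell time out of range
        for s, stop in enumerate(stops):
            lo, hi = dwell_bounds[s]
            if not lo <= stop.dep - stop.arr <= hi:
                conflicts.append(("dwell", tid, s))
        # 2) running time out of range
        for s in range(len(stops) - 1):
            lo, hi = run_bounds[s]
            if not lo <= stops[s + 1].arr - stops[s].dep <= hi:
                conflicts.append(("running", tid, s))
    for a, b in combinations(sorted(timetable), 2):
        ta, tb = timetable[a], timetable[b]
        # 3) overtaking: departure order at s differs from arrival order at s+1
        for s in range(len(ta) - 1):
            if (ta[s].dep - tb[s].dep) * (ta[s + 1].arr - tb[s + 1].arr) < 0:
                conflicts.append(("overtaking", a, b, s))
        # 4) insufficient headway between departures at the same station
        for s in range(len(ta)):
            if abs(ta[s].dep - tb[s].dep) < min_headway:
                conflicts.append(("headway", a, b, s))
    return conflicts


# Illustrative data: two trains over three stations.
timetable = {
    "G1": [Stop(0, 0), Stop(30, 32), Stop(60, 60)],
    "G2": [Stop(3, 3), Stop(28, 45), Stop(85, 85)],
}
dwell_bounds = [(0, 0), (1, 10), (0, 0)]
run_bounds = [(25, 35), (25, 35)]
conflicts = find_conflicts(timetable, dwell_bounds, run_bounds, min_headway=5)
print(conflicts)
# → [('dwell', 'G2', 1), ('running', 'G2', 1), ('overtaking', 'G1', 'G2', 0), ('headway', 'G1', 'G2', 0)]
```

Each tuple records the conflict type, the train or trains involved, and the station or section index, which is the kind of information a resolution operator would need in order to decide which times to shift.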

English title: Optimization Method of Train Working Diagram Compilation Model of High Speed Railways Based on Reinforcement Learning

English abstract: To address four types of conflicts that may exist in the train working diagram of high-speed railways, namely dwell time out of range, running time out of range, overtaking, and insufficient interval time, this paper implemented an agent that resolves train working diagram conflicts based on reinforcement learning theory. By establishing a train working diagram compilation environment, the research designed an operator set for resolving the different conflicts and trained the agent in the constructed environment using the proximal policy optimization (PPO) algorithm. To enhance algorithm performance, samples collected by a heuristic greedy algorithm were used for supervised learning of the network as initial pre-training. An entropy term was used to intensify the algorithm's exploration, and multi-policy decision making was utilized to make the final resolution more effective. Model warm-up was performed to fine-tune the network parameters in each test environment so that the algorithm adapts to new conditions. The results show that under the same initial conditions, the proposed method requires significantly fewer steps than the heuristic greedy algorithm to resolve all conflicts, and its probability of completely resolving all conflicts is much greater than that of the heuristic greedy algorithm. This method provides a new reference for the train working diagram compilation model.
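The PPO training step can be illustrated with the standard clipped surrogate objective plus an entropy bonus, which is the usual way an entropy term is used to encourage exploration. This plain-Python sketch shows only the textbook PPO-Clip objective for a discrete action space (here, a choice among conflict-resolution operators); the abstract does not give the exact loss, coefficients or network, so `ppo_loss`, `clip_eps=0.2` and `ent_coef=0.01` are illustrative assumptions. It computes the loss value only; in actual training the same expression would be written in an autodiff framework so that gradients reach the policy network.

```python
import math


def ppo_loss(new_probs, old_probs, actions, advantages, clip_eps=0.2, ent_coef=0.01):
    """Value of the PPO-Clip loss with an entropy bonus (to be minimized).

    new_probs:  per-step action-probability vectors under the current policy
    old_probs:  the same vectors under the policy that collected the samples
    actions:    index of the operator chosen at each step
    advantages: advantage estimate for each step
    """
    total = 0.0
    for p_new, p_old, a, adv in zip(new_probs, old_probs, actions, advantages):
        # Importance ratio pi_new(a|s) / pi_old(a|s)
        ratio = p_new[a] / p_old[a]
        clipped = min(max(ratio, 1 - clip_eps), 1 + clip_eps)
        # Pessimistic minimum of the unclipped and clipped surrogates
        surrogate = min(ratio * adv, clipped * adv)
        # Entropy of the current action distribution; larger means more exploration
        entropy = -sum(p * math.log(p) for p in p_new if p > 0)
        total += surrogate + ent_coef * entropy
    # Maximizing (surrogate + entropy bonus) is the same as minimizing its negative mean
    return -total / len(actions)


new, old = [[0.5, 0.5]], [[0.25, 0.75]]
print(ppo_loss(new, old, actions=[0], advantages=[1.0]))   # ratio 2 is clipped to 1.2
print(ppo_loss(new, old, actions=[0], advantages=[-1.0]))  # unclipped term -2 is kept
```

With a positive advantage the ratio is capped at `1 + clip_eps`, which limits how far one update can push the policy toward the chosen operator; with a negative advantage the more pessimistic unclipped term is kept. Raising `ent_coef` rewards flatter operator distributions, which is the sense in which an entropy term strengthens exploration.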

Keywords:

Train Working Diagram; Reinforcement Learning; PPO Algorithm; Conflict Resolution; Heuristic Greedy Algorithm

Authors:

FAN Wentian (范文天), ZENG Yongcheng (曾勇程), GUO Yiwei (郭一唯), YANG Ning (杨宁), ZHANG Haifeng (张海峰)

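Affiliations:

School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003

Nanjing Artificial Intelligence Innovation Research Institute (中科南京人工智能创新研究院), Nanjing, Jiangsu 211135

School of Information, Nanjing College, University of Chinese Academy of Sciences, Nanjing, Jiangsu 211135

Institute of Automation, Chinese Academy of Sciences, Beijing 100190

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190

China Railway Train Working Diagram Technology Center, Beijing 100081

Transportation and Economics Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081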

Publication year:

2025

DOI：

10.16668/j.cnki.issn.1003-1421.2025.01.07

Railway Transport and Economy (铁道运输与经济)

China Academy of Railway Sciences

Peking University core journal (北大核心)

Impact factor: 0.924

ISSN：1003-1421

Year, volume (issue): 2025, 47(1)