Optimization Method of Train Working Diagram Compilation Model of High Speed Railways Based on Reinforcement Learning
范文天 1, 曾勇程 2, 郭一唯 3, 杨宁 4, 张海峰 5
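Author information
- 1. School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu; Nanjing Artificial Intelligence Research of IA, Nanjing 211135, Jiangsu; School of Information, Nanjing College, University of Chinese Academy of Sciences, Nanjing 211135, Jiangsu
- 2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190
- 3. China Railway Train Working Diagram Technology Center, Beijing 100081; Transportation and Economics Research Institute, China Academy of Railway Sciences Corporation Limited, Beijing 100081
- 4. Institute of Automation, Chinese Academy of Sciences, Beijing 100190
- 5. Nanjing Artificial Intelligence Research of IA, Nanjing 211135, Jiangsu; Institute of Automation, Chinese Academy of Sciences, Beijing 100190; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190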
Abstract
Train working diagrams of high-speed railways may contain four types of conflicts: dwell time out of range, running time out of range, overtaking, and insufficient interval time between trains. To resolve them, this paper implements a conflict-resolution agent based on reinforcement learning theory. A train working diagram compilation environment is established, a set of operators for resolving the different conflict types is designed, and the agent is trained in this environment with the proximal policy optimization (PPO) algorithm. To improve performance, samples collected by a heuristic greedy algorithm are used for supervised learning of the network as initial pre-training; entropy is used to strengthen the algorithm's exploration, and multi-policy decision making makes the final resolution scheme more effective; and model warm-up fine-tunes the network parameters in each test environment so that the network adapts to the new environment. The results show that, starting from the same initial environment, the proposed method needs significantly fewer steps than the heuristic greedy algorithm to resolve all conflicts, and its probability of resolving 100% of the conflicts is much higher than that of the heuristic greedy algorithm. The method provides a new reference for train working diagram compilation models.
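The four conflict types named in the abstract can be illustrated with a small checker. The following is only a minimal sketch under stated assumptions, not the paper's environment: the timetable layout (one list of `(arrive, depart)` minute pairs per train, all trains visiting the same stations in the same direction), the function name `count_conflicts`, and the limits `DWELL_RANGE`, `RUN_RANGE`, and `MIN_INTERVAL` are all hypothetical choices made for illustration.

```python
from itertools import combinations

# Hypothetical limits in minutes; the paper's actual values are not given here.
DWELL_RANGE = (2, 10)    # allowed dwell time at an intermediate station
RUN_RANGE = (10, 30)     # allowed running time on a section between stations
MIN_INTERVAL = 3         # minimum interval between two trains at a station


def count_conflicts(timetable):
    """Count the four conflict types in a toy timetable.

    timetable maps a train name to a list of (arrive, depart) pairs,
    one pair per station, with every train using the same station list.
    """
    counts = {"dwell": 0, "running": 0, "overtaking": 0, "interval": 0}

    for stops in timetable.values():
        # Dwell time is checked at intermediate stations only.
        for arrive, depart in stops[1:-1]:
            if not DWELL_RANGE[0] <= depart - arrive <= DWELL_RANGE[1]:
                counts["dwell"] += 1
        # Running time: departure from one station to arrival at the next.
        for (_, depart), (arrive, _) in zip(stops, stops[1:]):
            if not RUN_RANGE[0] <= arrive - depart <= RUN_RANGE[1]:
                counts["running"] += 1

    for a, b in combinations(timetable.values(), 2):
        for s in range(len(a) - 1):
            dep_gap = a[s][1] - b[s][1]          # order when leaving station s
            arr_gap = a[s + 1][0] - b[s + 1][0]  # order when reaching station s+1
            # Overtaking: the order of the two trains flips inside the section.
            if dep_gap * arr_gap < 0:
                counts["overtaking"] += 1
            # Interval: departures or arrivals are too close together.
            if abs(dep_gap) < MIN_INTERVAL:
                counts["interval"] += 1
            if abs(arr_gap) < MIN_INTERVAL:
                counts["interval"] += 1
    return counts


demo = {
    "A": [(0, 0), (20, 23), (45, 45)],
    "B": [(2, 2), (18, 35), (70, 70)],
}
result = count_conflicts(demo)
print(result)
```

In a reinforcement learning setting, counts of this kind could serve as part of the state or as a penalty term in the reward, with each operator shifting arrival or departure times to drive the counts toward zero; that use is an assumption here and is not taken from the paper.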
Key words
Train Working Diagram/Reinforcement Learning/PPO Algorithm/Conflict Resolution/Heuristic Greedy Algorithm
Publication year
2025