Optimization Method for the Train Working Diagram Compilation Model of High-Speed Railways Based on Reinforcement Learning
To address four types of conflicts that may arise in the train working diagram of high-speed railways, namely stop timeout, overlong running time, overtaking, and insufficient interval time, this paper implements an agent that resolves train working diagram conflicts based on reinforcement learning theory. After establishing a train working diagram compilation environment, the research designed an operator set for resolving the different conflict types and trained the agent in the constructed environment using the proximal policy optimization (PPO) algorithm. To enhance algorithm performance, a heuristic greedy algorithm was used to collect samples for supervised learning of the networks as initial pre-training; an entropy-increase mechanism was employed to intensify exploration, and multi-policy decision making was utilized to make the final resolution more effective. Model pre-warming was performed to fine-tune the network parameters in each test environment so that the agent adapts to new conditions. The results show that, under the same initial conditions, the number of steps the proposed method requires to resolve all conflicts is significantly smaller than that required by the heuristic greedy algorithm, and the probability of completely resolving all conflicts is much higher than that of the heuristic greedy algorithm. This method provides a new reference for train working diagram compilation models.
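The clipped PPO objective with an entropy bonus mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is a generic, hypothetical single-sample formulation, not the paper's implementation; the function name, coefficient values, and argument layout are assumptions made for illustration only.

```python
import math

def ppo_clipped_loss(old_logp, new_logp, advantage, probs,
                     clip_eps=0.2, ent_coef=0.01):
    """PPO-Clip surrogate loss for one (state, action) sample.

    An entropy bonus on the action distribution encourages exploration,
    in the spirit of the entropy-increase mechanism described above.
    All names and coefficients here are illustrative assumptions.
    """
    # Probability ratio pi_new(a|s) / pi_old(a|s), from log-probabilities
    ratio = math.exp(new_logp - old_logp)
    # Clip the ratio to [1 - eps, 1 + eps] to limit the policy update
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Pessimistic (minimum) surrogate objective
    surrogate = min(ratio * advantage, clipped * advantage)
    # Shannon entropy of the current action distribution
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Loss to minimize: negative surrogate minus weighted entropy bonus
    return -(surrogate + ent_coef * entropy)
```

With identical old and new log-probabilities the ratio is 1, so the loss reduces to the negative advantage minus the weighted entropy of the distribution; a large ratio is clipped, which bounds how far a single update can move the policy.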
Train Working Diagram; Reinforcement Learning; PPO Algorithm; Conflict Resolution; Heuristic Greedy Algorithm