An Interpretable Machine Learning Framework-based Approach for Predicting Passenger Flow Distribution in Train Riding Sections
To clarify how railway passenger transportation services affect the prediction of passenger flow distribution, we propose a method based on an interpretable machine learning framework for predicting passenger flow distribution in high-speed railway sections. First, we build a framework that predicts section-level passenger flow distribution with gradient-boosted tree models, and we construct four such models: GBDT, XGBoost, LightGBM, and CatBoost. Second, the importance and contributions of the feature variables are calculated using the SHapley Additive exPlanations (SHAP) method, which reveals non-linear relationships between the features and the passenger flow distribution. An experiment on the Beijing South to Shanghai Hongqiao route shows that all four models predict the distribution accurately. The coefficients of determination for GBDT, XGBoost, LightGBM, and CatBoost on the test set are 0.9664, 0.9601, 0.9680, and 0.9715, respectively. After feature optimization, the features rank in order of contribution as follows: benchmark train, ticket price, travel time, date, day of the week, and train code departure time. The coefficient of determination for the CatBoost-7 model on the validation set after feature optimization is 0.9458. Both the date and the benchmark train show a non-linear positive correlation with the predicted passenger flow distribution, while the travel time shows a non-linear negative correlation. In addition, a short travel time, a high ticket price, and a benchmark train departing exactly at the scheduled departure time positively influence the predicted passenger flow distribution. This study provides insights for the design of high-speed rail passenger transportation services.
railway transportation; passenger flow distribution forecast; interpretable machine learning; train-riding segments; non-linear relationship
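The abstract compares the models by their coefficient of determination. As a reminder of what that metric measures, the following is a minimal sketch using the standard definition R^2 = 1 - SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical illustrations, not the study's code or data.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical passenger-flow shares for four sections (illustrative only).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]
score = r_squared(observed, predicted)
```

A value close to 1 means the predictions explain most of the variance in the observed distribution, which is how the reported scores above 0.94 should be read.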