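A Method for Predicting Passenger Flow Distribution over Train Riding Sections Based on an Interpretable Machine Learning Framework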

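Abstract (translated from the Chinese): To explain how passenger service product features affect the prediction of passenger flow distribution over train riding sections, this paper proposes a prediction method for high-speed railway trains based on an interpretable machine learning framework. First, a prediction framework based on gradient-boosted tree models is proposed, and prediction models are built with four different gradient-boosted tree models (GBDT, XGBoost, LightGBM, and CatBoost). Second, feature contribution importance is calculated, feature variables are optimized using the SHAP (SHapley Additive exPlanations) method, and the non-linear relationships between single and interaction features and the predicted distribution are revealed. Prediction results for trains between Beijing South and Shanghai Hongqiao show that all four models predict the passenger flow distribution accurately: the coefficients of determination on the test set for GBDT, XGBoost, LightGBM, and CatBoost are 0.9664, 0.9601, 0.9680, and 0.9715, respectively. After feature optimization, the features ranked by contribution importance are benchmark train, ticket price, travel time, date, day of the week, train number, and departure time, and the CatBoost-7 model achieves a coefficient of determination of 0.9458 on the validation set. Date and benchmark train show a non-linear positive correlation with the predicted passenger flow distribution, travel time shows a non-linear negative correlation, and benchmark trains with short travel times, high ticket prices, and on-the-hour departure times have a positive effect on the prediction. The results can provide a reference for the design of high-speed railway passenger service products.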
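The abstract reports model accuracy as a coefficient of determination (R²). As a reminder of what that figure measures, here is a minimal sketch in plain Python. The function name `r_squared` and the sample passenger counts are hypothetical illustrations, not data or code from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical passenger counts on four riding sections (not from the paper).
observed = [120.0, 95.0, 143.0, 80.0]
predicted = [118.0, 99.0, 140.0, 84.0]
score = r_squared(observed, predicted)
print(round(score, 4))
```

A value close to 1 means the predictions account for nearly all of the variance in the observed flows, which is the sense in which the reported scores above 0.96 indicate a close fit.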

English title: An Interpretable Machine Learning Framework-based Approach for Predicting Passenger Flow Distribution in Train Riding Sections

English abstract: To clarify how the features of railway passenger service products affect the prediction of passenger flow distribution, we propose a method based on an interpretable machine learning framework for predicting passenger flow distribution over the riding sections of high-speed railway trains. First, we propose a prediction framework built on gradient-boosted tree models and construct four such models: GBDT, XGBoost, LightGBM, and CatBoost. Second, we calculate feature contribution importance and optimize the feature variables using the SHapley Additive exPlanations (SHAP) method, revealing non-linear relationships between single and interaction features and the predicted passenger flow distribution. An experiment on trains from Beijing South to Shanghai Hongqiao shows that all four models predict the distribution accurately. The coefficients of determination for GBDT, XGBoost, LightGBM, and CatBoost on the test set are 0.9664, 0.9601, 0.9680, and 0.9715, respectively. After feature optimization, the features ranked by contribution importance are benchmark train, ticket price, travel time, date, day of the week, train number, and departure time. The coefficient of determination of the CatBoost-7 model on the validation set after feature optimization is 0.9458. Both the date and the benchmark train show a non-linear positive correlation with the passenger flow distribution prediction, while the travel time shows a non-linear negative correlation. In addition, benchmark trains with short travel times, high ticket prices, and departures on the hour positively influence the passenger flow distribution prediction. This study provides a reference for the design of high-speed rail passenger service products.
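The SHAP method named in the abstract attributes a prediction to individual features through Shapley values. The sketch below computes exact Shapley values by brute force for a toy model, assuming that features outside a coalition take a fixed baseline value. The `model` function, its coefficients, and the sample inputs are hypothetical and are not the paper's fitted model; in practice the `shap` library uses much faster tree-specific algorithms rather than this enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample.

    Features not in a coalition are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(v):
    # Hypothetical demand function with one interaction term
    # (benchmark flag x ticket price); coefficients are made up.
    benchmark, travel_hours, price = v
    return 200 + 40 * benchmark - 10 * travel_hours + 0.1 * benchmark * price


x = [1, 4.5, 600]          # benchmark train, 4.5 h, price 600
baseline = [0, 6.0, 550]   # ordinary train, 6 h, price 550
phi = shapley_values(model, x, baseline)
print(phi)
```

The contributions sum to the difference between the prediction for `x` and the prediction for `baseline`, which is the additive property the method's name refers to. The interaction term is split between the benchmark flag and the ticket price.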

English keywords:

railway transportation; passenger flow distribution forecast; interpretable machine learning; train-riding segments; non-linear relationship

Authors:

Sun Guofeng (孙国锋), Jing Yun (景云), Li Hebi (李和壁), Tian Zhiqiang (田志强), Tian Xiaopeng (田小鹏)

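Author affiliations:

School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044

Frontiers Science Center for Smart High-speed Railway System, Beijing Jiaotong University, Beijing 100044

Railway Science and Technology Research and Development Center, China Academy of Railway Sciences Corporation Limited, Beijing 100081

School of Traffic and Transportation, Lanzhou Jiaotong University, Lanzhou 730070

Key Laboratory of Railway Industry on Plateau Railway Transportation Intelligent Management and Control, Lanzhou Jiaotong University, Lanzhou 730070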

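Keywords (translated from the Chinese):

railway transportation; passenger flow distribution prediction; interpretable machine learning; train riding sections; non-linear relationship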

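Funding:

National Natural Science Foundation of China; National Natural Science Foundation of China; Fundamental Research Funds for the Central Universities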

Project numbers:

52372300; 72161023; 2023YJS146

Publication year:

2024

DOI:

10.16097/j.cnki.1009-6744.2024.02.025

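Journal of Transportation Systems Engineering and Information Technology (交通运输系统工程与信息)

Systems Engineering Society of China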

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.664

ISSN: 1009-6744

Year, volume (issue): 2024, 24(2)

Number of references: 26