Potential High-Value Passenger Discovery Based on SSOMaj-SMOTE-SSOMin Sampling (Triple Hybrid Sampling) and Ensemble Learning
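Feng Xia (冯霞) 1, Hu Fang (胡昉) 2
Author information
- 1. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300; Information Technology Research Base of Civil Aviation Administration of China, Tianjin 300300
- 2. College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300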
Abstract
Data on potential high-value passengers are highly imbalanced, and passenger features are only weakly correlated with value categories. To address these problems, a potential high-value passenger discovery model based on SSOMaj-SMOTE-SSOMin sampling (triple hybrid sampling) and ensemble learning is proposed. The RFM (Recency, Frequency, Monetary) method is used to label passenger categories; SSOMaj-SMOTE-SSOMin sampling is used to resample the imbalanced passenger data set; a fusion feature selection (FFS) algorithm is used to select passenger features; and a gradient boosting decision tree (GBDT) serves as the classifier to build a passenger value prediction model that identifies potential high-value passengers. Experimental results on the PNR data set show that, compared with the baseline algorithms, the proposed model achieves higher AUC and F1 values and identifies potential high-value passengers more effectively.
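The first two stages of the pipeline described in the abstract (RFM labeling, then oversampling of the minority class) can be illustrated with a minimal, dependency-free sketch. Several things here are assumptions rather than the authors' method: the abstract does not define the SSOMaj and SSOMin steps, so only a plain SMOTE-style interpolation is shown; the rank-sum RFM score, the `top_fraction` cutoff, and the field names `recency_days`, `frequency`, and `monetary` are hypothetical choices made for illustration.

```python
import random


def rfm_label(passengers, top_fraction=0.2):
    """Label passengers 1 (high-value) or 0 using a rank-sum RFM score.

    Assumption: the score is the sum of three ranks, where a more recent
    trip, a higher trip count, and a higher spend each earn a higher rank.
    """
    n = len(passengers)

    def ranks(key, reverse):
        # Position 0 in the sorted order is the worst value for this key.
        order = sorted(range(n), key=lambda i: passengers[i][key], reverse=reverse)
        result = [0] * n
        for rank, i in enumerate(order):
            result[i] = rank
        return result

    r_rank = ranks("recency_days", reverse=True)  # longest absence ranks lowest
    f_rank = ranks("frequency", reverse=False)    # fewest trips rank lowest
    m_rank = ranks("monetary", reverse=False)     # lowest spend ranks lowest
    scores = [r_rank[i] + f_rank[i] + m_rank[i] for i in range(n)]

    n_high = max(1, round(n * top_fraction))
    cutoff = sorted(scores, reverse=True)[n_high - 1]
    return [1 if s >= cutoff else 0 for s in scores]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by SMOTE-style interpolation.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data, invented for illustration only.
passengers = [
    {"recency_days": 5, "frequency": 20, "monetary": 9000.0},
    {"recency_days": 200, "frequency": 1, "monetary": 300.0},
    {"recency_days": 90, "frequency": 3, "monetary": 1200.0},
    {"recency_days": 30, "frequency": 8, "monetary": 4000.0},
    {"recency_days": 300, "frequency": 2, "monetary": 500.0},
]
labels = rfm_label(passengers, top_fraction=0.2)
print(labels)  # [1, 0, 0, 0, 0]

minority_points = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
new_points = smote(minority_points, n_new=4)
print(len(new_points))  # 4
```

In a fuller pipeline, the resampled data would then pass through feature selection and a gradient boosting classifier, with AUC and F1 computed on held-out data; those stages are omitted here because the abstract gives no parameters for them.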
Key words
Air transportation / SSOMaj-SMOTE-SSOMin (triple hybrid sampling) / Feature importance ranking / Potential high-value passenger / Imbalanced classification / Ensemble learning
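Funding
National Natural Science Foundation of China (61502499)
Scientific Research Foundation of Civil Aviation University of China (2013QD18X)
Project of the Key Laboratory of Intelligent Application Technology for Civil Aviation Passenger Services ()
Publication year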
2024