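To address the severe class imbalance of potential high-value passenger data and the weak correlation between passenger features and value categories, a potential high-value passenger discovery model based on triple hybrid sampling and ensemble learning is proposed. The RFM (Recency, Frequency, Monetary) method is used to label passenger categories; triple hybrid sampling is used to resample the imbalanced passenger data set; a fusion feature selection algorithm is used to select passenger features; and a gradient boosting decision tree is used as the classifier to build a passenger value prediction model that identifies potential high-value passengers. Experimental results on the PNR data set show that, compared with the baseline algorithms, the model achieves higher AUC and F1 values and identifies potential high-value passengers well.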
POTENTIAL HIGH-VALUE PASSENGER DISCOVERY BASED ON SSOMaj-SMOTE-SSOMin SAMPLING AND ENSEMBLE LEARNING
Considering the highly imbalanced data of potential high-value passengers and the weak correlation between passenger characteristics and value categories, a potential high-value passenger discovery model based on SSOMaj-SMOTE-SSOMin sampling and ensemble learning is proposed. The RFM method is used to label the passenger category. The SSOMaj-SMOTE-SSOMin method is used to resample the imbalanced passenger data set. The fusion feature selection (FFS) algorithm is used to select the passenger features. A gradient boosting decision tree (GBDT) is taken as the classifier to build a passenger value prediction model that identifies potential high-value passengers. Experimental results on the PNR data set show that, compared with the baseline algorithms, the proposed model achieves higher AUC and F1 values and can better identify potential high-value passengers.
Air transportation; SSOMaj-SMOTE-SSOMin; Feature importance ranking; Potential high-value passenger; Imbalanced classification; Ensemble learning
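The RFM labeling step named in the abstract can be illustrated with a minimal sketch. The abstract does not state the scoring rule, so everything here is an assumption made for illustration: the `Passenger` record, its field names, the sample values, and the rule that a passenger is high-value when recency is at or below the mean while frequency and monetary value are at or above the mean.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Passenger:
    """Hypothetical per-passenger RFM summary (field names are assumptions)."""
    passenger_id: str
    recency_days: int   # days since the most recent flight; smaller is better
    frequency: int      # number of flights in the observation window
    monetary: float     # total fare spent in the observation window


def label_high_value(passengers):
    """Return {passenger_id: 1 or 0}, where 1 marks a high-value passenger.

    Assumed rule (not taken from the paper): high-value means recency at or
    below the mean and both frequency and monetary at or above the mean.
    """
    r_mean = mean(p.recency_days for p in passengers)
    f_mean = mean(p.frequency for p in passengers)
    m_mean = mean(p.monetary for p in passengers)

    labels = {}
    for p in passengers:
        is_high = (
            p.recency_days <= r_mean
            and p.frequency >= f_mean
            and p.monetary >= m_mean
        )
        labels[p.passenger_id] = int(is_high)
    return labels


# Made-up sample records, for illustration only.
passengers = [
    Passenger("A", 5, 12, 9800.0),
    Passenger("B", 200, 1, 450.0),
    Passenger("C", 30, 2, 700.0),
    Passenger("D", 90, 3, 1200.0),
]
labels = label_high_value(passengers)
print(labels)
```

Even in this toy sample only one of four passengers is labeled high-value, which hints at why a resampling step precedes the classifier in the proposed model.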