Online shopping intention prediction based on cost-sensitive LightGBM
[Objective] Predicting online shopping intention is a typical imbalanced classification problem: the number of consumers who buy goods is much smaller than the number who do not. This paper aims to solve the resulting problem that the recognition accuracy for minority samples is much lower than that for majority samples.
[Methods] This paper proposes a cost-sensitive LightGBM (light gradient boosting machine) model based on Bayesian optimization. First, the misclassification cost is introduced as a penalty factor to modify the loss function of LightGBM. Second, the classification threshold of the model is lowered by threshold shifting to improve the prediction accuracy for minority samples. Finally, the misclassification cost, the classification threshold, and the other model parameters are tuned with a Bayesian optimization algorithm.
[Results] Five typical imbalanced datasets are selected from the KEEL database. To verify the effectiveness of the improved LightGBM algorithm proposed in this paper, it is compared with the standard LightGBM algorithm, cost-sensitive LightGBM optimized by a genetic algorithm, cost-sensitive LightGBM optimized by particle swarm optimization, ADASYN-LightGBM (adaptive synthetic sampling approach), and BorderlineSMOTE-LightGBM (borderline synthetic minority oversampling technique). AUC (area under curve) and G-mean (geometric mean) are used as evaluation metrics, and the final experimental results are obtained after 100 iterations and ten-fold cross-validation. Compared with the standard LightGBM model, the AUC and G-mean of the cost-sensitive LightGBM model both increase by about 10%, indicating that cost-sensitive learning significantly improves the classification performance of LightGBM and handles imbalanced classification problems better. Compared with the cost-sensitive LightGBM models optimized by the genetic algorithm and by particle swarm optimization, the AUC and G-mean of the Bayesian-optimized cost-sensitive LightGBM model generally increase by about 4%, which shows that Bayesian optimization has certain advantages in tuning the parameters of the cost-sensitive LightGBM algorithm. Compared with the ADASYN-LightGBM and BorderlineSMOTE-LightGBM models, the AUC and G-mean of the Bayesian-optimized cost-sensitive LightGBM model generally increase by about 3%, which shows that it classifies imbalanced data better than the combination of either sampling method with LightGBM. To verify the validity of the prediction model of consumers' online shopping intention based on Bayesian-optimized cost-sensitive LightGBM, the paper uses consumer behavior data provided by the Jingdong platform: the historical records of interactions among consumers, commodities, categories, and stores provided in the Jingdong JDATA algorithm competition, covering February 1, 2018 to April 15, 2018. The final experimental results are a G-mean of 0.913, an AUC of 0.920, and an F1 value of 0.692. Compared with the prediction results of two other studies on the same dataset, the proposed prediction model performs better.
[Conclusions] To address the problem of imbalanced data in online shopping intention research, the paper proposes a prediction model of online shopping intention based on Bayesian-optimized cost-sensitive LightGBM. Following cost-sensitive learning, the misclassification cost is added to the LightGBM loss function as a penalty factor, and the classification threshold of the model is lowered by threshold shifting to improve the prediction accuracy for minority samples. The misclassification cost parameters, the classification threshold, and the other parameters of the cost-sensitive LightGBM model are tuned with a Bayesian optimization algorithm. Experimental results on the KEEL datasets show that, compared with the standard LightGBM, genetic-algorithm-optimized cost-sensitive LightGBM, particle-swarm-optimized cost-sensitive LightGBM, ADASYN-LightGBM, and BorderlineSMOTE-LightGBM models, the Bayesian-optimized cost-sensitive LightGBM model has certain advantages and is effective in dealing with imbalanced data. The empirical results on the Jingdong consumer behavior dataset show that the Bayesian-optimized cost-sensitive LightGBM model can better predict consumers' online shopping intentions.
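
The two modifications named in the abstract, a cost penalty in the loss and a lowered decision threshold, can be illustrated with a minimal pure-Python sketch. The abstract does not give the exact form of the modified loss, so the class-weighted log loss below is an assumption, as are the function names, the cost values, and the toy data; the Bayesian optimization step that would tune `cost_fn`, `cost_fp`, and the threshold is not shown.

```python
import math


def sigmoid(z):
    """Map a raw boosting score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def cost_sensitive_grad_hess(y, raw_score, cost_fn, cost_fp):
    """Gradient and Hessian of an assumed class-weighted log loss.

    Assumed loss: -w * [y*log(p) + (1-y)*log(1-p)], where w = cost_fn for a
    positive (minority) sample and w = cost_fp for a negative sample.
    Gradient boosting libraries that accept custom objectives expect this
    pair of derivatives with respect to the raw score.
    """
    p = sigmoid(raw_score)
    w = cost_fn if y == 1 else cost_fp
    return w * (p - y), w * p * (1.0 - p)


def predict_labels(probs, threshold):
    """Threshold shifting: a threshold below 0.5 favors the minority class."""
    return [1 if p >= threshold else 0 for p in probs]


def g_mean(y_true, y_pred):
    """Geometric mean of true positive rate and true negative rate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# Hypothetical toy data: 3 buyers (minority) and 7 non-buyers (majority).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.6, 0.4, 0.3, 0.2, 0.1, 0.35, 0.05, 0.1, 0.2, 0.15]

g_default = g_mean(y_true, predict_labels(probs, 0.5))
g_shifted = g_mean(y_true, predict_labels(probs, 0.3))
print("G-mean at threshold 0.5:", round(g_default, 3))
print("G-mean at threshold 0.3:", round(g_shifted, 3))
```

On this toy data, lowering the threshold recovers the two buyers that the default threshold misses at the price of one false positive, so G-mean rises. With `cost_fn` larger than `cost_fp`, a missed buyer also contributes a proportionally larger gradient during training, which is the intended effect of the penalty factor.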