Research on a prediction method for the poverty-returning risk of the poverty-alleviated population based on Stacking ensemble learning

The risk that people lifted out of poverty will fall back into it is a major factor affecting the effective linkage between the achievements of poverty alleviation and rural revitalization. Accurately predicting this latent risk plays a crucial role in guiding policy implementation, resource allocation, and risk assessment. This paper proposes a method for predicting the poverty-returning risk of the poverty-alleviated population based on Stacking ensemble learning. Using desensitized monitoring data on poverty-alleviated households in Province H, correlation analysis and importance ranking were applied to the data features to identify and select the key features that significantly affect poverty-returning risk. On the key-feature data, inter-model correlation analysis was carried out on independent models such as Random Forest (RF), Naive Bayes (NB), and Support Vector Machine (SVM). eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and SVM, which showed low mutual correlation and high prediction accuracy, were chosen as base learners, with RF as the meta-learner, to construct the Stacking ensemble learning prediction model. The 412,919 records were split 7:3 into training and validation sets to train and validate the model, and model performance was evaluated using accuracy, precision, recall, and F1-Score. The experimental results show that the Stacking-based poverty-returning risk prediction model outperformed every single model on all evaluation metrics: its prediction accuracy was 3.64%, 10.96%, 3.15%, 2.29%, and 5.41% higher than that of RF, NB, SVM, XGBoost, and AdaBoost, respectively, reaching 95.65%, which verifies the effectiveness of the proposed method. The study offers a new approach to consolidating and expanding the achievements of poverty alleviation and to improving the timeliness of dynamic monitoring and early warning of returning to poverty.
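The pipeline described in the abstract (three base learners, an RF meta-learner, a 7:3 split, and four evaluation metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the data here are synthetic (generated with `make_classification`), scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to keep the example dependent on a single library, and every hyperparameter, variable name, and the binary label are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the desensitized monitoring data (assumption:
# a binary "returns to poverty / does not" label and 10 numeric features).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)

# 7:3 split into training and validation sets, as stated in the abstract.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners. GradientBoostingClassifier is a stand-in for XGBoost;
# the SVM is wrapped with a scaler because SVMs are sensitive to feature scale.
base_learners = [
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("ada", AdaBoostClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC())),
]

# Random forest as the meta-learner, trained on out-of-fold predictions
# of the base learners (5-fold cross-validation is an assumed setting).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=RandomForestClassifier(n_estimators=100, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_val)

# The four metrics named in the abstract.
scores = {
    "accuracy": accuracy_score(y_val, y_pred),
    "precision": precision_score(y_val, y_pred),
    "recall": recall_score(y_val, y_pred),
    "f1": f1_score(y_val, y_pred),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

The scores printed by this sketch describe only the synthetic data and say nothing about the figures reported in the paper. Training the meta-learner on out-of-fold predictions, which `StackingClassifier` does internally through `cv`, is what keeps it from simply memorizing base-learner outputs on the training set.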

Stacking ensemble learning; poverty-returning risk prediction; machine learning; feature selection; correlation analysis

Liu Hongda, Sun Xiaohua, Wang Bin, Wang Chao, Wang Fushun

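College of Information Science and Technology, Hebei Agricultural University, Baoding, Hebei 071001

Hebei Key Laboratory of Agricultural Big Data, Baoding, Hebei 071000

Hebei Software Institute, Baoding, Hebei 071000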

2024

Journal of Hebei Agricultural University
Hebei Agricultural University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.475
ISSN: 1000-1573
Year, volume (issue): 2024, 47(6)