Bulletin of Geological Science and Technology, 2024, Vol. 43, Issue (3): 251-265. DOI: 10.19509/j.cnki.dzkq.tb20230037

Susceptibility evaluation of Wenchuan coseismic landslides by gradient boosting decision tree and random forest based on optimal negative sample sampling strategies

Guo Yanhao¹, Dou Jie¹, Xiang Zilin¹, Ma Hao¹, Dong Aonan¹, Luo Wanqi¹

Author information

  • 1. Badong National Observation and Research Station of Geohazards, China University of Geosciences (Wuhan), Wuhan 430074

Abstract

[Objective] Landslides induced by strong earthquakes are numerous, widely distributed and large in scale, and they seriously threaten lives and property. Landslide susceptibility mapping (LSM) can quickly predict the spatial distribution of landslide-prone areas, which is important for reducing post-earthquake disaster risk. However, in studies of coseismic LSM, how to select negative (non-landslide) samples and how to couple machine learning models to improve evaluation accuracy still require further comparative investigation. [Methods] Landslides induced by the Wenchuan earthquake in a mountainous area are taken as the case study. First, 10 landslide influencing factors covering topography and geomorphology, geological environment, and seismic parameters are selected to analyse the spatial distribution of landslides. Second, collinearity analysis is used to test for data redundancy. Third, the frequency ratio (FR) method is used to define a sampling strategy in which negative sample points are randomly selected from the extremely low and low susceptibility zones. Finally, gradient boosting decision tree (GBDT), random forest (RF), and the coupled models FR-GBDT and FR-RF are used to map coseismic landslide susceptibility, and the models are compared and their accuracy assessed. [Results] The results show that ① the spatial distribution of landslides is controlled by multiple factors at different levels; ② the prediction accuracy ranks FR-RF (AUC=0.943) > FR-GBDT (AUC=0.926) > RF (AUC=0.901) > GBDT (AUC=0.856); and ③ selecting negative samples in low susceptibility areas significantly improves LSM accuracy. [Conclusion] The results can serve as a reference for selecting negative samples and constructing evaluation models in LSM, and provide theoretical support for post-earthquake disaster prevention and mitigation.
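The FR-based negative sampling strategy summarized in [Methods] can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each factor has already been reclassified into discrete classes on a common grid and that per-class FR values are summed into a landslide susceptibility index (LSI), the usual FR convention. The abstract does not say how the extremely low and low zones were delimited, so a hypothetical `low_fraction` cut-off on the LSI ranking stands in for them. All function names are hypothetical.

```python
import random
from collections import Counter


def frequency_ratio(classes, is_landslide):
    """Frequency ratio (FR) of every class of one factor.

    FR = (share of landslide cells in the class) / (share of all cells in the class).
    classes: class label per grid cell; is_landslide: bool per grid cell.
    Requires at least one landslide cell.
    """
    total_cells = len(classes)
    total_slides = sum(is_landslide)
    cells_per_class = Counter(classes)
    slides_per_class = Counter(c for c, s in zip(classes, is_landslide) if s)
    return {
        c: (slides_per_class.get(c, 0) / total_slides) / (n / total_cells)
        for c, n in cells_per_class.items()
    }


def susceptibility_index(factor_layers, is_landslide):
    """LSI of each cell: sum of the FR values of the classes the cell falls in."""
    tables = [frequency_ratio(layer, is_landslide) for layer in factor_layers]
    return [
        sum(table[layer[i]] for table, layer in zip(tables, factor_layers))
        for i in range(len(is_landslide))
    ]


def sample_negatives(lsi, is_landslide, k, low_fraction=0.4, seed=0):
    """Randomly draw k non-landslide cells from the lowest-LSI part of the area.

    low_fraction (hypothetical) approximates the 'extremely low + low' zones,
    e.g. the two lowest of five equal-count classes = 0.4.
    Raises ValueError if the pool holds fewer than k cells.
    """
    order = sorted(range(len(lsi)), key=lambda i: lsi[i])
    cutoff = int(len(order) * low_fraction)
    pool = [i for i in order[:cutoff] if not is_landslide[i]]
    return random.Random(seed).sample(pool, k)


if __name__ == "__main__":
    # Toy grid of 10 cells with two already-classified (hypothetical) factors.
    slope = ["low"] * 5 + ["high"] * 5
    fault = ["far", "near", "far", "near", "far", "near", "near", "far", "near", "far"]
    slides = [False] * 5 + [True, True, False, True, False]
    lsi = susceptibility_index([slope, fault], slides)
    print(lsi)  # → [0.0, 2.0, 0.0, 2.0, 0.0, 4.0, 4.0, 2.0, 4.0, 2.0]
    print(sample_negatives(lsi, slides, k=2))  # two cells drawn from {0, 1, 2, 4}
```

In GIS workflows the susceptibility zones are often delimited with natural breaks rather than an equal-count cut-off; swapping that rule only changes `sample_negatives`.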
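The abstract does not name the statistic used in the collinearity analysis. The variance inflation factor (VIF) is a common choice in susceptibility studies, so the sketch below assumes it: VIF_j = 1/(1 − R_j²), which equals the j-th diagonal element of the inverse of the factors' correlation matrix. The functions are hypothetical stdlib-only illustrations, and the VIF > 10 flag in the comment is a widely used rule of thumb, not a threshold taken from the paper.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of factor columns (each a list of numbers)."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[v - m for v in col] for col, m in zip(columns, means)]
    # A constant column has zero norm and would divide by zero; drop it beforehand.
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: factors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


if __name__ == "__main__":
    slope = [1, 2, 3, 4]
    relief = [1, 2, 3, 5]  # hypothetical factor strongly correlated with slope
    print(vif([slope, relief]))  # both ≈ 29.17, above the common VIF > 10 flag
```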
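The four models are ranked by AUC. As a sketch of what that number measures, the rank-based (Mann-Whitney) form below computes AUC as the probability that a randomly chosen landslide sample receives a higher susceptibility score than a randomly chosen non-landslide sample, with ties counted as one half. The function name is hypothetical; a library routine such as scikit-learn's `roc_auc_score(y_true, y_score)` returns the same value.

```python
def roc_auc(labels, scores):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2.

    labels: 1 for landslide samples, 0 for non-landslide samples.
    scores: predicted susceptibility (e.g. RF or GBDT probability) per sample.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both landslide and non-landslide samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # → 0.75
```

The double loop keeps the definition visible; for large validation sets a sort-based rank computation is preferable.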

Key words

random forest(RF)/gradient boosting decision tree(GBDT)/machine learning/frequency ratio(FR)/sampling strategy/coseismic landslide/landslide susceptibility mapping

Funding

Major Program of the National Natural Science Foundation of China (42090054)

Hubei Provincial Innovation Group Project (2022CFA002)

Publication year

2024
Bulletin of Geological Science and Technology
China University of Geosciences (Wuhan)

CSTPCD; Peking University Core Journal
Impact factor: 1.018
ISSN: 2096-8523
Times cited: 1
References: 17