Estimation of Soil Organic Matter Content in Wei-Ku Oasis Based on Variables Screening and Machine Learning Algorithms

Selecting an appropriate variable screening method and model can effectively improve the accuracy of soil organic matter content estimation. This study takes the Weigan-Kuche River oasis in Xinjiang as the study area. Based on Sentinel-2 satellite imagery and measured soil organic matter, correlation analysis was conducted between soil organic matter and the image bands and multiple spectral indices. Variables were then screened with the Boruta algorithm and the Successive Projections Algorithm (SPA), used separately and in combination, and Random Forest (RF) and Back Propagation Neural Network (BPNN) models were built to estimate topsoil organic matter content. The results indicate that: (1) bands B3, B4, B5, B7, and B8A, together with the Transformed Vegetation Index (TVI) and Color Index (CI), play an important role in estimating soil organic matter content; (2) variable sets selected by the Boruta algorithm or SPA alone yield better models than both the full variable set and the variable set selected by combining the two algorithms, and Boruta outperforms SPA; (3) the RF model estimates better than the BPNN model. For the optimal model, the coefficient of determination (R²) exceeds 0.74 on both the training and validation sets, the root mean square error (RMSE) is below 2.0 g/kg, and the relative percent deviation (RPD) is above 1.6, indicating that the model fits well and can estimate soil organic matter content effectively. The Boruta algorithm combined with the random forest model can retrieve the spatial distribution of topsoil organic matter in the oasis well, providing a reference for soil nutrient evaluation in this region.
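The three accuracy measures quoted in the abstract (R², RMSE, and RPD) can be illustrated with a minimal sketch. The abstract does not state the exact formulas used, so the definitions below are assumptions based on common practice: R² as one minus the ratio of residual to total sum of squares, RMSE over all samples, and RPD as the sample standard deviation of the observed values divided by RMSE. The function name `evaluate` and the sample values (in g/kg) are hypothetical and are not data from the study.

```python
import math
from statistics import mean, stdev


def evaluate(observed, predicted):
    """Return (R2, RMSE, RPD) for paired observed and predicted values.

    Assumed definitions (not confirmed by the source):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample standard deviation of observed / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    obs_mean = mean(observed)
    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / len(observed))
    rpd = stdev(observed) / rmse
    return r2, rmse, rpd


# Hypothetical soil organic matter values in g/kg (illustration only)
observed = [8.0, 10.0, 12.0, 14.0, 16.0]
predicted = [9.0, 9.0, 13.0, 13.0, 16.0]
r2, rmse, rpd = evaluate(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.3f} g/kg  RPD={rpd:.3f}")
```

Under these assumed definitions, a model meeting the thresholds reported in the abstract would return `r2 > 0.74`, `rmse < 2.0`, and `rpd > 1.6` on both the training and validation sets. The sketch does not guard against constant observations or a perfect fit, where the divisions are undefined.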

Boruta algorithm; successive projections algorithm; random forest; back propagation neural network; soil organic matter

李顿、王雪梅、李坤玉、安柏耸

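College of Geographic Science and Tourism, Xinjiang Normal University, Urumqi 830054

Xinjiang Uygur Autonomous Region Key Laboratory "Xinjiang Laboratory of Lake Environment and Resources in Arid Zone", Urumqi 830054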

National Natural Science Foundation of China (41561051); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2020D01A79)

2024

Earth and Environment (地球与环境)
Institute of Geochemistry, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.875
ISSN:1672-9250
Year, volume (issue): 2024, 52(3)