Estimation of Soil Organic Matter Content in Wei-Ku Oasis Based on Variables Screening and Machine Learning Algorithms
Appropriate variable screening methods and models can effectively improve the accuracy of soil organic matter content prediction. This study takes the Weigan-Kuche River oasis in Xinjiang as the research area. Based on Sentinel-2 satellite images and measured soil organic matter, correlation analysis was conducted between soil organic matter and the remote sensing image bands, as well as multiple spectral indices. Variable screening was performed using the Boruta algorithm and the Successive Projections Algorithm (SPA). A Random Forest (RF) model and a Back Propagation Neural Network (BPNN) model were constructed to estimate the organic matter content of the topsoil. The results indicate that: (1) Bands B3, B4, B5, B7, and B8A, as well as the Transformed Vegetation Index (TVI) and Color Index (CI), play an important role in estimating soil organic matter content. (2) Models built on variable sets screened by the Boruta algorithm or the SPA alone perform better than models built on the full variable set or on the variable set screened by the combined algorithm, and the Boruta algorithm performs better than the SPA. (3) The prediction ability of the RF model is better than that of the BPNN model. The coefficients of determination (R²) of both the training and validation sets of the optimal estimation model are greater than 0.74, and the model fits well, with a root mean square error (RMSE) less than 2.0 g/kg and a relative percent deviation (RPD) greater than 1.6, indicating that the random forest model can effectively predict the content of soil organic matter. Using the Boruta algorithm combined with the random forest model can better retrieve the spatial distribution of soil organic matter in the surface soil of the oasis and provide a reference for soil nutrient evaluation in this region.
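For readers who want to reproduce the three accuracy metrics named above (R², RMSE, and RPD), the following is a minimal sketch, not the authors' implementation. It assumes the common definitions: R² as one minus the ratio of residual to total sum of squares, RMSE as the square root of the mean squared residual, and RPD as the sample standard deviation of the observed values divided by the RMSE. The function name `evaluate_som_model` and the sample values are hypothetical and are not data from this study.

```python
import math


def evaluate_som_model(observed, predicted):
    """Return R2, RMSE and RPD for paired observed/predicted values (g/kg).

    Assumed definitions (not confirmed by the source):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(SS_res / n)
      RPD  = sample standard deviation of observed values / RMSE
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 and RPD are undefined")

    rmse = math.sqrt(ss_res / n)
    sd_obs = math.sqrt(ss_tot / (n - 1))  # sample standard deviation (n - 1)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        # A perfect fit has zero RMSE, so RPD is reported as infinite.
        "rpd": sd_obs / rmse if rmse > 0 else math.inf,
    }


# Hypothetical illustration values, not measurements from the study area.
observed_som = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted_som = [11.0, 11.0, 15.0, 15.0, 19.0]
metrics = evaluate_som_model(observed_som, predicted_som)
print(metrics)
```

Under these definitions, a model meeting the thresholds reported above would return `r2` above 0.74, `rmse` below 2.0, and `rpd` above 1.6 on both the training and validation sets. Some studies compute RPD with the population standard deviation instead, which gives slightly smaller values.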