Prediction of cadmium bioconcentration factor for peanuts based on machine-learning methods
In this study, 100 pairs of soil and peanut samples were collected from 14 provinces in China. The cadmium (Cd) contamination characteristics of the soil-peanut system and the soil physicochemical properties were analyzed. Prediction models of Cd bioconcentration in peanuts were established using machine-learning methods, and the important factors influencing Cd enrichment in peanuts were identified. The results showed that the collected soil samples were mainly acidic, with 60% of the soils having pH < 6.5. The average Cd content in peanut kernels was 0.27 mg·kg⁻¹, and the average bioconcentration factor was 2.42. For the whole-country data and for the grouped northern and southern producing areas, the random forest models (R² = 0.930–0.966) predicted significantly better than the corresponding multiple linear regression models (R² = 0.471–0.657). Random forest analysis showed that the characteristic variables with high relative importance differed between regions. The most important variables for predicting Cd bioconcentration in the northern producing areas were the free manganese oxide content, free iron oxide content, and pH of the soil, whereas in the southern producing areas they were the free manganese oxide, clay, free iron oxide, and organic matter contents of the soil. Compared with traditional multiple linear regression models, random forest models therefore performed better at predicting the Cd bioconcentration of peanuts. This provides a new perspective and solution for predicting Cd transfer in soil-peanut systems at a large scale in the field.
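The two quantities reported above can be illustrated with a minimal sketch. It assumes the conventional definition of the bioconcentration factor as the ratio of kernel Cd to soil Cd concentration (the abstract does not state the formula) and the usual coefficient of determination for R². The function names and sample values are hypothetical and are not data from the study.

```python
from statistics import mean


def bioconcentration_factor(kernel_cd, soil_cd):
    """Return kernel Cd / soil Cd (both in mg/kg); an assumed, conventional definition."""
    if soil_cd <= 0:
        raise ValueError("soil Cd concentration must be positive")
    return kernel_cd / soil_cd


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Made-up (kernel Cd, soil Cd) pairs in mg/kg, for illustration only.
samples = [(0.30, 0.15), (0.20, 0.10), (0.45, 0.30)]
bcfs = [bioconcentration_factor(kernel, soil) for kernel, soil in samples]
```

A bioconcentration factor above 1 means the kernel holds more Cd per unit mass than the soil it grew in. The same `r_squared` function could be applied to the predictions of any fitted model, such as a random forest or a multiple linear regression, to compare them on equal terms.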