Prediction of heavy metal toxicity and ecological risk based on machine learning methods
This study focused on the toxicity of typical heavy metals in soil, including cadmium (Cd), copper (Cu), lead (Pb), and zinc (Zn), and summarized their effects on a model organism, the earthworm. A total of 113 datasets encompassing the median effective concentration (EC50) of heavy metals for earthworm reproduction, along with the corresponding soil physicochemical properties, were compiled from the published literature. Correlations within the compiled data were analyzed to reveal the influence of soil physicochemical factors on the biotoxicity of heavy metals. Five machine learning algorithms, namely Random Forest (RF), Gradient Boosting Decision Tree (GBDT), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor (KNN), and Support Vector Regression (SVR), were employed to develop predictive models for the biotoxicity of heavy metals based on soil characteristics, and the best-performing model was selected to predict potential ecological risk thresholds of heavy metals in Chinese soils. The results indicate significant variation in heavy metal toxicity across different soils, with toxicity to earthworm reproduction ranking as Cd > Cu > Pb ≈ Zn. The effects of soil physicochemical properties on heavy metal toxicity vary depending on the specific heavy metal. Specifically, soil pH emerged as the key factor influencing the toxicity of Pb and Cd, contributing 57.2% and 69.0%, respectively, while cation exchange capacity and organic matter content were the primary factors influencing the biotoxicity of Cu and Zn. The performance of the machine learning models for predicting the biotoxicity of heavy metals from soil physicochemical factors was compared in terms of model fit and prediction accuracy. Among the predictive models, the XGBoost model performed well in predicting the biotoxicity of Cd, Cu, and Zn, while the RF model demonstrated higher accuracy in predicting Pb biotoxicity, achieving R² values of 0.939 and 0.886 for the training and testing sets, respectively. Furthermore, the potential ecological risk thresholds of heavy metals in soils across 34 provinces in China were evaluated with the selected models, revealing significant regional differences in potential ecological risk. The findings provide a new strategy for the accurate prediction and rational assessment of heavy metal ecological toxicity and potential ecological risk based on soil physicochemical properties.
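The model comparison above is reported through R² on the training and testing sets. As a minimal sketch of how that metric is computed, the following uses the standard definition, R² = 1 − SS_res / SS_tot. The function name `r_squared` and the example values are hypothetical illustrations and are not taken from the study's data or code.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted toxicity values (made up for illustration).
observed_example = [2.0, 4.0, 6.0, 8.0]
predicted_example = [2.5, 3.5, 6.5, 7.5]
example_score = r_squared(observed_example, predicted_example)
print(f"R^2 on the hypothetical example: {example_score:.3f}")
```

Computing the same quantity separately on the training and testing sets gives a pair of values such as those reported for the RF model. A large gap between the two would suggest overfitting.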