Predicting the water ecological criteria of copper using machine learning and multiple linear regression approaches
In this research, copper and representative aquatic organisms in China were investigated as a case study. Based on the theoretical framework of the biotic ligand model (BLM), the key environmental factors (hardness, pH and dissolved organic carbon (DOC)) were screened with a gradient boosting decision tree algorithm, and multivariate coupled predictive models were established to predict the acute toxicity of copper to different aquatic organisms. Species sensitivity distribution (SSD) analysis was then performed to predict water quality criteria (WQC) of copper for protecting aquatic organisms that suit the characteristics of the water environment in China. The prediction accuracy (RFx,2.0) of a three-variable model based on aquatic toxicity data for 3 phyla and 5 families was 42% higher than that of the BLM. The SSD curves for the nine organisms were best fitted by a sigmoidal-logistic model (0.922 < R² < 0.991, 0.0267 < RMSE < 0.0767, P > 0.05), and the recommended threshold range for the short-term water ecological criteria of copper in the river basins of China is 0.07350 to 15.38 µg/L. The feature importance analysis from machine learning quantitatively identified the key role of DOC in the formulation of WQC for metals, and it provided direct evidence for considering multiple environmental factors together. Compared with existing approaches, including the BLM, this work is a useful step toward an "in situ" WQC predictive model that meets the water environment characteristics and management needs of China. Such a model could reduce the costs of environmental monitoring and management and improve the regionalization and precision of water environment management.
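The two quantities reported above, the prediction accuracy RFx,2.0 and an SSD-derived concentration threshold, can be illustrated with a minimal sketch. It assumes that RFx,2.0 denotes the fraction of predicted toxicity values falling within a factor of 2 of the observed values, and that the sigmoidal-logistic SSD has the form F(x) = 1 / (1 + exp(-(x - x0) / b)) with x the log10 concentration. The abstract states neither definition explicitly. The function names and the parameters x0 and b are hypothetical and are not taken from the study.

```python
import math


def fraction_within_factor(observed, predicted, factor=2.0):
    """Fraction of predictions within `factor` of the observed values.

    Assumed reading of RFx,2.0; all values must be positive.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    hits = 0
    for obs, pred in zip(observed, predicted):
        ratio = pred / obs
        if 1.0 / factor <= ratio <= factor:
            hits += 1
    return hits / len(observed)


def logistic_ssd(log10_conc, x0, b):
    """Cumulative fraction of species affected at a given log10 concentration.

    x0 is the curve midpoint and b its scale (both hypothetical parameter names).
    """
    return 1.0 / (1.0 + math.exp(-(log10_conc - x0) / b))


def hazard_concentration(p, x0, b):
    """Invert the logistic SSD: concentration at which a fraction p is affected."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    log10_hc = x0 + b * math.log(p / (1.0 - p))
    return 10.0 ** log10_hc
```

With fitted values of x0 and b, a call such as `hazard_concentration(0.05, x0, b)` would give the concentration expected to affect 5% of species, which is a common basis for deriving criteria from an SSD. Whether the study uses that percentile is not stated in the abstract.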