Prediction of the Solid-Liquid Partition Coefficients of PPCPs Based on Machine Learning
In recent years, pharmaceuticals and personal care products (PPCPs) have attracted increasing attention. Studying the solid-liquid partition coefficient (Kd) of PPCPs in solid environmental media is crucial for understanding their fate and assessing their environmental risks. However, traditional methods based on linear partitioning are limited in stability and accuracy. This study compiled batch adsorption experimental data for 24 common PPCPs, including Kd values, soil properties, experimental parameters, and compound molecular descriptors, into a dataset and used machine learning to build predictive models for Kd. The results indicated that the Random Forest (RF) and Extreme Gradient Boosting (XGBoost) regression models performed similarly and that both outperformed Support Vector Regression (SVR). Furthermore, SHAP analysis revealed that the octanol-water partition coefficient (logKOW), molar refractivity (MR), molecular weight (MW), solid-liquid ratio (RATIO), and organic carbon content (OC) had the greatest influence on Kd. Applicability domain analysis and model validation using reported data on 12 PPCPs and 42 sediment samples from streams and rivers in Guangzhou City showed that, except for erythromycin and roxithromycin, the models constructed in this study could accurately predict the Kd values of the remaining PPCPs. Additionally, for compounds such as ciprofloxacin, ofloxacin, and sulfamethazine, whose solubility increases significantly under weakly acidic and weakly alkaline conditions, the method developed in this study may underestimate the actual Kd values in such environments.
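For readers unfamiliar with the quantities involved, the following is a minimal sketch of how Kd is commonly obtained from a single batch adsorption experiment by mass balance, and of a linear-partitioning estimate of the kind the abstract contrasts with machine learning. It is not taken from the study: the function names, the units (mg/L, L, kg), and the Koc-times-organic-carbon-fraction form of the linear model are assumptions made for illustration only.

```python
def kd_from_batch(c0_mg_per_l, ce_mg_per_l, volume_l, mass_kg):
    """Kd in L/kg from one batch test (hypothetical helper).

    Assumes all compound lost from solution is sorbed to the solid:
    sorbed amount per unit mass = (C0 - Ce) * V / m, and Kd = sorbed / Ce.
    """
    if ce_mg_per_l <= 0 or volume_l <= 0 or mass_kg <= 0:
        raise ValueError("Ce, volume, and solid mass must be positive")
    if c0_mg_per_l < ce_mg_per_l:
        raise ValueError("initial concentration must not be below Ce")
    sorbed_mg_per_kg = (c0_mg_per_l - ce_mg_per_l) * volume_l / mass_kg
    return sorbed_mg_per_kg / ce_mg_per_l


def kd_linear_partitioning(koc_l_per_kg, organic_carbon_fraction):
    """Linear-partitioning estimate Kd = Koc * foc (assumed form)."""
    if not 0.0 <= organic_carbon_fraction <= 1.0:
        raise ValueError("organic carbon fraction must be between 0 and 1")
    return koc_l_per_kg * organic_carbon_fraction


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(kd_from_batch(1.0, 0.5, 0.02, 0.002))
    print(kd_linear_partitioning(100.0, 0.02))
```

With these illustrative inputs, the batch calculation gives a Kd of about 10 L/kg and the linear estimate about 2 L/kg. The linear form depends only on organic carbon, which is one reason a model that also uses descriptors such as logKOW, MR, MW, and the solid-liquid ratio might capture more of the variation in Kd.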