Machine Learning Models for Screening the Ready Biodegradability of Chemicals
Determining whether chemicals are readily biodegradable contributes to their environmental risk assessment. Previous models for screening the ready biodegradability (RB) of chemicals have been limited by the narrow chemical space covered by their training sets, leading to low prediction accuracy. Previous models also lack effective application domain (AD) characterization. To address these challenges, this study collected RB data for 5,606 chemicals and developed RB screening models using machine learning algorithms. A model developed with the Extreme Gradient Boosting algorithm and Mordred molecular descriptors exhibited the best performance, achieving an accuracy of 0.86 and an area under the receiver operating characteristic curve of 0.92 on the external validation sets. The AD of the model was characterized by weighted molecular similarity density and weighted inconsistency in molecular activities. Mechanistic analysis of the model revealed that carboxyl and hydroxyl groups significantly enhance the RB of chemicals. Screening of the Inventory of Existing Chemical Substances in China showed that over 60% of the chemical substances were not readily biodegradable. Among these chemicals, benzene and its derivatives constituted the largest proportion. The RB screening model and its AD characterization can aid in the environmental management of chemicals.
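For readers unfamiliar with the two performance measures quoted above, the following minimal sketch shows how accuracy and the area under the receiver operating characteristic curve (AUC) can be computed for a binary RB classifier. It is an illustration only, not the study's evaluation code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here, and the values they produce are unrelated to the reported 0.86 and 0.92.

```python
def accuracy(y_true, y_pred):
    """Fraction of chemicals whose predicted class matches the observed class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC via its pairwise (Mann-Whitney) interpretation.

    AUC equals the probability that a randomly chosen positive
    (readily biodegradable) chemical receives a higher score than a
    randomly chosen negative one; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readily biodegradable, 0 = not readily biodegradable.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
# Hypothetical predicted probabilities of the positive class.
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]
# Assumed decision threshold of 0.5 for turning scores into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
print(f"accuracy = {toy_accuracy:.4f}, AUC = {toy_auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why screening models are commonly reported with both.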