Machine Learning Models for Screening the Ready Biodegradability of Chemicals

Determining whether chemicals are readily biodegradable supports their environmental risk assessment. Previous screening models for the ready biodegradability (RB) of chemicals were trained on data sets covering a narrow chemical space, which led to low prediction accuracy, and they lacked effective applicability domain (AD) characterization. To address these limitations, this study collected RB data for 5,606 chemicals and developed RB screening models using machine learning algorithms. The model built with the Extreme Gradient Boosting algorithm and Mordred molecular descriptors performed best, achieving an accuracy of 0.86 and an area under the receiver operating characteristic curve of 0.92 on the external validation sets. The AD of the model was characterized by two metrics: weighted molecular similarity density and weighted inconsistency in molecular activities. Mechanistic analysis of the model revealed that carboxyl and hydroxyl groups significantly enhance the RB of chemicals. Screening of the Inventory of Existing Chemical Substances in China showed that over 60% of the chemical substances are not readily biodegradable, with benzene and its derivatives constituting the largest proportion. The RB screening model and its AD characterization can provide technical support for the environmental management of chemicals.
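The two performance figures quoted in the abstract, accuracy and area under the receiver operating characteristic curve (AUC), can be illustrated with a minimal sketch. The labels and scores below are hypothetical toy values, not data from the study, and the function names `accuracy` and `roc_auc` are chosen here only for illustration. The AUC is computed in its rank-based form: the probability that a randomly chosen readily biodegradable chemical receives a higher score than a randomly chosen non-readily biodegradable one, with ties counted as one half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of chemicals whose predicted class matches the observed class."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readily biodegradable, 0 = not readily biodegradable.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]  # assumed model output probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why screening studies commonly report both.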

chemicals; ready biodegradability; machine learning; applicability domain

徐嘉茜、王浩博、肖子君、刘文佳、何家乐、陈景文

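Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China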

National Key Research and Development Program of China; National Natural Science Foundation of China

2022YFC3902100; 22136001

2024

Asian Journal of Ecotoxicology (生态毒理学报)
Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.857
ISSN: 1673-5897
Year, volume (issue): 2024, 19(4)