Asian Journal of Ecotoxicology (生态毒理学报), 2024, Vol. 19, Issue 4: 43-52. DOI: 10.7524/AJE.1673-5897.20240322001

Machine Learning Models on Screening Ready Biodegradability of Chemicals

徐嘉茜 王浩博 肖子君 刘文佳 何家乐 陈景文

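Author information

  • 1. Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China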

Abstract

Determining whether chemicals are readily biodegradable contributes to their environmental risk assessment. Previous models for screening the ready biodegradability (RB) of chemicals were limited by the narrow chemical space covered by their training sets, leading to low prediction accuracy, and they lacked effective applicability domain (AD) characterization. To address these challenges, this study collected RB data for 5,606 chemicals and developed RB screening models using machine learning algorithms. A model built with the Extreme Gradient Boosting algorithm and Mordred molecular descriptors performed best, achieving an accuracy of 0.86 and an area under the receiver operating characteristic curve of 0.92 on the external validation set. The AD of the model was characterized by two metrics: weighted molecular similarity density and weighted inconsistency in molecular activities (weighted ruggedness). Mechanistic analysis of the model revealed that carboxyl or hydroxyl groups significantly enhance the RB of chemicals. Screening of the Inventory of Existing Chemical Substances in China showed that more than 60% of the chemical substances are not readily biodegradable, with benzene and its derivatives accounting for the largest proportion. The RB screening model and its AD characterization can provide technical support for the environmental management of chemicals.
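The two performance figures quoted in the abstract (accuracy and area under the receiver operating characteristic curve) can be illustrated with a minimal pure-Python sketch. The labels and scores below are hypothetical toy values, not data from the study, and the functions are generic textbook definitions rather than the authors' evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted class labels that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = readily biodegradable, 0 = not readily biodegradable.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical predicted probabilities of the positive class.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5 for converting scores to class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:", accuracy(y_true, y_pred))
print("ROC AUC:", roc_auc(y_true, scores))
```

Accuracy depends on the chosen decision threshold, whereas the AUC summarizes ranking quality across all thresholds, which is one reason the two are commonly reported together.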

Key words

chemicals/ready biodegradability/machine learning/applicability domain

Funding

National Key Research and Development Program of China (2022YFC3902100)

National Natural Science Foundation of China (22136001)

Publication year

2024
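Asian Journal of Ecotoxicology (生态毒理学报)
Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences

Indexed in: CSTPCD, Peking University Core Journals (北大核心)
Impact factor: 0.857
ISSN: 1673-5897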