
Identification of crucial molecular fragments through ECFP fingerprints and decision trees

Fragment-based drug design is an emerging technique in pharmaceutical research, and one of its key challenges is identifying molecular fragments and characterizing them quantitatively. This study proposes a strategy for identifying important molecular fragments based on molecular fingerprints and decision trees. Extended-Connectivity Fingerprints (ECFP) are used to encode the molecular fragments of protein-ligand complexes, and three decision-tree-based models (Random Forest, XGBoost, and LightGBM) are each used to quantify feature importance, so that important molecular fragments can be extracted with high reliability. The ranked feature importances of the ECFP fingerprints follow an exponential decay trend, indicating that only a few ECFP features contribute significantly to the binding affinity of protein-ligand complexes. Molecular fragments that all three decision tree models consistently rank as highly contributive can be regarded as highly reliable markers and applied in fragment-based drug design and optimization.
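The ECFP encoding step can be illustrated with a deliberately simplified, dependency-free sketch of the Morgan/ECFP iteration. It is not the authors' code, and several assumptions are made: a molecule is given as a list of element symbols plus an undirected bond list, the initial atom invariant is only element and degree, bond orders and the removal of duplicate environments are omitted, and the names `ecfp_like` and `_h` are hypothetical. The abstract does not state the radius or bit length used; radius 2 (which corresponds to ECFP4) and 1024 bits are placeholders. In practice RDKit's Morgan fingerprint (for example `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048, bitInfo=info)`) would normally be used instead.

```python
import hashlib


def _h(*parts):
    """Deterministic 32-bit hash of a tuple of values (hypothetical stand-in
    for the hashing used by real ECFP implementations)."""
    data = ",".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def ecfp_like(atoms, bonds, radius=2, n_bits=1024):
    """Simplified ECFP/Morgan-style fingerprint (illustrative sketch only).

    ASSUMPTIONS: atoms are element symbols, bonds are undirected (i, j)
    index pairs, bond orders are ignored, and duplicate environments are
    not removed as they would be in full ECFP.
    Returns (bits, origin): the set of folded bit indices and a map
    bit -> (center atom index, radius) of the first environment that set it.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Iteration 0: identifier from a minimal invariant (element + degree).
    ids = [_h(sym, len(neighbors[i])) for i, sym in enumerate(atoms)]
    bits, origin = set(), {}
    for r in range(radius + 1):
        if r > 0:
            # New identifier = hash of iteration index, the atom's previous
            # identifier, and its neighbors' previous identifiers (sorted,
            # so the result does not depend on atom numbering).
            ids = [_h(r, ids[i], *sorted(ids[j] for j in neighbors[i]))
                   for i in range(len(atoms))]
        for center, ident in enumerate(ids):
            bit = ident % n_bits  # fold the identifier into a fixed-length vector
            bits.add(bit)
            origin.setdefault(bit, (center, r))
    return bits, origin
```

The `origin` map records which atom and radius first produced each bit; this kind of bit-to-substructure mapping is what allows an important fingerprint bit to be read back as a molecular fragment. Because neighbor identifiers are sorted before hashing, the result does not depend on atom numbering: `ecfp_like(["C", "C", "O"], [(0, 1), (1, 2)])` and `ecfp_like(["O", "C", "C"], [(0, 1), (1, 2)])` return the same bit set.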
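The consensus step, keeping only fragments that all three models rank as highly contributive, could be sketched as follows. The abstract does not give the exact selection rule, so this is an assumption: each model's importances are first rescaled to sum to 1 (raw importance scales differ between Random Forest, XGBoost, and LightGBM), a feature must appear in every model's top `top_k`, and the survivors are ordered by mean normalized importance. The function names `normalize` and `consensus_fragments` are hypothetical.

```python
def normalize(imp):
    """Rescale one model's importances to sum to 1 so models are comparable."""
    total = sum(imp.values())
    return {k: v / total for k, v in imp.items()} if total > 0 else dict(imp)


def consensus_fragments(importances, top_k=10):
    """Return features ranked in the top_k of EVERY model (hypothetical rule),
    ordered by mean normalized importance, highest first.

    importances: list of dicts mapping feature (e.g. ECFP bit) -> importance,
    one dict per model.
    """
    normed = [normalize(imp) for imp in importances]
    # Top-k feature set per model after normalization.
    top_sets = [set(sorted(n, key=n.get, reverse=True)[:top_k]) for n in normed]
    shared = set.intersection(*top_sets)
    mean = {f: sum(n.get(f, 0.0) for n in normed) / len(normed) for f in shared}
    # Ties in mean importance are broken by feature name for determinism.
    return sorted(shared, key=lambda f: (-mean[f], f))
```

With the toy importances `rf = {"b1": 0.40, "b2": 0.30, "b3": 0.20, "b4": 0.10}`, `xgb = {"b1": 50.0, "b2": 15.0, "b3": 30.0, "b4": 5.0}` and `lgb = {"b1": 8, "b2": 1, "b3": 6, "b4": 5}` (invented for illustration), `consensus_fragments([rf, xgb, lgb], top_k=3)` returns `["b1", "b3"]`: `b2` is dropped because it is in the top three of only two models.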
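One way to check the exponential decay trend reported above is to sort the importances in descending order and fit ln(I_r) = ln(a) - b*r by least squares, where r is the rank; a positive b with a good fit is consistent with exponential decay. This is an illustrative assumption about how such a check might be done, not the paper's stated analysis, and `fit_exponential_decay` is a hypothetical name.

```python
import math


def fit_exponential_decay(importances):
    """Fit I(r) ~ a * exp(-b * r) to importances ranked in descending order.

    Hypothetical check: ordinary least squares on ln(I) versus rank
    r = 1, 2, ...  Zero importances are dropped (ln(0) is undefined); since
    they sort to the end, this does not shift the ranks of the others.
    Returns (a, b); b > 0 indicates decay.
    """
    vals = sorted((v for v in importances if v > 0), reverse=True)
    if len(vals) < 2:
        raise ValueError("need at least two positive importances")
    xs = list(range(1, len(vals) + 1))
    ys = [math.log(v) for v in vals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope
```

For importances generated exactly as `0.5 * exp(-0.3 * r)` for r = 1..20, the fit recovers a = 0.5 and b = 0.3 up to floating-point rounding. On real importances, a clearly curved plot of ln(I) against rank would argue against a single exponential.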

Keywords: protein-ligand; crucial molecular fragments; molecular fingerprints; decision trees; binding affinity

Authors: 李柏易 (Li Baiyi), 许振军 (Xu Zhenjun), 尹祚德 (Yin Zuode), 谢良旭 (Xie Liangxu), 许晓军 (Xu Xiaojun)


Institute of Bioinformatics and Medical Engineering, Jiangsu University of Technology, Changzhou 213001

Zhejiang Guyue Longshan Electronic Technology Development Co., Ltd., Shaoxing 312000


2024

Modern Computer (现代计算机)
中大控股 (Zhongda Holdings)


Impact factor: 0.292
ISSN: 1007-1423
Year, volume (issue): 2024, 30(16)