Abstract
By a News Reporter-Staff News Editor at Robotics & Machine Learning Daily News - According to news reporting based on a preprint abstract, our journalists obtained the following quote sourced from biorxiv.org:

"Determining the aqueous solubility of the chemical compound is of great importance in-silico drug discovery.

"However, correctly and rapidly predicting the aqueous solubility remains a challenging task.

"This paper explores and evaluates the predictability of multiple machine learning models in the aqueous solubility of compounds. Specifically, we apply a series of machine learning algorithms, including Random Forest, XGBoost, LightGBM, and CatBoost, on a well-established aqueous solubility dataset (i.e., the Huuskonen dataset) of over 1200 compounds. Experimental results show that even traditional machine learning algorithms can achieve satisfactory performance with high accuracy.

"In addition, our investigation goes beyond mere prediction accuracy, delving into the interpretability of models to identify key features and understand the molecular properties that influence the predicted outcomes.
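
The quoted abstract does not say which error metrics or which interpretability method the authors used. As a rough illustration only, the sketch below scores a regressor with RMSE and R² and ranks its input features by permutation importance, a model-agnostic technique chosen here as an assumption, not as the paper's method. The names `toy_model`, the three descriptors, their coefficients, and the synthetic data are all hypothetical placeholders; in practice `predict` would wrap a trained Random Forest, XGBoost, LightGBM, or CatBoost regressor and `X`, `y` would come from the Huuskonen dataset.

```python
import math
import random
from typing import Callable, List, Sequence

Row = List[float]
Predictor = Callable[[Sequence[Row]], List[float]]


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination (1.0 means a perfect fit)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict: Predictor, X: Sequence[Row], y: Sequence[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean increase in RMSE when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = rmse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(rmse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(X: Sequence[Row]) -> List[float]:
    """Hypothetical stand-in for a trained regressor; coefficients are made up.

    Features (assumed): [logP, molecular weight / 100, unused descriptor].
    """
    return [0.5 - 1.0 * row[0] - 0.5 * row[1] for row in X]


# Synthetic placeholder data, NOT real solubility measurements.
_rng = random.Random(42)
X = [[_rng.uniform(-1, 5), _rng.uniform(1, 5), _rng.uniform(0, 1)] for _ in range(200)]
y = [pred + _rng.gauss(0, 0.1) for pred in toy_model(X)]

predictions = toy_model(X)
importances = permutation_importance(toy_model, X, y)
print("RMSE:", round(rmse(y, predictions), 3))
print("R2:", round(r2_score(y, predictions), 3))
print("Permutation importances:", [round(v, 3) for v in importances])
```

Because the toy model ignores its third input, shuffling that column leaves the error unchanged and its importance is exactly zero. With a real tree ensemble, the same ranking would indicate which molecular descriptors the predictions lean on most.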