Status of the application of machine learning algorithms in drug toxicity prediction
Objective: To evaluate the status of the application of machine learning algorithms in drug toxicity prediction through bibliometric analysis, so as to provide a reference for related research and applications and to promote the interdisciplinary development of "medicine + information".
Methods: Relevant literature in databases such as CNKI and Wanfang was searched using keywords such as "toxicity prediction", "QSAR", "computational toxicology" and "machine learning". The literature was then classified and organized by the type of algorithm used and by the toxicity prediction stage to which the algorithm was applied, and the status of the application of machine learning algorithms in drug toxicity prediction was reviewed.
Results: A total of 122 relevant and valid articles were retrieved. In drug toxicity prediction, machine learning has been used for toxicity data set processing, screening of drug information characterizations, and prediction model training. Of these three tasks, model training involved the greatest variety and number of algorithms. Various algorithms have been studied and applied in the field, with support vector machines, random forests, and deep learning being the most commonly used. Furthermore, most of the literature indicated that models based on deep learning or ensemble learning had higher predictive performance.
Conclusion: Machine learning algorithms have been widely used in the field of toxicity prediction. The main considerations when selecting an algorithm include the size of the data set; the processing speed of the algorithm; its adaptability to outliers, redundant data, and noisy data; and the difficulty of implementing it. Computer-aided toxicity prediction has many advantages over traditional in vivo and in vitro experiments. However, some challenges remain, including several gaps in medical data-related fields, the urgent need to improve the quality of existing data, and the selection of drug information characterization methods.
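
The ensemble learning mentioned in the Results can be illustrated with a minimal sketch. It is not taken from any of the reviewed articles; it is a toy example under stated assumptions. The fingerprints are random synthetic bits, the "toxic" label is defined by three hypothetical alert bits (0, 3 and 7), and the ensemble is a simple random-forest-like vote over single-bit classifiers (decision stumps), each trained on a bootstrap sample with a random subset of bits. All function names and parameters are illustrative.

```python
import random

def make_dataset(n, n_bits=16, seed=0):
    """Synthetic fingerprints; a molecule counts as 'toxic' (1) when at least
    two of three hypothetical alert bits (0, 3, 7) are set. Not real data."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n)]
    y = [1 if x[0] + x[3] + x[7] >= 2 else 0 for x in X]
    return X, y

def fit_stump(X, y, features):
    """Return the (bit, polarity) among `features` with the fewest training errors.
    Polarity 1 predicts toxic when the bit is set; polarity 0 when it is unset."""
    best = None
    for f in features:
        for polarity in (0, 1):
            errors = sum((x[f] == polarity) != (label == 1) for x, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, f, polarity)
    return best[1], best[2]

def predict_stump(stump, x):
    f, polarity = stump
    return 1 if x[f] == polarity else 0

def fit_ensemble(X, y, n_stumps=51, n_sub=4, seed=1):
    """Train stumps on bootstrap samples, each restricted to a random bit subset."""
    rng = random.Random(seed)
    n, n_bits = len(X), len(X[0])
    stumps = []
    for _ in range(n_stumps):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(n_bits), n_sub)     # random feature subset
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return stumps

def predict_ensemble(stumps, x):
    """Majority vote over all stumps."""
    votes = sum(predict_stump(s, x) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0

def accuracy(preds, y):
    return sum(p == t for p, t in zip(preds, y)) / len(y)

X_train, y_train = make_dataset(400, seed=0)
X_test, y_test = make_dataset(200, seed=42)

single = fit_stump(X_train, y_train, range(16))
ensemble = fit_ensemble(X_train, y_train)

single_acc = accuracy([predict_stump(single, x) for x in X_test], y_test)
ensemble_acc = accuracy([predict_ensemble(ensemble, x) for x in X_test], y_test)
print(f"single stump: {single_acc:.2f}, ensemble: {ensemble_acc:.2f}")
```

The sketch only shows the mechanism of combining many weak classifiers by voting. Whether an ensemble actually outperforms a single model on real toxicity data depends on the data set size, the noise level, and the chosen molecular characterization, which are the selection considerations listed in the Conclusion.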