Text-Convolutional Neural Network-based Discovery of Antibacterial Agents
OBJECTIVE To build a text-convolutional neural network (Text-CNN)-based prediction model for anti-Staphylococcus aureus (S. aureus) activity and to identify anti-S. aureus hits by virtual screening. METHODS A dataset of 26,327 compounds annotated with S. aureus activity data was collected and curated from the ChEMBL database. Ten pairs of training and test sets were generated by 10 rounds of random partitioning, and 10 models were built using the Text-CNN algorithm. The best-performing model was selected by model evaluation and further examined by a Y-randomization test and applicability domain analysis. The best-performing model was then used to virtually screen the in-house chemical library to identify potential antibacterial agents. The micro-broth dilution method was used to test the anti-S. aureus activity of the potential hits. RESULTS The machine-learning model (named Text-CNN3) performed well in classification. Evaluated on the test set, its Matthews correlation coefficient was 0.573 and its area under the ROC curve was 0.881. Virtual screening with this model, followed by antibacterial testing, identified compounds Y5 and Y7 as antibacterial compounds, with minimum inhibitory concentrations (MIC) of 8 and 4 μg·mL⁻¹, respectively. CONCLUSION The Text-CNN3 model in this study is effective at identifying anti-S. aureus compounds, and the antibacterial hits Y5 and Y7 are worthy of further study.
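The two test-set metrics reported in RESULTS, the Matthews correlation coefficient and the area under the ROC curve, can be illustrated with a minimal sketch. It assumes binary labels (1 = active, 0 = inactive) and a predicted score per compound. The function names, the 0.5 decision threshold, and the toy data are hypothetical and are not taken from the study.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = active, 0 = inactive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive compound")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 actives, 5 inactives, with model scores in [0, 1].
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

mcc = matthews_corrcoef(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"MCC = {mcc:.3f}, AUC = {auc:.3f}")
```

The Matthews correlation coefficient uses all four cells of the confusion matrix, so it remains informative when actives and inactives are imbalanced, whereas the AUC is threshold-free and depends only on how the scores rank the compounds.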