Deep learning models for the classification of Mayo endoscopic score of ulcerative colitis
Xu Chang (1), Lin Jiaxi (1), Wang Yu (2), Lu Jianying (1), Liu Xiaolin (1), Xu Chunfang (1), Zhu Jinzhou (1), Gu Minyi
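Author information
1. Department of Gastroenterology, the First Affiliated Hospital of Soochow University, Suzhou 215006
2. Department of General Surgery, Affiliated Jintan Hospital of Jiangsu University, Changzhou 213200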
Abstract
Objective To develop deep convolutional neural network models that classify ulcerative colitis (UC) endoscopic images by Mayo endoscopic score, and to evaluate their performance.
Methods A total of 2400 endoscopic images from the Gastrointestinal Endoscopy Centre of the First Affiliated Hospital of Soochow University and the HyperKvasir database were collected as the training and validation sets, and 200 endoscopic images from the Gastrointestinal Endoscopy Centre of the Affiliated Jintan Hospital of Jiangsu University were collected as the test set. All images were scored by endoscopists according to the Mayo endoscopic score (0-3). Four deep convolutional neural networks pre-trained on the ImageNet dataset (MobileNetV2, ResNetV2, Xception and EfficientNetV2S) were used to build four-class UC classification models by transfer learning. The models were evaluated on the test set from the confusion matrix, using accuracy, Matthews correlation coefficient (MCC) and Cohen's kappa, and were compared with the performance of senior and junior physicians. Gradient-weighted class activation mapping was used to visualize the models' classification process.
Results Four deep learning models for Mayo scoring of UC endoscopic images were successfully developed. On the test set, the accuracy of MobileNetV2, ResNetV2, Xception and EfficientNetV2S was 0.785, 0.800, 0.815 and 0.830, respectively (average accuracy 0.808). Among them, EfficientNetV2S performed best: its accuracy was higher than that of the junior physician (0.785) and slightly lower than that of the senior physician (0.870).
Conclusions The deep learning models for scoring UC endoscopic images show good classification performance, which could be further improved by enlarging the sample size and optimizing the model framework.
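The three evaluation metrics named in the Methods can all be derived from a single confusion matrix. The following is a minimal sketch of that computation, not the authors' code: the function names and the toy labels are hypothetical, and the MCC shown is the standard multiclass generalization, which the abstract does not explicitly confirm was the variant used.

```python
from math import sqrt


def confusion_matrix(y_true, y_pred, n_classes=4):
    """Rows are true Mayo scores (0-3), columns are predicted scores."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _summaries(cm):
    """Return total count, correct count, per-class true and predicted totals."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_totals = [sum(cm[i]) for i in range(k)]
    pred_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    return n, correct, true_totals, pred_totals


def accuracy(cm):
    n, correct, _, _ = _summaries(cm)
    return correct / n


def cohen_kappa(cm):
    """Observed agreement corrected for agreement expected by chance."""
    n, correct, t, p = _summaries(cm)
    p_observed = correct / n
    p_expected = sum(ti * pi for ti, pi in zip(t, p)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


def multiclass_mcc(cm):
    """Multiclass Matthews correlation coefficient from the confusion matrix."""
    n, correct, t, p = _summaries(cm)
    numerator = correct * n - sum(ti * pi for ti, pi in zip(t, p))
    denominator = sqrt(n * n - sum(pi * pi for pi in p)) * sqrt(
        n * n - sum(ti * ti for ti in t)
    )
    return numerator / denominator if denominator else 0.0


# Toy labels for illustration only; these are NOT data from the study.
toy_true = [0, 0, 1, 1, 2, 2, 3, 3]
toy_pred = [0, 0, 1, 2, 2, 2, 3, 0]
toy_cm = confusion_matrix(toy_true, toy_pred)

print("accuracy:", accuracy(toy_cm))
print("kappa:   ", cohen_kappa(toy_cm))
print("MCC:     ", multiclass_mcc(toy_cm))
```

Unlike accuracy, kappa and MCC both discount agreement that would arise by chance from the class totals, which is why they are commonly reported alongside accuracy for multi-class grading tasks.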