集成技术 (Journal of Integration Technology) 2025, Vol. 14, Issue 1: 78-90. DOI: 10.12146/j.issn.2095-3135.20240422001

Multi-disease Recognition Method for Fundus Images Based on Text Enhancement

熊绍奎 (Xiong Shaokui)¹, 陈世峰 (Chen Shifeng)²

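Author information

  • 1. Southern University of Science and Technology, Shenzhen 518055
  • 2. Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055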

Abstract

This study introduces a vision-language model into disease recognition from ophthalmic images and proposes a multi-disease recognition algorithm based on a contrastive language-image pre-training (CLIP) model. First, a multi-label fundus image dataset with 8 categories, MDFCD8, is constructed from several publicly available fundus image datasets. Second, the generative artificial intelligence model GPT-4 (Generative Pre-trained Transformer 4) is used to generate expert knowledge describing the fine-grained pathological features of fundus images, which addresses the lack of text labels in fundus image datasets. Finally, the average precision (AP), F1 score, and area under the receiver operating characteristic curve (AUC) are calculated, and the mean of the three is taken as the final performance metric. Experimental results show that the proposed method outperforms traditional convolutional neural networks and Transformer networks by 4.8% and 3.2%, respectively. Ablation experiments on each module further validate the effectiveness of the method and demonstrate the potential of vision-language models for the auxiliary diagnosis of ophthalmic diseases.
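The combined evaluation metric described in the abstract (the mean of AP, F1, and AUC) can be illustrated with a minimal sketch. The abstract does not say how the per-class values are aggregated or how scores are turned into predictions for F1, so the macro-averaging over classes, the 0.5 decision threshold, and all function names below (`average_precision`, `f1_score`, `roc_auc`, `combined_score`) are assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean


def average_precision(y_true, y_score):
    """AP for one class: mean of the precision at the rank of each positive."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return mean(precisions) if precisions else 0.0


def f1_score(y_true, y_score, threshold=0.5):
    """F1 for one class after thresholding scores (threshold is an assumption)."""
    pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def combined_score(labels, scores, threshold=0.5):
    """Macro-average AP, F1, and AUC over classes, then average the three.

    labels and scores are lists of per-sample lists with one entry per class.
    Macro-averaging over classes is an assumption, not stated in the abstract.
    """
    n_classes = len(labels[0])
    aps, f1s, aucs = [], [], []
    for c in range(n_classes):
        y_true = [row[c] for row in labels]
        y_score = [row[c] for row in scores]
        aps.append(average_precision(y_true, y_score))
        f1s.append(f1_score(y_true, y_score, threshold))
        aucs.append(roc_auc(y_true, y_score))
    m_ap, m_f1, m_auc = mean(aps), mean(f1s), mean(aucs)
    return {"AP": m_ap, "F1": m_f1, "AUC": m_auc,
            "score": (m_ap + m_f1 + m_auc) / 3}


if __name__ == "__main__":
    # Toy multi-label data: 4 samples, 2 hypothetical disease classes.
    toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
    toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.6], [0.3, 0.4]]
    print(combined_score(toy_labels, toy_scores))
```

Averaging a ranking metric (AP), a thresholded metric (F1), and a threshold-free discrimination metric (AUC) gives a single number that is less sensitive to the weaknesses of any one of them, which is presumably why the three are combined.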

Key words

fundus images/multi-disease/contrastive language-image pretraining/expert knowledge

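Publication year

2025
集成技术 (Journal of Integration Technology)
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Impact factor: 0.238
ISSN: 2095-3135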