Multimodal fusion and knowledge distillation for improved anomaly detection

Springer Nature

Abstract: Anomaly detection aims to distinguish normal from abnormal images, with applications in industrial defect detection and medical imaging. Current methods that use textual information often focus on designing effective textual prompts but overlook their full utilization. This paper proposes a multimodal fusion network that integrates image and text information to improve anomaly detection. The network comprises an image encoder, a text encoder, and a stacked cross-attention module. To address the absence of text during inference, an image-only branch is introduced, guided by the multimodal fusion network through knowledge distillation. Experiments on industrial anomaly detection and medical image datasets demonstrate the effectiveness of the approach, which achieves AUROC and AUPR scores of 96.5% and 89.2%, respectively, on VisA. The code is available at https://github.com/lilianoa/Multimodal-guide-AD.
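
The two mechanisms named in the abstract, stacked cross-attention between image and text features and distillation from the fused network into an image-only branch, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the number of layers, the projections, or the form of the distillation loss, so the single-head attention without learned weights, the residual connection, the two-layer stack, and the mean-squared-error loss below are all assumptions, and every function name is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(image_tokens, text_tokens):
    """Each image token (query) attends over all text tokens (keys/values).

    Assumption: single head, no learned projections, residual connection.
    """
    dim = len(text_tokens[0])
    fused = []
    for q in image_tokens:
        # Scaled dot-product score between this image token and each text token.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in text_tokens
        ]
        weights = softmax(scores)
        # Attention-weighted sum of the text tokens.
        context = [
            sum(w * k[d] for w, k in zip(weights, text_tokens))
            for d in range(dim)
        ]
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def stacked_cross_attention(image_tokens, text_tokens, num_layers=2):
    """Apply cross-attention repeatedly (num_layers is an assumed value)."""
    out = image_tokens
    for _ in range(num_layers):
        out = cross_attention(out, text_tokens)
    return out


def distillation_loss(student_tokens, teacher_tokens):
    """Assumed loss: mean squared error between student and teacher features."""
    total, count = 0.0, 0
    for s, t in zip(student_tokens, teacher_tokens):
        for si, ti in zip(s, t):
            total += (si - ti) ** 2
            count += 1
    return total / count


# Toy features standing in for encoder outputs (2 tokens, dimension 2).
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[1.0, 0.0], [0.5, 0.5]]

teacher = stacked_cross_attention(image_tokens, text_tokens)
student = [row[:] for row in image_tokens]  # stand-in for the image-only branch
loss = distillation_loss(student, teacher)
```

In a setup like this, the text-aware teacher features would be needed only during training; at inference the image-only student runs alone, which matches the motivation the abstract gives for the image-only branch.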

Authors:

Meichen Lu, Yi Chai, Kaixiong Xu, Weiqing Chen, Fei Ao, Wen Ji

Affiliations:

Chongqing University

Chongqing University Cancer Hospital

Publication year:

2025

DOI:

10.1007/s00371-024-03723-6

The Visual Computer

ISSN: 0178-2789

Year, volume (issue): 2025, 41(8)

Number of references: 46