Multimodal fusion and knowledge distillation for improved anomaly detection

Abstract

Anomaly detection aims to distinguish normal images from abnormal ones, with applications in industrial defect detection and medical imaging. Existing methods that use textual information tend to focus on designing effective text prompts but do not fully exploit that information. This paper proposes a multimodal fusion network that integrates image and text information to improve anomaly detection. The network comprises an image encoder, a text encoder, and a stacked cross-attention module. To handle the absence of text at inference time, an image-only branch is introduced and guided by the multimodal fusion network through knowledge distillation. Experiments on industrial anomaly detection and medical image datasets demonstrate the effectiveness of the approach, which achieves AUROC and AUPR scores of 96.5% and 89.2%, respectively, on VisA. The code is available at https://github.com/lilianoa/Multimodal-guide-AD.
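The abstract describes the architecture only at a high level. The following is a minimal NumPy sketch of two ideas it names, not the authors' implementation (the linked repository is the authoritative source): stacked cross-attention in which image patch tokens attend to text embeddings, and a distillation loss that pulls an image-only branch toward the fused features. The token counts, embedding size, random weights, single-head attention, residual connections, and mean-squared-error loss are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens, txt_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: image tokens (queries) attend to text tokens."""
    q = img_tokens @ w_q                       # (N_img, d)
    k = txt_tokens @ w_k                       # (N_txt, d)
    v = txt_tokens @ w_v                       # (N_txt, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N_img, N_txt)
    return softmax(scores, axis=-1) @ v        # (N_img, d)

def stacked_fusion(img_tokens, txt_tokens, layers):
    """Apply several cross-attention layers, each with a residual connection (assumed)."""
    x = img_tokens
    for w_q, w_k, w_v in layers:
        x = x + cross_attention(x, txt_tokens, w_q, w_k, w_v)
    return x

def distillation_loss(student_feats, teacher_feats):
    """Hypothetical feature-level distillation loss: mean squared error."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
d = 16
img = rng.standard_normal((49, d))  # assumed: 7x7 patch tokens from an image encoder
txt = rng.standard_normal((2, d))   # assumed: e.g. "normal" / "anomalous" prompt embeddings
# Two stacked layers, each holding random (w_q, w_k, w_v) projection matrices
layers = [tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(3)) for _ in range(2)]

teacher = stacked_fusion(img, txt, layers)             # multimodal (image + text) features
student = img @ (rng.standard_normal((d, d)) * 0.1)    # image-only branch: one untrained linear map
loss = distillation_loss(student, teacher)             # computed here, not minimized
print(teacher.shape, loss)
```

In training, the student's parameters would be updated to reduce this loss, so that at inference the image-only branch can be used without any text input.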

Meichen Lu, Yi Chai, Kaixiong Xu, Weiqing Chen, Fei Ao, Wen Ji


Chongqing University

Chongqing University Cancer Hospital

2025

The Visual Computer

ISSN:0178-2789
Year, Volume (Issue): 2025, 41(8)