Multimodal fusion and knowledge distillation for improved anomaly detection
Springer Nature
Abstract

Anomaly detection aims to distinguish normal images from abnormal ones, with applications in industrial defect detection and medical imaging. Current methods that use textual information often focus on designing effective textual prompts but fail to exploit them fully. This paper proposes a multimodal fusion network that integrates image and text information to improve anomaly detection. The network comprises an image encoder, a text encoder, and a stacked cross-attention module. To address the absence of text during inference, an image-only branch is introduced, guided by the multimodal fusion network through knowledge distillation. Experiments on industrial anomaly detection and medical image datasets demonstrate the effectiveness of the approach, achieving AUROC and AUPR scores of 96.5% and 89.2%, respectively, on VisA. The code is available at https://github.com/lilianoa/Multimodal-guide-AD.
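The abstract describes cross-attention fusion of image and text features, with the fused (teacher) features distilled into an image-only (student) branch for text-free inference. A minimal NumPy sketch of those two ingredients is shown below; all names, shapes, and the MSE distillation objective are illustrative assumptions, not the authors' released code (see the linked repository for the actual implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_feats, txt_feats):
    """Queries from image tokens attend over text tokens (keys/values)."""
    d = img_feats.shape[-1]
    attn = softmax(img_feats @ txt_feats.T / np.sqrt(d))  # (n_img, n_txt)
    return attn @ txt_feats                               # fused image features

def distill_loss(student_feats, teacher_feats):
    """Feature-level distillation: image-only branch mimics fused features."""
    return float(np.mean((student_feats - teacher_feats) ** 2))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))   # 4 image tokens, dim 8 (toy sizes)
txt = rng.normal(size=(6, 8))   # 6 text-prompt tokens
fused = cross_attention(img, txt)          # teacher: multimodal features
student = rng.normal(size=(4, 8))          # stand-in image-only branch output
loss = distill_loss(student, fused)        # minimized during training
```

At inference, only the student branch runs, so no text prompt is required; the text encoder and cross-attention are used solely to supervise training.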
Meichen Lu, Yi Chai, Kaixiong Xu, Weiqing Chen, Fei Ao, Wen Ji