Objective To design a novel attention-based classification head that addresses the poor performance of traditional multi-label classification methods on tail labels in long-tailed distributions. Methods This paper proposes a new classification head structure, the Dual-Attention Head (DAH), in which one path applies attention along the embedding dimension and the other along the sub-image dimension, thereby integrating feature-level and sample-level interactions to improve both tail-label performance and the mean Average Precision (mAP) across all classes in multi-label classification. The paper also compares different image preprocessing methods to analyze the influence of image resolution and spatial relationships on multi-label classification results, and compares image encoders of different structures to identify those better suited to multi-label classification. Results With DAH as the classification head, mAP improved from 0.2698 to 0.2904 compared with the multi-label decoder (ML-Decoder), and the Average Precision (AP) on tail labels improved from 0.1511 to 0.2306. Conclusion By combining embedding-dimension and sub-image-dimension attention, DAH further balances tail-label performance in long-tailed multi-label classification and, relative to ML-Decoder, further mitigates the imbalance inherent in long-tailed distributions; its embedding-dimension attention offers a new way to introduce attention into classification heads. In addition, this study finds that directly feeding in high-resolution chest X-rays is better suited to multi-label classification with DAH than either downsampling the images or splitting them into sub-images.
Dual-attention classification head for long-tailed and multi-label classification on chest X-rays
Objective Diagnosing chest X-rays (CXR) is a multi-label problem, as patients often exhibit multiple diseases simultaneously. However, most diseases are relatively rare, leading to a long-tailed distribution in clinical presentations, which poses challenges for standard deep learning methods. Traditional image classification models tend to focus on distinguishing the most common classes at the expense of important but rare "tail" classes, making them unsuitable for the class imbalance and co-occurrence issues in long-tailed, multi-label tasks. This study aims to design an innovative classification head based on attention mechanisms to address the poor classification performance of traditional multi-label classification methods on tail labels. Methods This paper proposes a new classification head structure, the Dual-Attention Head (DAH), which utilizes attention mechanisms across both embedding dimensions and sub-image dimensions. This dual-path approach integrates feature-level and instance-level interactions, improving the model's performance on tail labels in multi-label classification and enhancing the mean Average Precision (mAP) across all classes. We also compare different image preprocessing methods to analyze the impact of image resolution and spatial relationships on multi-label classification outcomes, and we evaluate various image encoder structures to identify the most suitable ones for multi-label classification. Results By introducing DAH as the classification head, the mAP improved from 0.2698 to 0.2904 compared to using ML-Decoder, with the Average Precision (AP) on tail labels increasing from 0.1511 to 0.2306. Conclusion The DAH, which combines embedding-level and sub-image-level attention mechanisms, further balances classification performance for tail labels in long-tailed, multi-label tasks. Compared to the ML-Decoder, it effectively addresses the imbalance issues in long-tail distributions. The embedding-level attention mechanism used in DAH also offers a new approach for incorporating attention into classification heads. Additionally, this study finds that directly inputting high-resolution CXR images is more suited to DAH for multi-label classification than downsampling or segmenting images into sub-images.
Keywords: Deep learning; Multi-label classification; Long-tailed distribution; Attention mechanism; Classification head
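The abstract does not give implementation details of the DAH, but the dual-path idea can be illustrated with a minimal NumPy sketch (all function names, shapes, and the simple additive fusion and mean pooling below are hypothetical assumptions, not the paper's actual architecture): one self-attention pass runs over the sub-image axis (tokens attend to tokens), the other over the embedding axis (feature channels attend to channels), and the two paths are merged before a per-label linear classifier.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # single-head scaled dot-product self-attention over the rows of x
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)          # pairwise similarities
    return softmax(scores, axis=-1) @ x    # attention-weighted mixture of rows

def dual_attention_head(tokens, num_labels, rng):
    """tokens: (n_subimages, embed_dim) features from an image encoder."""
    # Path 1: sub-image-dimension attention — sub-image tokens attend to each other
    sample_path = self_attention(tokens)        # (n, d)
    # Path 2: embedding-dimension attention — feature channels attend to each other
    feat_path = self_attention(tokens.T).T      # (n, d)
    fused = sample_path + feat_path             # fuse the two paths (assumed additive)
    pooled = fused.mean(axis=0)                 # (d,) pooled representation
    # per-label linear classifier (random weights, for illustration only)
    W = rng.standard_normal((num_labels, pooled.shape[0])) / np.sqrt(pooled.shape[0])
    return W @ pooled                           # one logit per label

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 32))  # e.g. 16 sub-image tokens, 32-dim embeddings
logits = dual_attention_head(tokens, num_labels=5, rng=rng)
print(logits.shape)  # (5,)
```

In a trained model the classifier weights and attention projections would of course be learned, and the fusion could be a concatenation or a learned gate rather than a sum; the sketch only shows how attention along the two axes yields complementary feature-level and sample-level interactions.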