Fine-Grained Image Classification Based on Multi-Modal Features and Enhanced Alignment
To address the limitations of existing models in multimodal information processing, such as inadequate feature extraction and insufficient information interaction, a fine-grained image classification model incorporating multimodal features and enhanced alignment is proposed. A hierarchical feature adaptive fusion module performs multi-level adaptive fusion of multimodal features, making full use of the feature information in the intermediate convolutional layers and strengthening the model's ability to perceive local image details. In addition, an enhanced aligned feature fusion module broadens the interaction between multimodal features and exploits the mapping relationships between modalities. Experimental results show that the proposed model achieves excellent recognition performance on several public datasets and outperforms previous multimodal feature fusion models. Furthermore, ablation experiments show that each proposed module on its own improves on the original model, confirming the effectiveness of the proposed design.
deep learning; fine-grained image classification; multimodal; adaptive feature fusion; attention mechanism
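
The abstract does not specify how the hierarchical feature adaptive fusion module is built. As a rough illustration of what "adaptive fusion" of multi-level features can mean in general, the following minimal sketch weights feature vectors from several levels with softmax-normalized scores and sums them. The function names `softmax` and `adaptive_fuse`, the `gate_weights` parameters, and the dot-product scoring rule are all assumptions made for illustration; they are not taken from the proposed model.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scalar scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, gate_weights):
    """Fuse equal-length feature vectors from several levels.

    level_features: one feature vector per level (hypothetical inputs).
    gate_weights:   one gating vector per level; stands in for learnable
                    parameters (an assumption, not the paper's design).
    Returns the fused vector and the per-level fusion weights.
    """
    # Score each level by the dot product of its gate and its features.
    scores = [
        sum(w * x for w, x in zip(gate, feat))
        for gate, feat in zip(gate_weights, level_features)
    ]
    # Normalize the scores so the fusion weights sum to 1.
    alphas = softmax(scores)
    dim = len(level_features[0])
    # Weighted sum of the level features, element by element.
    fused = [
        sum(a * feat[i] for a, feat in zip(alphas, level_features))
        for i in range(dim)
    ]
    return fused, alphas


# Three levels of 2-dimensional features; zero gates give equal weights.
levels = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gates = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
fused, alphas = adaptive_fuse(levels, gates)
```

In a trained network the gating parameters would be learned, so levels carrying more discriminative local detail could receive larger weights; here they are fixed only to keep the sketch self-contained.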