Research on Automatic Image Annotation Based on Dual-branch Attention Mechanism

Automatic image annotation converts the low-level visual features of an image into high-level semantic information that humans can understand, which improves the comprehensibility and searchability of images and makes it valuable for image retrieval and image classification. Current annotation methods based on convolutional neural networks still have three problems: shallow networks cannot capture sufficient feature information, the relationships between labels are easily ignored, and the number of labels is hard to determine at annotation time. This paper proposes an automatic image annotation model based on a dual-branch attention mechanism. First, a dual-branch attention network strengthens the correlation between image features and labels and learns the correlations among labels. Second, a multi-scale feature extraction module is added to the spatial attention branch to extract multi-scale image features, addressing the insufficient feature extraction of shallow networks. Third, a fusion module fuses the outputs of the two branches to further enhance the image features. Finally, a label quantity prediction module predicts the number of labels for the image to be annotated, further improving annotation accuracy. The model was evaluated on three benchmark datasets, Corel 5K, ESP Game, and IAPR-TC-12, and the experimental results show that it effectively addresses the above problems and improves the effectiveness and accuracy of annotation.
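The last step described in the abstract, using a predicted label count instead of a fixed number of labels per image, can be illustrated with a small sketch. This is not the authors' implementation: the function name `annotate`, the example scores, and the idea of passing the count in as a plain integer are all assumptions made for illustration. In the paper the count would come from the label quantity prediction module and the scores from the fused attention branches.

```python
from typing import Dict, List


def annotate(scores: Dict[str, float], predicted_count: int) -> List[str]:
    """Return the `predicted_count` highest-scoring labels, best first.

    `scores` and `predicted_count` are hypothetical inputs standing in for
    the outputs of an annotation model and a label-count predictor.
    """
    # Clamp the count so that between one label and all labels are requested.
    k = max(1, min(predicted_count, len(scores)))
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [label for label, _ in ranked[:k]]


# Hypothetical per-label confidences for one image (illustrative values only).
example_scores = {"sky": 0.91, "tree": 0.84, "car": 0.12, "water": 0.47, "people": 0.08}

fixed = annotate(example_scores, 5)     # a fixed top-5 rule keeps low-confidence labels
adaptive = annotate(example_scores, 2)  # a predicted count of 2 keeps only the strongest
print(fixed)
print(adaptive)
```

With a fixed top-5 rule, weakly supported labels such as "car" and "people" are emitted for every image; with a per-image count, only the labels the model is confident about survive, which is the motivation the abstract gives for predicting the number of labels.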

automatic image annotation; convolutional neural network; multi-scale feature; attention mechanism; feature fusion

Zhang Guoyou (张国有), Cui Yongqiang (崔永强)

School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, Shanxi, China

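Shanxi Provincial Natural Science Foundation (202203021221145); National Natural Science Foundation of China (62072325); Taiyuan University of Science and Technology Science and Technology Innovation Fund (20212039)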

2024

Computer Technology and Development (计算机技术与发展)
Shaanxi Computer Society

CSTPCD
Impact factor: 0.621
ISSN:1673-629X
Year, volume (issue): 2024, 34(9)