Journal of Hebei Institute of Architecture and Civil Engineering (河北建筑工程学院学报), 2024, Vol. 42, Issue 1: 211-215. DOI: 10.3969/j.issn.1008-4185.2024.01.034

Transformer-based Line Drawing Image Retrieval

Yue Jie¹, Peng Bingxin¹

Author information

  • 1. Hebei University of Architecture (河北建筑工程学院), Zhangjiakou, Hebei 075000, China

Abstract

Image retrieval is central to computer vision and is widely applied across many fields. In patents, however, images usually take the form of line drawings. Because line drawings carry no color or texture information, retrieving them remains highly challenging. This work proposes a Transformer-based line drawing retrieval model that exploits the Transformer's ability to model long-range dependencies in order to extract global features from line drawings effectively. The model divides the input line drawing into n patches and extracts features across patches through self-attention. These features are then processed into 100-dimensional enhanced features, and retrieval is performed by ranking images on the cosine similarity of their features. Experiments show that the Transformer-based model outperforms the convolutional-neural-network-based GoogLeNet and ResNet50.
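
The two steps of the pipeline that the abstract states concretely, splitting a drawing into patches and ranking by cosine similarity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the Transformer encoder that turns patches into 100-dimensional features is omitted, and the names `split_into_patches`, `cosine_similarity`, and `retrieve` are hypothetical. Images are plain nested lists of grayscale values so the sketch needs only the standard library.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def split_into_patches(image: List[List[float]], patch: int) -> List[List[float]]:
    """Split an H x W grayscale image into flattened, non-overlapping patch x patch blocks.

    Assumption: H and W are divisible by the patch size (hypothetical helper).
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one block in row-major order.
            block = [image[r][c]
                     for r in range(top, top + patch)
                     for c in range(left, left + patch)]
            patches.append(block)
    return patches


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: Vector, gallery: Sequence[Vector], top_k: int = 5) -> List[Tuple[int, float]]:
    """Return (gallery index, similarity) pairs for the top_k most similar features."""
    scored = [(i, cosine_similarity(query, feat)) for i, feat in enumerate(gallery)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

In a full system, each gallery entry would be the 100-dimensional feature produced by the encoder for one line drawing, and `retrieve` would be called with the feature of the query drawing.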

Key words

Transformer/Image Retrieval/Line Drawing/Computer Vision

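Publication year

2024
Journal of Hebei Institute of Architecture and Civil Engineering
Hebei University of Architecture

Impact factor: 0.502
ISSN: 1008-4185
Number of references: 9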