Research on Intelligent Image Processing Based on Vision Transformer

LIU Hongjiao¹ (刘红娇)

Author information

1. School of Mathematics, Boda College of Jilin Normal University, Siping, Jilin 136000, China

Abstract

Traditional image processing models rely on hand-crafted feature extractors, which may fail to capture high-level semantic information and struggle with global contextual information, limiting their understanding of the overall semantics of an image. This paper therefore proposes an intelligent image processing model based on the Vision Transformer (ViT) and improves it by introducing a multi-head self-attention mechanism and a hierarchical feature extraction module, enhancing the model's processing and generalization capabilities. Experiments show that the performance of the proposed model stabilizes at a good level once the training set size reaches about 1,200, whereas the performance of the compared algorithms still fluctuates at that size and has not yet reached its best. When the training set size reaches 2,000, the proposed model achieves a structural similarity (SSIM) of 0.98. These results indicate that the proposed model delivers high performance and processing efficiency, offering a new approach to problems in image processing.
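The abstract names multi-head self-attention as the core of the improved ViT but gives no architecture details (patch size, embedding width, number of heads, or the design of the hierarchical feature extraction module). As a point of reference only, the following is a minimal pure-Python sketch of one standard ViT attention step, assuming a square grayscale image cut into non-overlapping patches, random untrained weights, and no LayerNorm or MLP sub-layer. All function names and sizes (`vit_attention_block`, `patch=2`, `d_model=8`, `num_heads=2`) are hypothetical and are not the author's implementation.

```python
import math
import random

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used for position embeddings and the residual path."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def softmax(row: list[float]) -> list[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Untrained placeholder weights (hypothetical); a real model learns these."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def patchify(image: Matrix, patch: int) -> Matrix:
    """Cut a square grayscale image into non-overlapping patch x patch tiles
    and flatten each tile (row-major) into one token vector."""
    n = len(image)
    return [
        [image[r + i][c + j] for i in range(patch) for j in range(patch)]
        for r in range(0, n, patch)
        for c in range(0, n, patch)
    ]


def multi_head_self_attention(x: Matrix, num_heads: int, rng: random.Random) -> Matrix:
    """Scaled dot-product attention run independently in each head; the head
    outputs are concatenated per token and mixed by an output projection W_o."""
    d_model = len(x[0])
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng) for _ in range(4))
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    concat: Matrix = [[] for _ in x]
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [row[lo:hi] for row in q]
        kh_t = [list(col) for col in zip(*(row[lo:hi] for row in k))]  # K_h transposed
        vh = [row[lo:hi] for row in v]
        scores = matmul(qh, kh_t)  # tokens x tokens: similarity of every patch pair
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        for t, row in enumerate(matmul(weights, vh)):
            concat[t].extend(row)
    return matmul(concat, w_o)


def vit_attention_block(image: Matrix, patch: int = 2, d_model: int = 8,
                        num_heads: int = 2, seed: int = 0) -> Matrix:
    """Patch embedding + position embedding + one residual MHSA step.
    LayerNorm and the MLP sub-layer of a full ViT block are omitted."""
    rng = random.Random(seed)
    tokens = patchify(image, patch)
    x = matmul(tokens, random_matrix(patch * patch, d_model, rng))
    x = add(x, random_matrix(len(x), d_model, rng))  # placeholder position embeddings
    return add(x, multi_head_self_attention(x, num_heads, rng))
```

For example, a 4×4 image with `patch=2` yields four tokens, and `vit_attention_block(img)` returns four vectors of width 8. Because every patch attends to every other patch in a single step, each output token mixes information from the whole image. This global context is what the abstract contrasts with hand-crafted local feature extractors.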

Key words

ViT/image processing/multi-head self-attention/AI
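The abstract reports an SSIM of 0.98 but does not state how it was computed (window size, weighting, or pixel range). For orientation, the following is a minimal pure-Python sketch of the widely used SSIM formula, SSIM = ((2·μx·μy + C1)(2·σxy + C2)) / ((μx² + μy² + C1)(σx² + σy² + C2)) with C1 = (0.01·L)² and C2 = (0.03·L)². It is averaged over sliding windows. The uniform 7×7 window, the sample (n − 1) statistics, and the names `global_ssim` and `mean_ssim` are assumptions made for illustration, not the paper's evaluation code.

```python
Image = list[list[float]]


def global_ssim(x: list[float], y: list[float], data_range: float = 1.0) -> float:
    """SSIM of two equal-length pixel lists treated as a single window."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two pixel lists of equal length >= 2")
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x * mu_x + mu_y * mu_y + c1)
    contrast_structure = (2 * cov + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure


def mean_ssim(img_x: Image, img_y: Image, win: int = 7, data_range: float = 1.0) -> float:
    """Average SSIM over every win x win window (stride 1, no padding)."""
    h, w = len(img_x), len(img_x[0])
    if (h, w) != (len(img_y), len(img_y[0])) or h < win or w < win:
        raise ValueError("images must share a shape of at least win x win")
    scores = []
    for r in range(h - win + 1):
        for c in range(w - win + 1):
            px = [img_x[r + i][c + j] for i in range(win) for j in range(win)]
            py = [img_y[r + i][c + j] for i in range(win) for j in range(win)]
            scores.append(global_ssim(px, py, data_range))
    return sum(scores) / len(scores)
```

SSIM equals 1.0 only when the two images are identical, so a value of 0.98 means the processed output is structurally very close to its reference. Reported SSIM values depend on the window and weighting choices, which is why those settings matter when comparing models.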

Publication year: 2024

Journal: Automation Application (自动化应用)

Publisher: Chongqing Southwest Information Co., Ltd.

Impact factor: 0.156

ISSN: 1674-778X
