In recent years, the Vision Transformer (ViT) has shown remarkable potential in areas such as image classification, object detection, and image generation. However, ViT's performance gains depend on scaling the number of parameters in the network, which limits its application scenarios. Inspired by neurology, this paper proposes applying the recurrent structure found between neurons in the human brain to ViT. We explain, for the first time, how the recurrent structure works from the perspective of Riemannian geometry, and then present a recurrent Vision Transformer model built on the Token-to-Token Transformer architecture. Experimental results show that introducing the recurrent structure substantially improves the performance of ViT with essentially no change in the number of parameters: on the ImageNet classification dataset, the recurrent structure increases the parameter count by only 0.14% while improving classification accuracy by 9%. On the object detection task, a 0.1% increase in parameters brings a 10.7% performance improvement.
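The key property behind the near-zero parameter cost can be sketched in a few lines: a recurrent block reuses the same weights across iterations, so the effective depth grows while the parameter count stays fixed. The following is a minimal, hypothetical NumPy sketch of a weight-shared attention + MLP block, not the paper's exact architecture; all names (`RecurrentBlock`, `steps`, the layer sizes) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class RecurrentBlock:
    """One self-attention + MLP block whose weights are reused across
    `steps` iterations (weight sharing = recurrence).
    Hypothetical sketch, not the paper's exact architecture."""
    def __init__(self, d, rng):
        s = 1.0 / np.sqrt(d)
        self.Wq = rng.standard_normal((d, d)) * s
        self.Wk = rng.standard_normal((d, d)) * s
        self.Wv = rng.standard_normal((d, d)) * s
        self.W1 = rng.standard_normal((d, 4 * d)) * s
        self.W2 = rng.standard_normal((4 * d, d)) / np.sqrt(4 * d)

    def n_params(self):
        return sum(w.size for w in (self.Wq, self.Wk, self.Wv, self.W1, self.W2))

    def __call__(self, x, steps=1):
        d = x.shape[-1]
        # Same weights on every iteration: extra depth, zero extra parameters.
        for _ in range(steps):
            attn = softmax((x @ self.Wq) @ (x @ self.Wk).T / np.sqrt(d))
            x = x + attn @ (x @ self.Wv)                   # residual attention
            x = x + np.maximum(x @ self.W1, 0.0) @ self.W2  # residual ReLU MLP
        return x

rng = np.random.default_rng(0)
block = RecurrentBlock(d=64, rng=rng)
tokens = rng.standard_normal((16, 64))  # 16 tokens, 64-dim embeddings
y1 = block(tokens, steps=1)
y3 = block(tokens, steps=3)
# n_params() is identical for steps=1 and steps=3.
print(block.n_params())
```

Unrolling the loop three times triples the compute, but `n_params()` does not change, which is the mechanism by which a recurrent structure can raise accuracy at a sub-1% parameter cost.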