
Vision Transformers with Hierarchical Attention

This paper tackles the high computational/space complexity associated with multi-head self-attention (MHSA) in vanilla vision transformers. To this end, we propose hierarchical MHSA (H-MHSA), a novel approach that computes self-attention in a hierarchical fashion. Specifically, we first divide the input image into patches, as commonly done, and each patch is viewed as a token. The proposed H-MHSA then learns token relationships within local patches, serving as local relationship modeling. Next, the small patches are merged into larger ones, and H-MHSA models the global dependencies for the small number of merged tokens. Finally, the local and global attentive features are aggregated to obtain features with powerful representation capacity. Since we only calculate attention for a limited number of tokens at each step, the computational load is reduced dramatically. Hence, H-MHSA can efficiently model global relationships among tokens without sacrificing fine-grained information. With the H-MHSA module incorporated, we build a family of hierarchical-attention-based transformer networks, namely HAT-Net. To demonstrate the superiority of HAT-Net in scene understanding, we conduct extensive experiments on fundamental vision tasks, including image classification, semantic segmentation, object detection and instance segmentation. Therefore, HAT-Net provides a new perspective for vision transformers. Code and pretrained models are available at https://github.com/yun-liu/HAT-Net.
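The three steps described in the abstract (local attention within small windows, global attention over pooled tokens, then aggregation) can be sketched as follows. This is a minimal, single-head illustration in NumPy, not the authors' implementation: the function name `h_mhsa`, the identity Q/K/V projections, the average-pool merging, and the additive aggregation are simplifying assumptions made here for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(tokens):
    # Single-head self-attention; Q/K/V projections omitted for brevity.
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)
    return softmax(scores) @ tokens

def h_mhsa(x, patch=2, merge=2):
    """Hierarchical attention sketch.
    x:     (H, W, C) grid of patch tokens
    patch: side length of the local attention window
    merge: pooling factor used to form the merged (global) tokens
    """
    H, W, C = x.shape
    # Step 1: local attention inside non-overlapping patch x patch windows.
    local = np.empty_like(x)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            win = x[i:i + patch, j:j + patch].reshape(-1, C)
            out = attention(win)
            local[i:i + patch, j:j + patch] = out.reshape(patch, patch, C)
    # Step 2: merge small patches (average pooling here), then attend
    # globally over the much smaller set of merged tokens.
    Hm, Wm = H // merge, W // merge
    merged = x.reshape(Hm, merge, Wm, merge, C).mean(axis=(1, 3))
    glob = attention(merged.reshape(-1, C)).reshape(Hm, Wm, C)
    # Step 3: broadcast global features back to full resolution and
    # aggregate with the local attentive features.
    glob_up = np.repeat(np.repeat(glob, merge, axis=0), merge, axis=1)
    return local + glob_up

feats = np.random.default_rng(0).normal(size=(4, 4, 8))
out = h_mhsa(feats)
```

Each attention call operates on at most `patch * patch` or `(H/merge) * (W/merge)` tokens rather than all `H * W` tokens at once, which is the source of the complexity reduction the paper claims.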

Vision transformer, hierarchical attention, global attention, local attention, scene understanding

Yun Liu, Yu-Huan Wu, Guolei Sun, Le Zhang, Ajad Chhatkuli, Luc Van Gool


Institute for Infocomm Research (I2R), A*STAR, Singapore 138632, Singapore

Institute of High Performance Computing (IHPC), A*STAR, Singapore 138632, Singapore

Computer Vision Lab,ETH Zürich,Zürich 8092,Switzerland

School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu 611731, China


A*STAR Career Development Fund, Singapore; open access funding provided by Swiss Federal Institute of Technology Zurich

C233312006

2024

Machine Intelligence Research
Institute of Automation, Chinese Academy of Sciences


Indexed in: CSTPCD, EI
Impact factor: 0.49
ISSN:2731-538X
Year, volume (issue): 2024, 21(4)