Rethinking Global Context in Crowd Counting


Abstract: This paper investigates the role of global context in crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches throughout the transformer layers. Because transformers do not explicitly model the tried-and-true channel-wise interactions, we propose a token-attention module (TAM) that recalibrates encoded features through channel-wise attention informed by the context token. Beyond that, the context token is used to predict the total person count of the image through a regression-token module (RTM). Extensive experiments on various datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU, demonstrate that the proposed context extraction techniques can significantly improve performance over the baselines.
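The abstract describes three roles for the context token: it is prepended to the patch-token sequence, it informs a channel-wise gate over the encoded patch features (TAM), and it is regressed to a total count (RTM). The sketch below illustrates that idea in plain Python under stated assumptions: the channel gate is assumed to be a single linear layer followed by a sigmoid (squeeze-and-excitation style), and the count head is assumed to be a single linear layer. These layer choices and all function names are hypothetical and are not taken from the paper's implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def prepend_context_token(context_token: Vector, patch_tokens: Matrix) -> Matrix:
    """Hypothetical helper: build the input sequence [context, patch_1, ..., patch_N]."""
    return [list(context_token)] + [list(tok) for tok in patch_tokens]


def token_attention(patch_tokens: Matrix, context_token: Vector,
                    weight: Matrix, bias: Vector) -> Matrix:
    """Hypothetical TAM-style recalibration (assumed design, not the paper's code).

    One gate per channel is computed from the context token as
    sigmoid(W @ context + b); every patch token is then scaled channel-wise
    by those gates.
    """
    dim = len(context_token)
    gates = [
        sigmoid(sum(weight[i][j] * context_token[j] for j in range(dim)) + bias[i])
        for i in range(dim)
    ]
    return [[tok[i] * gates[i] for i in range(dim)] for tok in patch_tokens]


def regress_count(context_token: Vector, weight: Vector, bias: float) -> float:
    """Hypothetical RTM-style head: a linear map from the context token to a count."""
    return sum(w * c for w, c in zip(weight, context_token)) + bias


if __name__ == "__main__":
    patches = [[1.0, 2.0], [3.0, 4.0]]
    context = [0.0, 0.0]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(prepend_context_token(context, patches))
    # A zero context token gives sigmoid(0) = 0.5 for every channel gate.
    print(token_attention(patches, context, identity, [0.0, 0.0]))
    print(regress_count([1.0, 2.0], [0.5, 0.25], 1.0))
```

Because each gate lies in (0, 1), this style of recalibration rescales channels of the patch features rather than replacing them. In a trained model, the weights would be learned and the operations would run on batched tensors inside the transformer.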

Keywords:

Crowd counting; vision transformer; global context; attention; density map

Authors:

Guolei Sun, Yun Liu, Thomas Probst, Danda Pani Paudel, Nikola Popovic, Luc Van Gool

Affiliations:

Computer Vision Lab, ETH Zürich, Zürich 8092, Switzerland

Institute for Infocomm Research, A*STAR, Singapore 138632, Singapore

Magic Leap, Zürich 8050, Switzerland

Year of publication:

2024

DOI:

10.1007/s11633-023-1475-z

Machine Intelligence Research

Institute of Automation, Chinese Academy of Sciences

CSTPCD, EI

Impact factor: 0.49

ISSN: 2731-538X

Year, Volume (Issue): 2024, 21(4)