Rethinking Global Context in Crowd Counting

This paper investigates the role of global context in crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches throughout the transformer layers. Because transformers do not explicitly model the tried-and-true channel-wise interactions, we propose a token-attention module (TAM) to recalibrate encoded features through channel-wise attention informed by the context token. Beyond that, the context token is also used to predict the total person count of the image through a regression-token module (RTM). Extensive experiments on various datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU, demonstrate that the proposed context extraction techniques can significantly improve performance over the baselines.
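As a rough illustration of the two ideas named in the abstract, the sketch below gates the channels of each patch token with weights computed from a context token, and regresses a scalar count from that same token. The abstract does not give the internal layout of TAM or RTM, so everything here is an assumption: the squeeze-and-excitation style two-layer gate, the single linear regression head, the random weights, and all function and variable names are hypothetical and are not the authors' implementation.

```python
import math
import random


def linear(x, weight, bias):
    """Dense layer; `weight` is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def token_attention(patch_tokens, context_token, w1, b1, w2, b2):
    """Hypothetical TAM-like step: one gate per channel, derived from the context token.

    Assumed layout: context -> linear -> ReLU -> linear -> sigmoid -> gates,
    then every patch token is rescaled channel by channel.
    """
    hidden = [max(0.0, h) for h in linear(context_token, w1, b1)]
    gates = [sigmoid(g) for g in linear(hidden, w2, b2)]
    recalibrated = [[g * v for g, v in zip(gates, token)] for token in patch_tokens]
    return recalibrated, gates


def regress_count(context_token, weight, bias):
    """Hypothetical RTM-like head: a single linear map from context token to a scalar."""
    return linear(context_token, [weight], [bias])[0]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, HIDDEN, N = 8, 4, 6  # channels, gate bottleneck width, number of patch tokens

patch_tokens = random_matrix(N, C, rng)            # stand-in for encoded patch features
context_token = [rng.uniform(-1.0, 1.0) for _ in range(C)]

w1, b1 = random_matrix(HIDDEN, C, rng), [0.0] * HIDDEN
w2, b2 = random_matrix(C, HIDDEN, rng), [0.0] * C
w_count = [rng.uniform(-0.5, 0.5) for _ in range(C)]

recalibrated, gates = token_attention(patch_tokens, context_token, w1, b1, w2, b2)
count = regress_count(context_token, w_count, 0.0)
```

With untrained random weights the regressed value is meaningless; the point is only the data flow, in which one global token both reweights the channels of the patch features and drives the image-level count.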

Crowd counting; vision transformer; global context; attention; density map

Guolei Sun, Yun Liu, Thomas Probst, Danda Pani Paudel, Nikola Popovic, Luc Van Gool

Computer Vision Lab, ETH Zürich, Zürich 8092, Switzerland

Institute for Infocomm Research, A*STAR, Singapore 138632, Singapore

Magic Leap, Zürich 8050, Switzerland

2024

Machine Intelligence Research (English edition)
Institute of Automation, Chinese Academy of Sciences

Indexed in: CSTPCD, EI
Impact factor: 0.49
ISSN: 2731-538X
Year, volume (issue): 2024, 21(4)