This paper investigates the role of global context for crowd counting. Specifically, a pure transformer is used to extract features with global information from overlapping image patches. Inspired by classification, we add a context token to the input sequence to facilitate information exchange with the tokens corresponding to image patches throughout the transformer layers. Because transformers do not explicitly model the tried-and-true channel-wise interactions, we propose a token-attention module (TAM) to recalibrate the encoded features through channel-wise attention informed by the context token. The context token is also used to predict the total person count of the image through a regression-token module (RTM). Extensive experiments on various datasets, including ShanghaiTech, UCF-QNRF, JHU-CROWD++ and NWPU, demonstrate that the proposed context extraction techniques can significantly improve performance over the baselines.
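To make the two context-token modules more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes that TAM derives per-channel sigmoid gates from the context token through a small two-layer bottleneck and multiplies every patch token by those gates, and that RTM is a single linear layer mapping the context token to a scalar count. The function names (`token_attention`, `regress_count`), the bottleneck design, and all weights are hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def linear(x, weight, bias):
    """Compute y = W x + b for a vector x; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def sigmoid(v):
    """Logistic function mapping a real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def token_attention(patch_tokens, context_token, w1, b1, w2, b2):
    """Assumed TAM: recalibrate the channels of each patch token.

    The context token passes through linear -> ReLU -> linear -> sigmoid
    to give one gate per channel; every patch token is then scaled
    channel-wise by the same gates.
    """
    hidden = [max(0.0, h) for h in linear(context_token, w1, b1)]
    gates = [sigmoid(g) for g in linear(hidden, w2, b2)]
    recalibrated = [[v * g for v, g in zip(token, gates)]
                    for token in patch_tokens]
    return recalibrated, gates


def regress_count(context_token, weight, bias):
    """Assumed RTM: linear regression from the context token to a count."""
    return sum(w * c for w, c in zip(weight, context_token)) + bias


# Toy example: two patch tokens with two channels each.
patch_tokens = [[1.0, 2.0], [3.0, 4.0]]
context_token = [0.2, -0.1]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]

features, gates = token_attention(
    patch_tokens, context_token, identity, zeros, identity, zeros)
count = regress_count(context_token, [3.0, 4.0], 0.5)
print(features, gates, count)
```

In this sketch the gates depend only on the context token, so all patch tokens share one channel-wise recalibration; whether the paper's TAM also conditions on the patch tokens themselves is not stated in the abstract.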