Action Transformer: A self-attention model for short-time pose-based human action recognition
Original link: NSTL | Elsevier
Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have primarily been adopted on top of standard convolutional or recurrent layers, improving overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborate networks mixing convolutional, recurrent, and attentive layers. To limit computational and energy requirements, and building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low-latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time HAR. The proposed methodology was extensively tested on MPOSE2021 and compared to several state-of-the-art architectures, proving the effectiveness of the AcT model and laying the foundations for future work on HAR. (c) 2021 Elsevier Ltd. All rights reserved.
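The abstract describes a fully self-attentional model operating on 2D pose representations over small temporal windows. As a minimal illustration of that idea (not the paper's actual implementation), the sketch below embeds a short window of flattened 2D keypoints per frame and applies single-head scaled dot-product self-attention across frames; all dimensions (30-frame window, 13 keypoints, 64-dim embedding) are hypothetical placeholders, not values taken from the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over a frame sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # (T, T): each frame attends to every frame
    return softmax(scores, axis=-1) @ V       # (T, d) attended frame representations

rng = np.random.default_rng(0)
T, n_joints, d = 30, 13, 64                       # hypothetical window length / keypoints / model dim
poses = rng.standard_normal((T, n_joints * 2))    # flattened (x, y) pose coordinates per frame
W_embed = 0.1 * rng.standard_normal((n_joints * 2, d))
X = poses @ W_embed                               # linear projection to the model dimension
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (30, 64)
```

In a full classifier along these lines, the attended representations would pass through stacked encoder blocks and a classification head; here only the core attention step is shown.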
Keywords: Human action recognition; Deep learning; Computer vision; Transformer; Network