Space-Time-Correlated Transformer for Skeleton-Based Action Recognition
At present, the most common skeleton-based action recognition methods adopt the joint stream, bone stream, and their corresponding motion streams as a multi-stream network trained separately, which results in high training costs. In addition, the modeling of complex spatio-temporal dependencies is neglected during feature extraction, and large-kernel convolutions are adopted to exchange information in the temporal domain, leading to the aggregation of a large amount of redundant information. A space-time-correlated transformer method for skeleton-based action recognition was investigated to address these problems. First, a motion fusion module was constructed to reduce the cost of training motion streams separately by taking the joint and bone streams as inputs and fusing their respective motion information at the feature level. Second, a shift transformer module was proposed, which exploits the temporal shift operation to mix spatio-temporal information within the transformer, capturing short-term spatio-temporal dependencies at low cost. Then, a multiscale temporal convolution was designed to model long-term information in the temporal domain. Finally, the final classification prediction was obtained by fusing the scores of the two streams. Experiments on the large-scale NTU RGB+D and NTU RGB+D 120 datasets showed that the model achieved recognition accuracies of 91.5% and 96.3% under the X-Sub and X-View evaluation protocols of NTU RGB+D, respectively, and 87.2% and 89.3% under the X-Sub and X-Set protocols of NTU RGB+D 120, respectively. The recognition accuracy of the proposed method was significantly better than those of the most commonly used skeleton action recognition methods, verifying the effectiveness and generality of the model.
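The temporal shift operation mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a TSM-style shift in which a small fraction of channels is displaced one frame forward or backward in time, so that a subsequent per-frame transformer layer sees features from adjacent frames and thus mixes short-term spatio-temporal information at negligible cost. The tensor layout `(T, V, C)` (frames, joints, channels) and the `shift_div` fraction are illustrative assumptions.

```python
import numpy as np

def temporal_shift(x, shift_div=8):
    """TSM-style temporal shift (illustrative sketch, not the paper's code).

    x: array of shape (T, V, C) -- frames, joints, channels (assumed layout).
    1/shift_div of the channels are shifted toward the past, another
    1/shift_div toward the future; the rest are left unchanged.
    Out-of-range positions are zero-padded.
    """
    T, V, C = x.shape
    fold = C // shift_div
    out = np.zeros_like(x)
    # First fold: pull features from the NEXT frame (shift backward in time).
    out[:-1, :, :fold] = x[1:, :, :fold]
    # Second fold: pull features from the PREVIOUS frame (shift forward).
    out[1:, :, fold:2 * fold] = x[:-1, :, fold:2 * fold]
    # Remaining channels stay in place.
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]
    return out
```

Because the shift is a pure memory movement with no learned parameters, feeding the shifted features into an otherwise spatial (per-frame) attention block gives it a short temporal receptive field essentially for free, which matches the abstract's claim of capturing short-term dependencies at low cost.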