Science China Technological Sciences, 2024, Vol. 67, Issue 1: 197-208. DOI: 10.1007/s11431-023-2491-4

Segment differential aggregation representation and supervised compensation learning of ConvNets for human action recognition

REN ZiLiang (1), ZHANG QieShi (2), CHENG Qin (3), XU ZhenYu (2), YUAN Shuai (4), LUO DeLin (5)

Author information

  • 1. School of Computer Science and Technology, Dongguan University of Technology, Dongguan 523808, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  • 2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
  • 3. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; School of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
  • 4. Department of Mechanical Engineering, Erlangen-Nuremberg University, Erlangen 91508, Germany
  • 5. School of Aerospace Engineering, Xiamen University, Xiamen 361102, China

Abstract

With more multimodal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities with a designed frame difference aggregation spatial-temporal module (FDA-STM), and the network bridges information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmark datasets.
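
As a rough illustration of the segment frame difference idea named in the abstract, the following minimal Python sketch splits a clip into contiguous segments and averages the absolute differences between consecutive frames in each segment. It is not the authors' FDA-STM: the function names, the absolute-difference operator, the averaging step, and the plain-list frame representation are all assumptions made for illustration only.

```python
from typing import List

# Assumption: a frame is a grayscale image stored as rows of pixel intensities.
Frame = List[List[float]]


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Absolute per-pixel difference between two frames of equal size."""
    return [[abs(pb - pa) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def aggregate_segment(segment: List[Frame]) -> Frame:
    """Average the consecutive-frame differences inside one segment."""
    height, width = len(segment[0]), len(segment[0][0])
    total = [[0.0] * width for _ in range(height)]
    pairs = list(zip(segment, segment[1:]))
    for prev, curr in pairs:
        diff = frame_difference(prev, curr)
        for y in range(height):
            for x in range(width):
                total[y][x] += diff[y][x]
    count = max(len(pairs), 1)  # a one-frame segment yields an all-zero map
    return [[value / count for value in row] for row in total]


def segment_difference_aggregation(frames: List[Frame],
                                   num_segments: int) -> List[Frame]:
    """Split a clip into contiguous segments; return one motion map per segment."""
    if num_segments < 1 or len(frames) < num_segments:
        raise ValueError("need at least one frame per segment")
    bounds = [i * len(frames) // num_segments for i in range(num_segments + 1)]
    return [aggregate_segment(frames[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Six hypothetical 1x1 frames, split into two segments of three frames each.
    clip = [[[0.0]], [[1.0]], [[3.0]], [[3.0]], [[3.0]], [[7.0]]]
    print(segment_difference_aggregation(clip, num_segments=2))
```

Each returned map could then be passed to a ConvNet as a compact motion representation of its segment; how the actual framework combines such maps with skeleton supervision is not specified in the abstract.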

Key words

action recognition; segment frame difference aggregation; supervised compensation learning; ConvNets

Funding

Natural Science Foundation of Guangdong Province (2022A1515140119)

Natural Science Foundation of Guangdong Province (2023A1515011307)

National Key Laboratory of Airbased Information Perception and Fusion()

Aeronautic Science Foundation of China (20220001068001)

Dongguan Science and Technology Special Commissioner Project (20221800500362)

National Natural Science Foundation of China (62376261)

National Natural Science Foundation of China (61972090)

National Natural Science Foundation of China (U21A20487)

Publication year

2024
Science China Technological Sciences
Chinese Academy of Sciences

Indexed in: CSTPCD, EI
Impact factor: 1.056
ISSN: 1674-7321