Real-Time Action Detection Based on Spatio-Temporal Interaction Perception
Spatiotemporal action detection requires incorporating both the spatial and temporal information of a video. Current state-of-the-art approaches usually adopt a 2D CNN (Convolutional Neural Network) or a 3D CNN architecture. However, due to the complexity of the network structure and of spatiotemporal feature extraction, these methods are usually non-real-time and offline. To address this problem, this paper proposes a real-time action detection method based on spatiotemporal interaction perception. First, the input video frames are rearranged out of order to enhance the temporal information. Since 2D or 3D backbone networks alone cannot model spatiotemporal features effectively, a multi-branch feature extraction network is proposed to extract features from different sources, and a multi-scale attention network is proposed to capture long-term temporal dependencies and spatial context information. Then, to fuse the temporal and spatial features from the two different sources, a new motion saliency enhancement fusion strategy is proposed, which guides the fusion by encoding the temporal and spatial features so as to highlight more discriminative spatiotemporal features. Finally, action tubes are linked online from the frame-level detection results. The proposed method achieves accuracies of 84.71% and 78.4% on the two spatiotemporal action detection datasets UCF101-24 and JHMDB-21, respectively, and runs at 73 frames per second, which is superior to state-of-the-art methods. In addition, to address the high inter-class similarity and easily confused hard samples in the JHMDB-21 dataset, this paper proposes a key-frame optical flow action detection method based on action representation, which avoids generating redundant optical flow and further improves the accuracy of action detection.
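The abstract does not specify how the motion saliency enhancement fusion encodes the two streams. A minimal sketch of one plausible realization, in which each stream is pooled into a channel descriptor whose sigmoid gates the summed features, might look like the following (the function names, the sum-then-gate structure, and the use of global average pooling are all assumptions, not the paper's actual design):

```python
import numpy as np

def channel_descriptor(feat):
    """Global-average-pool a (C, H, W) feature map into a (C,) descriptor."""
    return feat.mean(axis=(1, 2))

def saliency_fusion(spatial, temporal):
    """Fuse spatial and temporal feature maps of shape (C, H, W).

    Each stream is encoded into a channel descriptor; a sigmoid of the
    combined descriptor gates the element-wise sum of the two streams,
    emphasizing channels with strong motion-related responses.
    """
    desc = channel_descriptor(spatial) + channel_descriptor(temporal)
    gate = 1.0 / (1.0 + np.exp(-desc))      # (C,) channel weights in (0, 1)
    fused = spatial + temporal              # element-wise sum of the streams
    return fused * gate[:, None, None]      # broadcast the gate over H and W

rng = np.random.default_rng(0)
s = rng.standard_normal((8, 4, 4))   # hypothetical spatial-stream features
t = rng.standard_normal((8, 4, 4))   # hypothetical temporal-stream features
out = saliency_fusion(s, t)
print(out.shape)  # (8, 4, 4)
```

In a trained network the descriptor would typically pass through learned layers before the sigmoid; the fixed pooling here only illustrates the gating idea.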
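Online linking of frame-level detections into action tubes is commonly done greedily by spatial overlap. The following is a rough sketch of such a linker, not the paper's actual algorithm: each incoming frame's boxes either extend the best-overlapping live tube or start a new one (the `link_tubes` name, the IoU threshold, and the greedy matching order are illustrative assumptions):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def link_tubes(frame_dets, iou_thresh=0.3):
    """Greedily link per-frame detections into action tubes, online.

    frame_dets: list over frames; each frame is a list of (box, score).
    Returns a list of tubes, each a list of (frame_idx, box, score).
    """
    tubes = []
    for t, dets in enumerate(frame_dets):
        unmatched = list(dets)
        for tube in tubes:
            last_t, last_box, _ = tube[-1]
            if last_t != t - 1 or not unmatched:
                continue  # tube already ended, or nothing left to match
            # extend with the detection overlapping the tube's last box most
            best = max(unmatched, key=lambda d: iou(last_box, d[0]))
            if iou(last_box, best[0]) >= iou_thresh:
                tube.append((t, best[0], best[1]))
                unmatched.remove(best)
        for box, score in unmatched:  # leftovers start new tubes
            tubes.append([(t, box, score)])
    return tubes

# Two frames: the first box continues into frame 1, the second appears anew.
dets = [
    [((0, 0, 10, 10), 0.9)],
    [((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.7)],
]
tubes = link_tubes(dets)
print(len(tubes))  # 2: one two-frame tube plus one new single-frame tube
```

Because each frame is processed as it arrives, this style of linking is compatible with the online, real-time setting the paper targets.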