

Action recognition algorithm based on attention mechanism and energy function
Addressing the lack of structural guidance in the frameworks of Zero-Shot Action Recognition (ZSAR) algorithms, an Action Recognition Algorithm based on Attention mechanism and Energy function (ARAAE) was proposed, with the Energy-Based Model (EBM) guiding the framework design. Firstly, to obtain the input for the EBM, a combination of optical flow and the Convolutional 3D (C3D) architecture was designed to extract visual features, achieving spatial de-redundancy. Secondly, the Vision Transformer (ViT) was used for visual feature extraction to reduce temporal redundancy, and ViT was combined with the optical-flow-plus-C3D architecture to further reduce spatial redundancy, yielding a non-redundant visual space. Finally, to measure the correlation between the visual space and the semantic space, an energy score evaluation mechanism was implemented, and a joint loss function was designed for optimization. Experimental results on the HMDB51 and UCF101 datasets against six classical ZSAR algorithms and algorithms from recent literature show that: on HMDB51 with average grouping, the average recognition accuracy of ARAAE reaches (22.1±1.8)%, clearly better than those of CAGE (Coupling Adversarial Graph Embedding), Bi-dir GAN (Bi-directional Generative Adversarial Network), and ETSAN (Energy-based Temporal Summarized Attentive Network); on UCF101 with average grouping, the average recognition accuracy of ARAAE reaches (22.4±1.6)%, slightly better than those of all comparison algorithms; on UCF101 with the 81/20 split, the average recognition accuracy of ARAAE reaches (40.2±2.6)%, higher than those of the comparison algorithms. These results show that ARAAE effectively improves recognition performance in ZSAR.
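The energy-scoring step described in the abstract — ranking each candidate class's semantic embedding by its energy against the extracted visual feature, and training so that the correct class has the lowest energy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bilinear energy form E(v, s) = -vᵀWs and the margin-based ranking loss standing in for the joint loss are assumptions, since the paper's exact energy function and loss are not given here.

```python
# Illustrative EBM-style scoring for zero-shot recognition (assumed forms,
# not ARAAE's actual code): v is a visual feature vector, s is a class's
# semantic embedding, and W is a learned compatibility matrix.

def energy(v, s, W):
    """Bilinear energy E(v, s) = -v^T W s: lower energy = better match."""
    return -sum(v[i] * W[i][j] * s[j]
                for i in range(len(v)) for j in range(len(s)))

def predict(v, class_embeddings, W):
    """Zero-shot prediction: the class whose embedding yields the lowest energy.
    Unseen classes need only a semantic embedding, not training videos."""
    energies = [energy(v, s, W) for s in class_embeddings]
    return energies.index(min(energies))

def ranking_loss(v, y, class_embeddings, W, margin=1.0):
    """Margin-based ranking loss (a stand-in for the paper's joint loss):
    push the true class's energy below every wrong class's by `margin`."""
    e_true = energy(v, class_embeddings[y], W)
    return sum(max(0.0, margin + e_true - energy(v, s, W))
               for j, s in enumerate(class_embeddings) if j != y)
```

With an identity W and one-hot class embeddings, a visual feature aligned with class 0 is predicted as class 0 and incurs zero loss, while a misaligned feature incurs a positive loss — the gradient of that loss is what shapes W during training.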

Zero-Shot Action Recognition (ZSAR); energy function; attention mechanism; optical flow; visual feature

Wang Lifang (王丽芳), Wu Jingshuang (吴荆双), Yin Pengliang (尹鹏亮), Hu Lihua (胡立华)


School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China

Shanghai Fangyi Wanqiang Microelectronics Co., Ltd., Xi'an 710000, China


Journal of Computer Applications
Chengdu Institute of Computer Applications, Chinese Academy of Sciences

PKU Core Journal (北大核心)
Impact factor: 0.892
ISSN: 1001-9081
Year, Volume (Issue): 2025, 45(1)