Research on a Human Action Recognition Algorithm Fusing CNN and Spatio-Temporally Separated ViT

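Abstract: Video action recognition has made some progress in computer vision, but recognition accuracy still leaves room for improvement. To address this, a network model that fuses a CNN with a spatio-temporally separated ViT is proposed to improve the accuracy of action classification. The model splits the encoder of the conventional ViT into a temporal encoder and a spatial encoder, connects the two in series to extract video features, and fuses these features with those extracted by CNN convolution to improve recognition. Experimental results show that the fused model has certain advantages in recognition performance and offers a new approach to the design of human action recognition algorithms.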

English title: Research on Human Action Recognition Algorithm by Fusing CNN and Spatio-Temporal Separation ViT

English abstract: Video action recognition in computer vision has made some progress, but there is still room for improvement. To address the problem of recognition accuracy in action recognition, a network model fusing CNN and spatio-temporal separation ViT is proposed to improve the accuracy of action classification and recognition. The encoder structure of the traditional ViT model is split into a temporal encoder and a spatial encoder. The temporal and spatial encoders are connected in series to extract video features, which are then fused with the features extracted by the CNN to improve recognition performance. Experimental results show that the network model fusing CNN and spatio-temporal separation ViT has certain advantages in recognition performance, which provides a new idea for the design of human action recognition algorithms.
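The data flow described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. The following are assumptions made only for illustration: the spatial encoder runs before the temporal encoder (the abstract says only that the two are in series), each encoder is reduced to one single-head self-attention step with identity projections and a residual connection, tokens are mean-pooled, and fusion is plain concatenation. All function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (assumption:
    no learned weights, so only the token mixing pattern is shown)."""
    dim = len(tokens[0])
    mixed = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        mixed.append([sum(w * value[j] for w, value in zip(weights, tokens))
                      for j in range(dim)])
    return mixed

def encoder_block(tokens):
    """Attention plus residual connection; a stand-in for a full encoder."""
    attended = self_attention(tokens)
    return [[x + a for x, a in zip(tok, att)]
            for tok, att in zip(tokens, attended)]

def mean_pool(tokens):
    """Average a list of tokens into one feature vector."""
    count = len(tokens)
    return [sum(tok[j] for tok in tokens) / count
            for j in range(len(tokens[0]))]

def separated_vit_features(video):
    """video: list of frames, each frame a list of patch embeddings."""
    # Spatial encoder: patches attend to each other within a single frame.
    frame_tokens = [mean_pool(encoder_block(frame)) for frame in video]
    # Temporal encoder: the per-frame tokens attend to each other across time.
    return mean_pool(encoder_block(frame_tokens))

def fuse(vit_features, cnn_features):
    """Assumed fusion rule: concatenate the two feature vectors."""
    return vit_features + cnn_features

# Toy input: 2 frames, 3 patches per frame, embedding dimension 4.
video = [
    [[0.1, 0.2, 0.3, 0.4], [0.0, 0.1, 0.0, 0.1], [0.5, 0.4, 0.3, 0.2]],
    [[0.2, 0.2, 0.2, 0.2], [0.3, 0.1, 0.4, 0.1], [0.0, 0.0, 0.1, 0.1]],
]
cnn_features = [0.5, -0.5]  # placeholder for features from a CNN branch
fused = fuse(separated_vit_features(video), cnn_features)
```

Here `fused` has length 6: four values from the ViT branch followed by the two placeholder CNN values. In a real model it would feed a classification head, and the encoders would carry learned projections, multiple heads, and feed-forward layers.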

English keywords:

CNN; ViT; spatio-temporal separation; action recognition

Authors:

Liu Yanshi (刘岩石), Zhao Jianguang (赵建光), Zhang Junqiu (张君秋), Jiao Yufan (焦瑜帆)

Author affiliation:

Hebei University of Architecture (河北建筑工程学院), Zhangjiakou, Hebei 075000

Keywords:

CNN; ViT; spatio-temporal separation; action recognition

Project numbers:

2221008A; 2311010A; XY2023079

Year of publication:

2024

DOI:

10.20153/j.issn.2096-9759.2024.03.035

Changjiang Information & Communications (长江信息通信)

Hubei Communication Services Company (湖北通信服务公司)

Impact factor: 0.338

ISSN: 2096-9759

Year, volume (issue): 2024, 37(3)

Number of references: 7