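Journal of Chizhou University (池州学院学报), 2024, Vol. 38, Issue 3: 21-27. DOI: 10.13420/j.cnki.jczu.2024.03.005

A Three-Stream Network Method for Human Action Recognition Based on Frame Sequence Features (一种基于帧序列特征的三流网络人体行为识别方法)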

A frame sequence feature based method on human action recognition

Huang Ruifeng (黄瑞丰)¹, Chen Chong (陈冲)², Cheng Rui (程睿)², Wang Xu (王旭)², Zhang Longfeng (张龙凤)²

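Author information

  • 1. Hefei Yongxian Intelligent Technology Co., Ltd. (合肥涌现智能科技有限公司), Hefei, Anhui 230093; Institute of Advanced Technology, University of Science and Technology of China, Hefei, Anhui 230031; School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui 230601
  • 2. School of Electronic and Information Engineering, Anhui Jianzhu University, Hefei, Anhui 230601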

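Abstract (translated from the Chinese original)

With the development of computer science and deep learning, human action recognition has gradually become an important topic in computer vision. The current mainstream two-stream network model cannot extract the inter-frame sequence features of a video while extracting image and motion features, and its robustness drops severely when local sequence features interact spatiotemporally with long- and short-term motion features. To address this, a three-stream network method for human action recognition based on video sequence features is proposed. After pre-processing, the dense optical flow frames of the video are fed into the temporal network and the RGB frames are fed into both the spatial network and the frame sequence feature extraction network, and the three networks are pretrained at the same time. Once each network has output its features, they are fused by weighted addition, and a multi-layer perceptron then produces the action classification result. Experiments on the UCF11, UCF50, and HMDB51 datasets give classification accuracies of 99.17%, 97.40%, and 96.88%, respectively. Compared with the traditional two-stream network method, the proposed method effectively combines the spatial, temporal, and frame sequence information of actions, improves recognition accuracy considerably, and generalizes better.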

Abstract

Human action recognition has developed alongside computer science and modern deep learning techniques, and it has become one of the most promising research directions in computer vision. The traditional two-stream network model cannot extract the inter-frame sequence features of a video at the same time as the image and motion features, so its robustness decreases when local sequence information interacts with long-term motion information. To address this, a three-stream network method based on frame sequence features is proposed. After pre-processing, the dense optical flow frames of a video are fed into a temporal network, and the RGB frames are fed into both a spatial network and a frame sequence feature extraction network; all three networks are pretrained concurrently. After training, each network outputs its features, the features are fused in parallel by weighted addition, and a multi-layer perceptron classifies the action category. Experimental results on the UCF11, UCF50, and HMDB51 datasets show that the model effectively integrates the spatial, temporal, and frame sequence information of human actions, which significantly improves recognition accuracy. Its classification accuracy on the three datasets was 99.17%, 97.40%, and 96.88%, respectively, indicating better generalization and validity than conventional two-stream or three-stream models.
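
The weighted-addition fusion and multi-layer perceptron classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the fusion weights, the feature dimension, or the MLP layer sizes, so `stream_weights`, `FEATURE_DIM`, and `HIDDEN` below are hypothetical, and random vectors stand in for the outputs of the three pretrained networks. Only the class count of 11 is taken from the UCF11 dataset name.

```python
import math
import random


def fuse_weighted(features, weights):
    """Fuse equally sized feature vectors by a weighted element-wise sum."""
    if len(features) != len(weights):
        raise ValueError("one weight per stream is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all stream features must have the same length")
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """Dense layer: weight has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def mlp_classify(x, layers):
    """Apply dense layers with ReLU between them and softmax at the end."""
    for k, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if k < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return softmax(x)


def random_layer(rng, n_in, n_out):
    """Untrained placeholder layer; a real model would load learned weights."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weight, [0.0] * n_out


rng = random.Random(0)
FEATURE_DIM, HIDDEN = 8, 16   # hypothetical sizes, not from the paper
NUM_CLASSES = 11              # UCF11 has 11 action classes

# Placeholders for the outputs of the spatial (RGB), temporal (optical flow),
# and frame sequence networks.
spatial = [rng.random() for _ in range(FEATURE_DIM)]
temporal = [rng.random() for _ in range(FEATURE_DIM)]
sequence = [rng.random() for _ in range(FEATURE_DIM)]

stream_weights = [0.4, 0.3, 0.3]  # hypothetical fusion weights
fused = fuse_weighted([spatial, temporal, sequence], stream_weights)

layers = [random_layer(rng, FEATURE_DIM, HIDDEN),
          random_layer(rng, HIDDEN, NUM_CLASSES)]
probs = mlp_classify(fused, layers)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print("predicted class index:", predicted)
```

Weighted addition keeps the fused vector the same size as each stream's output, so the classifier input does not grow with the number of streams. Concatenation would be an alternative design, but the abstract names weighted addition.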

Key words

Human action recognition / Three-stream network / Frame sequence feature / UCF11 / UCF50 / HMDB51

Publication year

2024

Journal of Chizhou University (池州学院学报)
Chizhou University (池州学院)

Impact factor: 0.194
ISSN: 1674-1102