Data Preprocessing Method for Deep Models in Video Action Recognition

An Fengmin (安峰民)¹, Zhang Bingbing (张冰冰)², Dong Wei (董微)¹, Zhang Jianxin (张建新)¹

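Author affiliations

1. School of Computer Science and Engineering, Dalian Minzu University, Dalian 116650, Liaoning, China
2. School of Information and Communication Engineering, Dalian University of Technology, Dalian 116024, Liaoning, China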

Abstract

Preprocessing operations such as video frame sampling and data augmentation are important means of improving the performance of deep models for video action recognition, and they have recently received increased attention. Existing video preprocessing suffers from two problems: the sampled frames are insufficiently discriminative, and the data augmentation methods used are limited in variety. To address these problems, this study proposes a data preprocessing method for deep video action recognition models. For frame sampling, a motion-guided fragmented video sampling strategy is designed that jointly considers inter-frame difference features and the short-term temporal features of video clips: key frames are located by salient motion, and the frames adjacent to each key frame are then sampled, which improves the spatiotemporal discriminability of the selected frames. For augmentation, motivated by the random data augmentation used successfully in image classification, the sampled short clips are augmented with a random data augmentation strategy so that the recognition model learns more complex spatial variations. Evaluation on two public video recognition datasets with two representative network models shows that the proposed preprocessing method improves the accuracy of the baseline models by more than 2.5 percentage points, with a maximum gain of 6.8 percentage points. These results demonstrate the effectiveness of the method for video action recognition.

Keywords

video action recognition / preprocessing method / motion-guided fragmented video sampling / data augmentation / deep learning

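Funding

National Natural Science Foundation of China (61972062)

Applied Basic Research Program of Liaoning Province (2023JH2/101300191)

Applied Basic Research Program of Liaoning Province (2023JH2/101300193)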

Publication year

2024

Computer Engineering (计算机工程)

East China Institute of Computing Technology; Shanghai Computer Society

Indexed in: CSTPCD / CSCD / Peking University Core Journals (北大核心)

Impact factor: 0.581

ISSN: 1000-3428

Number of references: 3