Study on video action recognition based on an augmented negative example multi-granularity discrimination model

To improve a model's fine-grained discrimination of video actions, an augmented negative example discrimination paradigm based on contrastive learning is proposed. An augmented negative example set is generated for each video to supply the hardest-to-distinguish video-text negative pairs. To further separate positive and negative examples, a multi-granularity discrimination model for video action recognition is built on this paradigm. In this model, the video representation module extracts video features under the guidance of textual positive example features, while the positive-negative semantic discriminator uses a self-attention mechanism to model the self-correlation between positive and negative semantics. The model performs coarse-grained, cross-modal discrimination between a video and its augmented negative example set, and fine-grained discrimination within the text modality between the positive example and the augmented negative example set. Experimental results show that the augmented negative example set significantly improves recognition on fine-grained class labels, and that the multi-granularity discrimination model outperforms current representative methods on the Kinetics-400, HMDB51 and UCF101 datasets.
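The abstract does not state the loss function, so the following is only a minimal sketch of how a contrastive objective might score one video against its textual positive and an augmented negative set, assuming a standard InfoNCE-style form with cosine similarity. The function names, the temperature value, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(video, text_pos, text_negs, temperature=0.07):
    """InfoNCE-style loss (an assumption, not the paper's stated objective).

    Returns -log of the softmax probability assigned to the positive text
    among the positive and all augmented negatives.
    """
    v = normalize(video)
    logits = [dot(v, normalize(text_pos)) / temperature]
    logits += [dot(v, normalize(n)) / temperature for n in text_negs]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[0]


# Toy 2-D embeddings (hypothetical values chosen for illustration only).
video = [1.0, 0.0]
text_pos = [0.9, 0.1]         # label text that matches the video
hard_negative = [0.8, 0.3]    # semantically close label, hard to tell apart
easy_negative = [-1.0, 0.2]   # unrelated label

loss_hard = contrastive_loss(video, text_pos, [hard_negative])
loss_easy = contrastive_loss(video, text_pos, [easy_negative])
print(loss_hard > loss_easy)
```

Under these assumptions, a negative that lies close to the positive contributes far more to the loss than an unrelated one, which is the intuition behind supplying the hardest-to-distinguish negatives for each video.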

Keywords: contrastive learning; augmented negative examples; paradigm; video action recognition

Liu Liangzhen (刘良振), Yang Yang (杨阳), Xia Yingjie (夏莹杰), Kuang Li (邝砾)

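School of Computer Science, Central South University, Changsha, Hunan 410083

Microelectronics Research Institute, Hangzhou Dianzi University, Hangzhou, Zhejiang 310005

College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310012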

2024

Journal on Communications
China Institute of Communications

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.265
ISSN: 1000-436X
Year, volume (issue): 2024, 45(12)