
Infrared Human Action Recognition Method Based on Multimodal Attention Network

Human behavior recognition is a research hotspot in machine vision and pattern recognition. Many intelligent services, such as intelligent monitoring and smart homes, depend on recognizing human behavior quickly and accurately, and the problem has been widely studied in China and abroad. Recognition usually relies on visible-light video, which is easily affected by lighting and is unsuitable for nighttime use. Because infrared information is less affected by lighting and better protects privacy, recognition methods based on infrared video are of great significance. However, deep learning networks are limited in how well they can learn representations from single-modality infrared data.

To address this problem, an infrared human behavior recognition method based on a multimodal attention network is proposed. Because a deep learning model cannot train on or classify raw video directly, a preprocessing module first converts the video into infrared views; the Sobel operator and the L1-norm-based total variation optical flow method then extract edge information and optical flow information from the infrared views, yielding edge views and optical flow views. Second, the infrared views, edge views, and optical flow views are fed into a three-stream network that incorporates an attention mechanism module for feature learning. Third, the multimodal features extracted by each network in the three-stream network are fused. Finally, the fused feature vector is input to a random forest for training and classification. Experiments on the public dataset NTU RGB+D and a self-built dataset indicate that the proposed method achieves good recognition performance. In future work, the method will be extended to more datasets to further verify its effectiveness.
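As an illustration of two steps named in the abstract (Sobel edge extraction and multimodal feature fusion), the following is a minimal pure-Python sketch, not the authors' implementation. The function names `sobel_edge_view` and `fuse_features` are hypothetical, the frame is assumed to be a 2D grayscale grid, and fusion by simple concatenation is an assumption, since the abstract does not state which fusion strategy is used. The optical flow, attention, and random forest stages are not covered here.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical derivatives.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_edge_view(frame):
    """Return the Sobel gradient magnitude of a 2D grayscale frame.

    `frame` is a list of equal-length rows of numbers. Border pixels are
    left at 0.0 because the 3x3 window does not fit there.
    """
    height, width = len(frame), len(frame[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = frame[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            edges[y][x] = math.hypot(gx, gy)
    return edges


def fuse_features(*stream_features):
    """Fuse per-stream feature vectors by concatenation (an assumption)."""
    fused = []
    for features in stream_features:
        fused.extend(features)
    return fused
```

For example, `sobel_edge_view` responds strongly along a vertical intensity step and returns zeros on a flat frame, while `fuse_features(infrared_vec, edge_vec, flow_vec)` yields one vector that a downstream classifier could consume. A practical pipeline would more likely use an image-processing library for both the Sobel and optical flow steps.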

Multimodal; Attention mechanism; Three-stream network; Feature fusion; Random forest

WANG Chao, TANG Chao, WANG Wenjian, ZHANG Jing


School of Artificial Intelligence and Big Data, Hefei University, Hefei 230601

School of Computer and Information Technology, Shanxi University, Taiyuan 030006

Science Island Branch, Graduate School of University of Science and Technology of China, Hefei 230031


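National Natural Science Foundation of China; National Natural Science Foundation of China; Natural Science Foundation of Anhui Province; Hefei University Research Project; Anhui Province Graduate Academic Innovation Project; Anhui Province College Student Innovation and Entrepreneurship Training Program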

62076154; U21A20513; 2008085MF202; 22050123010; 2022xscx145; 1602582519599861760

2024

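Computer Science
Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)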


CSTPCD; Peking University Core Journal
Impact factor: 0.944
ISSN: 1002-137X
Year, volume (issue): 2024, 51(8)