Human Action Recognition Method Based on Multi-channel Feature Fusion

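Abstract (translated from Chinese): Deep learning is now widely used in WiFi-based human action recognition and has achieved notable results. However, when the strong spatial diversity of multiple-input multiple-output (MIMO) systems is exploited for action recognition, the multipath effect causes the acquired channel state information (CSI) to describe the same action with differing features and different actions with similar features, which leads to incomplete feature extraction and complicates action classification. To address these problems, this paper proposes an action recognition method based on a dual attention mechanism and a multi-channel, multi-scale temporal convolutional network. First, exploiting the spatial diversity of MIMO systems, a multi-channel information extraction model is built to extract action-related feature information from the channel received at each antenna. Next, a multi-scale integration mechanism is designed to strengthen the representation of the same action in the data received on different channels; integrating action features at different scales improves the ability to represent actions. Then a feature-map fusion attention mechanism and a feature-channel attention mechanism aggregate the action features of the channels. The attention mechanisms identify the features that contribute most to the final recognition result, so the model can focus on them. Meanwhile, a temporal convolutional network is applied during feature processing so that long-term dependencies between action features at different time steps are preserved, which improves recognition of complex and continuous actions. Finally, a global average pooling (GAP) layer connects the feature maps of each channel to the action classifier so that the multi-channel action features are aggregated effectively, further improving recognition accuracy. On a public dataset with 7 actions, the proposed model achieves an average accuracy of 98.72%. In tests in self-built real-world environments (a laboratory, a classroom, and a corridor), it achieves recognition accuracies of 97.94%, 97.28%, and 95.66%, respectively, for 10 different actions. The experimental results demonstrate the effectiveness and superiority of the proposed WiFi-based human action recognition model across different environments.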
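The channel-weighting and pooling steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how attention scores are computed, so this example assumes a softmax over each channel's pooled mean as a stand-in for a learned scoring function. The function names and the toy input are hypothetical.

```python
import math

def global_average_pool(feature_maps):
    """Collapse each channel's time series to its mean (one scalar per channel)."""
    return [sum(ch) / len(ch) for ch in feature_maps]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def channel_attention(feature_maps):
    """Reweight channels by a softmax over their pooled activations.

    Assumption: a trained model would learn the scoring function; here the
    pooled mean itself serves as the score, purely for illustration.
    """
    weights = softmax(global_average_pool(feature_maps))
    reweighted = [[w * x for x in ch] for w, ch in zip(weights, feature_maps)]
    return weights, reweighted

# Three hypothetical antenna channels, four time steps each.
maps = [[0.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0],
        [2.0, 2.0, 2.0, 2.0]]
weights, reweighted = channel_attention(maps)
# One value per channel, the shape a classifier head would consume after GAP.
pooled = global_average_pool(reweighted)
```

In this toy input the channel with the largest activations receives the largest weight, and pooling reduces every channel to a single value regardless of the sequence length.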

English title: Human Action Recognition Method Based on Multi-channel Fusion

English abstract:

Objective: With the continuous development of science and technology, cutting-edge advancements such as artificial intelligence and deep learning increasingly penetrate various fields, significantly improving social productivity. Among these, WiFi-based human action recognition has emerged as a prominent research direction, demonstrating essential application potential in smart homes, health care, military training, and other fields. However, with the diversified development of wireless communication technology, human action recognition faces new challenges, particularly in the expanding applications of multiple-input multiple-output (MIMO) systems. This necessitates in-depth research and innovation to ensure that human action recognition technology adapts to the diverse communication environments of the future. A MIMO system's multivariate spatial diversity characteristics provide higher data rates and improved signal quality due to its design for parallel transmission through multiple channels. However, in practical applications, multi-channel parallel transmission often encounters interference from the multipath effect, causing the signal arriving at the receiving antenna to exhibit complex fluctuation characteristics with varying path lengths and incident angles. Since this path information is embedded in channel state information (CSI), the characteristics of CSI differ for the same action, while different actions may exhibit remarkably similar CSI characteristics. This results in incompleteness and generalization issues in feature extraction and action classification processes. Therefore, designing an effective mechanism to extract and classify human actions in complex MIMO environments is critical. This mechanism must overcome the multipath effect in multi-channel transmission to ensure accuracy and consistency when extracting action features. In this challenging context, innovative algorithms and model designs are crucial for addressing differences in CSI features between various actions and enhancing the model's generalization ability and robustness.

Method: This study explores the physical characteristics of human actions and the transmission characteristics of MIMO, proposing a deep learning-based human action recognition method that employs a dual attention mechanism and a multi-channel, multi-scale fusion temporal convolution network to address the above challenges. Initially, a multi-channel information extraction model is constructed to leverage the spatial diversity inherent in MIMO systems. This model enhances the representation of data received for specific actions across different channels by extracting action-specific characteristic information from each antenna's received signals. Simultaneously, a multi-scale integration mechanism is applied to fuse action characteristics at varying scales, bolstering the system's ability to represent actions effectively. The extracted action features are then aggregated through a dual attention mechanism. The feature map fusion attention mechanism mines the correlation between action features from each channel, assigning higher weights to more relevant features and thereby enhancing the discriminative power of the extracted action representation. A temporal convolution network captures temporal dependencies within the extracted action features, distinguishing between actions with similar spatial characteristics but distinct temporal patterns. Compared to the feature map fusion attention mechanism, the feature channel attention mechanism assigns weights to different channels based on their importance for action recognition. This design allows the model to prioritize features significantly contributing to action recognition, enhancing its overall recognition capabilities. A temporal convolutional network processes the extracted action features. This network performs convolution operations on features at different instances, enabling the model to capture changes over time and identify long-term dependencies between action features. This capability is crucial for accurately recognizing complex and continuous actions. A global average pooling (GAP) layer is implemented to bridge the gap between the extracted feature maps and the action classifier. This operation preserves the action characteristics of each channel while facilitating a global comparison of these characteristics. Balancing the characteristics of each channel further improves the accuracy of action recognition.

Results and Discussion: Comprehensive experiments are conducted on public datasets and in real-world environments to evaluate the effectiveness of the proposed model. These experiments assess the model's performance under controlled and uncontrolled conditions, ensuring its robustness and adaptability to practical scenarios. In the public dataset evaluation, the proposed action recognition model achieves an exceptional accuracy of 98.72% in identifying seven distinct human actions, surpassing the recognition performance of traditional models. This result highlights the model's effectiveness in distinguishing different actions in controlled settings. Tests are also conducted in various natural settings to validate the model's adaptability to real-world environments, including self-built laboratories, classrooms, and corridors. These environments present challenges such as uncontrolled lighting conditions, background noise, and varying distances between the user and the WiFi receiver. Despite these challenges, the proposed model maintains high performance, achieving accuracy rates of 97.94%, 97.28%, and 95.66%, respectively, for ten different actions in these real-world environments. These results demonstrate the model's robustness and adaptability to real-world scenarios, making it a promising tool for practical applications. WiFi-based human action recognition offers significant potential for transforming domains such as healthcare, smart homes, and human-computer interaction. In healthcare, real-time action recognition enhances patient care, detects potential falls, and assists elderly individuals. Smart homes can evolve into intelligent living spaces, automatically adjusting conditions based on occupant activities. Human-computer interaction can also be revolutionized, enabling smoother and more natural interactions with emerging technologies. The development of this model provides reliable and intelligent human action recognition solutions for practical applications, fostering the deep integration of technology and society. This integration promotes innovation and paves the way for a connected and intelligent future.

Conclusion: The novel human action recognition model proposed in this study, which employs a dual attention mechanism and a multi-channel, multi-scale temporal convolutional network, represents a significant breakthrough in addressing the limitations of human action recognition in wireless environments. The model achieves remarkable accuracy in diverse environments by effectively capturing the spatial and temporal characteristics of human actions from channel state information (CSI) data. This study holds substantial theoretical value and practical guiding significance, paving the way for future advancements in human action recognition research. In future studies, further optimization and refinement of the proposed model based on feedback from various natural environments will enhance its adaptability and generalizability, enabling seamless integration with human activities and revolutionizing multiple domains, including healthcare, smart homes, and human-computer interaction.
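The long-term dependency modelling attributed to the temporal convolutional network usually rests on dilated causal convolutions. The sketch below shows that generic building block only; the kernel size, dilation schedule, and function names are assumptions for illustration, since the abstract does not give the network's configuration.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Positions before the start of the sequence are treated as zero, so the
    output at time t never depends on inputs later than t (causality).
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y

def receptive_field(kernel_size, dilations):
    """Time steps visible to one output after stacking one conv per dilation."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Hypothetical single-feature sequence of six time steps.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
out = causal_dilated_conv1d(signal, kernel=[1.0, 1.0], dilation=2)

# Changing a future sample must leave earlier outputs untouched.
perturbed = causal_dilated_conv1d(signal[:5] + [100.0], kernel=[1.0, 1.0], dilation=2)
```

Stacking layers with dilations 1, 2, and 4 and a kernel of size 2 gives each output a view of 8 time steps, which is how such networks cover long action sequences with few layers.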

English keywords:

action recognition; deep learning; channel state information; TCN; attention

Authors:

陶志勇、郭希俊、任晓奎、刘影、王泽民

Affiliation:

School of Electronic and Information Engineering, Liaoning Technical University, Huludao, Liaoning 125105

Keywords (translated from Chinese):

action recognition; deep learning; channel state information; TCN; attention

Publication year:

2025

DOI:

10.12454/j.jsuese.202300307

工程科学与技术 (Advanced Engineering Sciences)

Sichuan University

Indexed in: Peking University Core Journals (北大核心)

Impact factor: 0.913

ISSN: 2096-3246

Year, volume (issue): 2025, 57(1)