3D Human Pose Reconstruction with Millimeter-Wave Radar Based on a Spatial-Temporal Transformer

Yu Yanan¹, Jia Yong¹, Du Lingli¹, Lin Fanqiang¹, Guo Shisheng²

Author information

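1. College of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu, Sichuan 610059, China
2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China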
Abstract

Deep learning makes it possible to accurately extract human motion features from the human scattering signals captured by millimeter-wave (mmWave) radar and to reconstruct 3D human poses. However, current mmWave radar pose reconstruction methods often adopt a single-stage strategy that maps radar images directly to 3D joint coordinates. This cross-domain hierarchical mapping task challenges the network in terms of reconstruction accuracy, depth-information expression, and pose coherence. To address this problem, this paper proposes a multi-stage mmWave radar 3D human pose reconstruction model based on a spatial-temporal Transformer, termed the Spatial-Temporal Pose Reconstruction Transformer (STPRT), which improves reconstruction accuracy through a two-stage strategy. In the first stage, parallel multi-resolution subnetworks extract multi-scale 2D joint information and spatial position features from the horizontal and vertical radar images and fuse them, after which fully connected layers generate the 2D human pose coordinates. In the second stage, the spatial attention module of the spatial-temporal Transformer encodes the 2D joint coordinates of each frame into high-dimensional spatial features, and the temporal attention module captures the temporal evolution of pose features across the frame sequence, which enhances depth perception and spatial accuracy between poses and lifts the 2D poses to 3D poses. In addition, an exponential moving average (EMA) strategy is introduced during training to adjust the gradient descent, thereby improving the accuracy and coherence of the overall mapping. Validation on the public mmWave radar dataset RFSkeleton3D shows that, compared with the existing mm-Pose and RF-based pose machine (RPM) models, the proposed model reduces the average joint position error to 7.3 cm while using fewer parameters.
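As a rough illustration of two quantities the abstract mentions, the sketch below computes an average joint position error (often called MPJPE) and performs one exponential moving average update. It is not the authors' implementation: the abstract does not say how EMA is applied, so the sketch assumes the common reading of an EMA over model parameters, and the names `mpjpe`, `ema_update`, `shadow`, and `decay`, as well as the array shapes and toy values, are hypothetical.

```python
import numpy as np


def mpjpe(pred, target):
    """Average joint position error: mean Euclidean distance over frames and joints.

    pred, target: arrays of shape (frames, joints, 3) in the same unit (e.g. cm).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.linalg.norm(pred - target, axis=-1).mean())


def ema_update(shadow, params, decay=0.999):
    """One EMA step (assumed form): shadow <- decay * shadow + (1 - decay) * params."""
    return {name: decay * shadow[name] + (1.0 - decay) * params[name]
            for name in params}


# Toy data (made up): 2 frames, 3 joints, every joint offset by (3, 4, 0).
target = np.zeros((2, 3, 3))
pred = target.copy()
pred[..., 0] = 3.0
pred[..., 1] = 4.0
error = mpjpe(pred, target)  # each joint is 5 units away from its target

# Toy parameters (made up): one EMA step with decay 0.9.
shadow = {"w": np.array([0.0, 0.0])}
shadow = ema_update(shadow, {"w": np.array([1.0, 2.0])}, decay=0.9)
```

With a metric of this form, the 7.3 cm figure reported in the abstract would correspond to the mean distance between predicted and ground-truth joints, averaged over all joints and frames of the test set.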

Key words

millimeter wave radar/posture reconstruction/spatial-temporal Transformer/hierarchical mapping

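Funding

National Natural Science Foundation of China (62001091)

Sichuan Provincial Department of Science and Technology Program (2022YFS0531)

Chengdu "Open Competition" Science and Technology Project (2023-JB00-00032-GX)

Quzhou Municipal Government Funded Project (2022D008)

Quzhou Municipal Government Funded Project (2022D005)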

Publication year

2024

Journal of Signal Processing (信号处理)

Chinese Institute of Electronics

Indexed in: CSTPCD, CSCD, Peking University Core Journals

Impact factor: 1.502

ISSN: 1003-0530

Number of references: 7
