3D Human Pose Estimation Based on an Improved Transformer

A multi-level feature encoding network based on an improved Transformer is designed for three-dimensional (3D) human pose estimation. A spatial pooling operator replaces the attention module, which reduces the number of model parameters and the computational complexity, and several of these pooling blocks are stacked in series to produce the initial feature representation. A cross-attention (CA) mechanism is then used to learn interactions between features, and strided convolution reduces the temporal dimension, merging neighboring poses into a single representation of the pose sequence. Validation experiments on the Human3.6M dataset show that combining the pooling structure with the attention mechanism yields effective 3D human pose estimates: compared with the original Transformer method, the number of model parameters is reduced by 30% and the positional accuracy is improved by 8.6%.
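
As a rough illustration of the three building blocks named in the abstract, the sketch below implements them in plain Python on toy data. It assumes a PoolFormer-style token mixer (average pooling over neighbouring joint tokens minus the input, applied after normalization), identity query/key/value projections in the cross-attention, and a strided temporal convolution with one shared weight per tap. The function names, tensor sizes, the 1D ordering of joints used for pooling, and which features feed the cross-attention are hypothetical choices for illustration, not the authors' implementation. `attention_params` only counts the projection weights of one standard self-attention layer to show where the pooling operator saves parameters; it does not reproduce the paper's 30% figure.

```python
import math
from typing import List

Vector = List[float]  # one token: a list of C channel values


def layer_norm(v: Vector, eps: float = 1e-5) -> Vector:
    """Normalize one token to zero mean, unit variance (no learned affine)."""
    mu = sum(v) / len(v)
    var = sum((a - mu) ** 2 for a in v) / len(v)
    return [(a - mu) / math.sqrt(var + eps) for a in v]


def pool_mixer(tokens: List[Vector], k: int = 3) -> List[Vector]:
    """PoolFormer-style token mixer (assumed): mean of the k neighbouring
    tokens (padding excluded from the mean) minus the token itself. No weights."""
    n, half = len(tokens), k // 2
    out = []
    for i in range(n):
        window = tokens[max(0, i - half):min(n, i + half + 1)]
        avg = [sum(col) / len(window) for col in zip(*window)]
        out.append([a - x for a, x in zip(avg, tokens[i])])
    return out


def pooling_block(tokens: List[Vector], k: int = 3) -> List[Vector]:
    """Residual block x + Pool(LN(x)) - LN(x); the channel MLP is omitted."""
    mixed = pool_mixer([layer_norm(t) for t in tokens], k)
    return [[x + m for x, m in zip(t, mt)] for t, mt in zip(tokens, mixed)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], context: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: queries from one feature stream,
    keys/values from another. Q/K/V projections are the identity (assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, c)) / math.sqrt(d) for c in context])
        out.append([sum(w * c[j] for w, c in zip(weights, context)) for j in range(d)])
    return out


def strided_temporal_conv(frames: List[Vector], kernel: Vector, stride: int) -> List[Vector]:
    """1D convolution over time, one shared scalar weight per tap, no padding:
    T input frames become (T - len(kernel)) // stride + 1 output frames."""
    k, out = len(kernel), []
    for start in range(0, len(frames) - k + 1, stride):
        taps = frames[start:start + k]
        out.append([sum(w * f[j] for w, f in zip(kernel, taps)) for j in range(len(frames[0]))])
    return out


def attention_params(d: int) -> int:
    """Weights in one self-attention layer: Q, K, V and output projections,
    each a d x d matrix plus a d-dim bias. The pooling mixer has none."""
    return 4 * (d * d + d)


if __name__ == "__main__":
    # Toy input (hypothetical sizes): T=9 frames, J=5 joint tokens, C=3 channels.
    T, J, C = 9, 5, 3
    seq = [[[float(t + j + c) for c in range(C)] for j in range(J)] for t in range(T)]
    spatial = [pooling_block(pooling_block(frame)) for frame in seq]  # two stacked blocks
    fused = [cross_attention(s, raw) for s, raw in zip(spatial, seq)]  # hypothetical pairing: pooled attends to raw
    per_frame = [[sum(col) / J for col in zip(*f)] for f in fused]  # average over joints: T x C
    merged = strided_temporal_conv(per_frame, kernel=[1 / 3] * 3, stride=3)
    print(len(per_frame), "->", len(merged), "frames")
    print("attention weights per layer at d=256:", attention_params(256))
```

Running the script prints `9 -> 3 frames`: with kernel size 3 and stride 3, every three neighbouring frames are merged into one output frame, which is how a strided convolution shortens the temporal dimension. The second line reports 263168 projection weights for a single attention layer with d = 256, none of which the pooling mixer needs.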

Keywords: pose estimation; Transformer model; spatial pooling operator; cross-attention mechanism; strided convolution

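Authors: Chen Congping (陈从平), Yu Chunming (郁春明), Yan Huanzhang (闫焕章), Jiang Gaoyong (江高勇), Zhang Yi (张屹), Dai Guohong (戴国洪)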

School of Mechanical Engineering and Rail Transit, Changzhou University, Changzhou 213164, Jiangsu, China

Funding: National Natural Science Foundation of China (51875053); National Key Research and Development Program of China (2018YFC1903101)

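Journal: Transducer and Microsystem Technologies (传感器与微系统), 2024
Sponsor: The 49th Research Institute of China Electronics Technology Group Corporation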

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.61
ISSN: 1000-9787
Year, volume (issue): 2024, 43(6)