3D Human Pose Estimation Based on an Improved Transformer

陈从平¹, 郁春明¹, 闫焕章¹, 江高勇¹, 张屹¹, 戴国洪¹

Author information

1. School of Mechanical Engineering and Rail Transit, Changzhou University, Changzhou 213164, Jiangsu, China

Abstract

This paper presents a multi-level feature encoding network, based on an improved Transformer, for three-dimensional (3D) human pose estimation. A spatial pooling operator replaces the attention module, which reduces both the number of model parameters and the computational complexity. Several of these pooling blocks are connected in series to obtain an initial feature representation. A cross-attention (CA) mechanism is then used for interactive learning of feature information, and strided convolution reduces the temporal dimension, merging nearby poses into a single representation of the pose sequence. Validation experiments on the Human3.6M dataset show that combining the pooling structure with the attention mechanism yields effective 3D human pose estimation. Compared with the original Transformer-based method, the number of model parameters is reduced by 30% and the positional accuracy is improved by 8.6%.
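Two of the building blocks named in the abstract, pooling in place of attention and strided convolution over time, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `pool_token_mixer` and `strided_temporal_conv`, the window size, the kernel weights, and the edge handling are all assumptions chosen for illustration, and the cross-attention stage is not shown.

```python
from typing import List

Vector = List[float]


def pool_token_mixer(tokens: List[Vector], window: int = 3) -> List[Vector]:
    """Mix joint tokens by average pooling instead of attention.

    Each token is replaced by the mean of the tokens inside a sliding
    window centred on it. The operator has no learnable weights, which is
    why swapping it in for attention lowers the parameter count.
    Assumption: at the sequence edges the mean is taken over the tokens
    that actually exist (no zero padding).
    """
    half = window // 2
    n = len(tokens)
    mixed = []
    for i in range(n):
        neighbours = tokens[max(0, i - half):min(n, i + half + 1)]
        dim = len(tokens[i])
        mixed.append(
            [sum(t[d] for t in neighbours) / len(neighbours) for d in range(dim)]
        )
    return mixed


def strided_temporal_conv(
    frames: List[Vector], kernel: List[float], stride: int
) -> List[Vector]:
    """Shrink the time axis with a strided 1D convolution.

    Assumption: one kernel is shared by all feature channels and no
    padding is used, so T input frames give (T - K) // stride + 1 outputs.
    Each output frame summarises K neighbouring poses.
    """
    k = len(kernel)
    out = []
    for start in range(0, len(frames) - k + 1, stride):
        dim = len(frames[start])
        out.append(
            [sum(kernel[j] * frames[start + j][d] for j in range(k)) for d in range(dim)]
        )
    return out


if __name__ == "__main__":
    # Three joint tokens with one feature each.
    print(pool_token_mixer([[0.0], [3.0], [6.0]]))
    # Nine frames reduced to three by a kernel of size 3 with stride 3.
    frames = [[float(t)] for t in range(9)]
    print(strided_temporal_conv(frames, [0.25, 0.5, 0.25], stride=3))
```

In a trained network the convolution kernel would be learned and applied per channel, while the pooling step would stay parameter-free; the sketch only shows how each operator changes the shape and content of the token sequence.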

Key words

pose estimation/Transformer model/spatial pooling operator/cross-attention mechanism/strided convolution

Funding

National Natural Science Foundation of China (51875053)

National Key Research and Development Program of China (2018YFC1903101)

Publication year

2024

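传感器与微系统 (Transducer and Microsystem Technologies)

The 49th Research Institute of China Electronics Technology Group Corporation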

Indexed in: CSTPCD, Peking University Core Journals (北大核心)

Impact factor: 0.61

ISSN: 1000-9787

Number of references: 16