3D Human Pose Estimation Method Fusing Transformer and Semantic Graph Convolution

Li Gonghao¹, Jia Zhentang¹

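Author information

1. College of Electronics and Information Engineering, Shanghai University of Electric Power, Shanghai 201306, China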

Abstract

To further improve the prediction of 3D human poses from monocular 2D poses, we propose a 3D human pose estimation model that fuses a Transformer with semantic graph convolution. The model consists of four components: a Transformer encoding network, a semantic graph convolutional encoding network, a pose coordinate prediction module, and a pose coordinate error regression module. First, the Transformer encoding network encodes the joint features globally to strengthen the global correlations of the human pose. Second, the semantic graph convolutional encoding network focuses on local joint feature extraction to strengthen the correlations among local joint features. The pose coordinate prediction module and the pose coordinate error regression module then fuse the global and local joint encodings to model the 3D pose more accurately. Experiments on the Human3.6M dataset show improved estimation performance: with ground-truth 2D poses as input, the method achieves an MPJPE of 32.7 mm and a PA-MPJPE of 25.9 mm, improvements of 3.82% and 1.14%, respectively, over the baseline method used for comparison.
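The two figures reported in the abstract are standard 3D pose metrics: MPJPE is the mean Euclidean distance between predicted and ground-truth joints, and PA-MPJPE is the same error after the prediction has been rigidly aligned to the ground truth (scale, rotation, translation) by Procrustes analysis. The following is a minimal sketch of both for a single pose, assuming NumPy and joint arrays of shape (J, 3) in millimetres. The function names are illustrative and are not taken from the paper, whose exact evaluation code is not shown here.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error for one pose.

    pred, gt: array-likes of shape (J, 3) in the same unit (e.g. mm).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance per joint, then average over joints.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def procrustes_align(pred, gt):
    """Similarity-align pred to gt (scale, rotation, translation)."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p0, g0 = pred - mu_p, gt - mu_g

    # Cross-covariance of the centred point sets and its SVD (Kabsch/Umeyama).
    h = p0.T @ g0
    u, s, vt = np.linalg.svd(h)

    # Force a proper rotation (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    diag = np.array([1.0, 1.0, d])
    rot = vt.T @ np.diag(diag) @ u.T

    # Optimal isotropic scale for the centred prediction.
    scale = (s * diag).sum() / (p0 ** 2).sum()
    return scale * p0 @ rot.T + mu_g


def pa_mpjpe(pred, gt):
    """MPJPE after Procrustes alignment of pred to gt."""
    return mpjpe(procrustes_align(pred, gt), gt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3)) * 100.0  # synthetic 17-joint pose, not real data
    pred = gt + rng.normal(size=gt.shape) * 10.0
    print("MPJPE:", mpjpe(pred, gt))
    print("PA-MPJPE:", pa_mpjpe(pred, gt))
```

Because the alignment removes global scale, rotation, and translation, PA-MPJPE isolates errors in the pose's shape, which is why it is usually lower than MPJPE for the same prediction. Dataset-level scores are typically obtained by averaging these per-pose values over all test frames.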

Key words

3D human pose estimation/semantic graph convolution/Transformer

Funding

National Natural Science Foundation of China (62105196)

Publication year

2024

Foreign Electronic Measurement Technology (国外电子测量技术)

Beijing Fanglue Information Technology Co., Ltd. (北京方略信息科技有限公司)

CSTPCD

Impact factor: 1.414

ISSN: 1002-8978

Number of references: 25
