Mobile-friendly and multi-feature aggregation via transformer for human pose estimation

© 2024

Human pose estimation is pivotal for human-centric visual tasks, yet deploying such models on mobile devices remains challenging due to high parameter counts and computational demands. In this paper, we study Mobile-Friendly and Multi-Feature Aggregation architectural designs for human pose estimation and propose a novel model called MobileMultiPose. Specifically, a lightweight aggregation method that incorporates multi-scale and multi-feature information mitigates redundant shallow semantic extraction and local deep semantic constraints. To efficiently aggregate diverse local and global features, a lightweight transformer module, built on a self-attention mechanism with linear complexity, is designed to achieve deep fusion of shallow and deep semantics. Furthermore, a multi-scale loss supervision method is incorporated into the training process to enhance model performance, facilitating the effective fusion of edge information across various scales. Extensive experiments show that the smallest variant of MobileMultiPose outperforms lightweight models (MobileNetv2, ShuffleNetv2, and Small HRNet) by 0.7, 5.4, and 10.1 points, respectively, on the COCO validation set, with fewer parameters and FLOPs. In particular, the largest MobileMultiPose variant achieves an AP score of 72.4 on the COCO test-dev set; notably, its parameters and FLOPs are only 16% and 18% of those of HRNet-W32, and 7% and 9% of those of DARK, respectively. We aim to offer novel insights into designing lightweight and efficient feature extraction networks, supporting mobile-friendly model deployment.

Human pose estimation; Hybrid architecture; Lightweight network; Multi-feature aggregation

Li B., Tang S., Li W.

School of Information and Control Engineering, China University of Mining and Technology; School of Mechanical and Electronic Engineering, Suzhou University

2025

Image and Vision Computing

SCI
ISSN:0262-8856
Year, Volume (Issue): 2025, 153 (Jan.)