Lightweight human pose estimation algorithm combined with coordinate Transformer
To address the large model size, high computational cost, and limited compatibility with edge devices of most existing bottom-up human pose estimation algorithms, this study proposes YOLOv5s6-Pose-CT, a lightweight multi-person pose estimation network based on YOLOv5s6-Pose. To reduce feature redundancy in both the spatial and channel dimensions, spatial and channel reconstruction convolution is introduced into the neck network. A coordinate Transformer is incorporated into the backbone network to strengthen the modeling of long-range dependencies while preserving efficient local feature extraction. Unbiased feature position alignment is employed to resolve feature misalignment during multi-scale fusion. Finally, the bounding-box regression loss is redefined using the MPDIoU (minimum point distance-based IoU) loss function. Experimental results on the COCO 2017 dataset show that, compared with EfficientHRNet-H1 (a mainstream lightweight network), the proposed model reduces parameters by 16.2% and computation by 66.1% while maintaining comparable accuracy. Compared with the baseline YOLOv5s6-Pose, it reduces parameters and computation by 11.2% and 5.8%, respectively, and improves average detection accuracy by 2.5% and recall by 2.6%.
human pose estimation; lightweight; coordinate Transformer; unbiased feature position alignment; loss function
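
The abstract names MPDIoU as the bounding-box regression loss but does not give its form. As a rough illustration, the sketch below follows the commonly cited definition of MPDIoU: the IoU of the two boxes minus the squared distances between their top-left corners and between their bottom-right corners, each normalized by the squared diagonal of the input image. The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are assumptions made for this example; how the loss is weighted and combined with the keypoint terms in YOLOv5s6-Pose-CT is not stated here.

```python
def mpdiou_loss(pred, target, img_w, img_h, eps=1e-9):
    """Illustrative MPDIoU loss for one pair of axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed layout).
    img_w and img_h are the width and height of the input image.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU; eps guards against division by zero.
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / (union + eps)

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal.
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm

    return 1.0 - mpdiou


if __name__ == "__main__":
    # A slightly shifted prediction on a 640x640 input (illustrative values).
    print(mpdiou_loss((12, 12, 52, 52), (10, 10, 50, 50), 640, 640))
```

Unlike plain IoU, the corner-distance terms keep the loss sensitive to box position even when the predicted and ground-truth boxes do not overlap, so under this definition the loss exceeds 1 for disjoint boxes and approaches 0 as the boxes coincide.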