Lightweight human pose estimation based on decoupled attention and ghost convolution
With the development of lightweight networks, human pose estimation can be performed on devices with limited computational resources. However, improving accuracy has become more challenging. The difficulty stems mainly from the conflict between network complexity and available computational resources: simplifying the model sacrifices its representation capability. To address these issues, a Decoupled attention and Ghost convolution based Lightweight human pose estimation Network (DGLNet) was proposed. Specifically, DGLNet takes the Small High-Resolution Network (Small HRNet) model as its basic architecture, and a DFDbottleneck module was constructed by introducing a decoupled attention mechanism. The basic modules were redesigned with a shuffleblock structure, in which computationally intensive point convolutions were replaced with lightweight ghost convolutions and the decoupled attention mechanism was used to enhance module performance, yielding the DGBblock module. Additionally, the original transition layer modules were replaced with redesigned depthwise separable convolution modules that incorporate ghost convolution and decoupled attention, yielding the GSCtransition module. This modification further reduced computational complexity while enhancing feature interaction and performance. Experimental results on the COCO validation set show that DGLNet outperforms the state-of-the-art Lite-High-Resolution Network (Lite-HRNet) model, achieving a maximum accuracy of 71.9% without increasing computational complexity or the number of parameters. Compared with common lightweight pose estimation networks such as MobileNetV2 and ShuffleNetV2, DGLNet improves precision by 4.6 and 8.3 percentage points respectively, while using only 21.2% and 25.0% of their computational cost. Furthermore, under the AP50 evaluation criterion, DGLNet surpasses the large High-Resolution Network (HRNet) while requiring significantly less computation and significantly fewer parameters.
human pose estimation; lightweight network; attention mechanism; ghost convolution; depthwise separable convolution module
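To illustrate why replacing a point (1x1) convolution with a ghost convolution can reduce cost, the following minimal sketch compares weight counts for the two layer types. It assumes the ghost module formulation from GhostNet (a 1x1 primary convolution that produces a fraction of the output channels, followed by a cheap depthwise convolution that generates the rest). The abstract does not state the exact configuration used in DGLNet, so the function names, the ratio of 2, the 3x3 depthwise kernel, and the 64-channel example are all illustrative assumptions; biases and normalization layers are ignored.

```python
import math


def pointwise_conv_params(in_ch: int, out_ch: int) -> int:
    """Weight count of a standard 1x1 (point) convolution, bias ignored."""
    return in_ch * out_ch


def ghost_conv_params(in_ch: int, out_ch: int, ratio: int = 2, dw_kernel: int = 3) -> int:
    """Weight count of a GhostNet-style ghost module (assumed configuration).

    A 1x1 primary convolution produces ceil(out_ch / ratio) intrinsic
    channels; a depthwise dw_kernel x dw_kernel convolution then derives
    the remaining "ghost" channels from them.
    """
    intrinsic = math.ceil(out_ch / ratio)      # channels from the primary 1x1 conv
    ghost = intrinsic * (ratio - 1)            # channels from the cheap depthwise op
    primary_weights = in_ch * intrinsic        # 1x1 kernels
    cheap_weights = ghost * dw_kernel * dw_kernel  # one small filter per ghost channel
    return primary_weights + cheap_weights


if __name__ == "__main__":
    # Hypothetical layer size chosen only for illustration.
    in_ch, out_ch = 64, 64
    pw = pointwise_conv_params(in_ch, out_ch)
    gh = ghost_conv_params(in_ch, out_ch)
    print(f"point conv weights: {pw}")
    print(f"ghost conv weights: {gh}")
    print(f"ghost / point ratio: {gh / pw:.3f}")
```

Under these assumptions the ghost module needs a little over half the weights of the point convolution it replaces, since the depthwise part adds only a small per-channel cost. Actual savings in DGLNet depend on the ratio and kernel size the authors chose.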