Lightweight Human Pose Estimation Algorithm Based on Spatial Cross-Convolution
The number of parameters in the prediction phase of the lightweight OpenPose network is still large, which slows down model inference and hinders deployment on edge devices. To address this problem, a human pose estimation network based on an improved convolution approach is proposed, in which spatial cross-convolution replaces some of the standard convolutions to reduce the number of parameters in the prediction phase. The input of the network is an RGB image captured by a monocular camera. MobileNetV3-Large is used as the backbone network, and the CBAM attention module is added to extract spatial and channel features of different importance. The extracted image features are then fed into two branches, which predict the positions of the keypoints and the associations between them. In both branches, spatial cross-convolution replaces some of the standard convolution kernels, which reduces the number of parameters by 80% compared with traditional convolution. Experimental results show that, compared with the original method, the total number of parameters of the proposed method is reduced by 22% with only a small decrease in accuracy. Deployment tests on a CPU show that the inference speed reaches 6 FPS, which is nearly 4 times higher.
human posture estimation; lightweight network; spatial cross-convolution; OpenPose; edge device
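The two reduction figures quoted in the abstract (80% per replaced convolution, 22% for the whole network) can be related with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, the 128-channel 3x3 layer is an assumed example rather than a layer taken from the paper, and the calculation assumes that all of the 22% saving comes from the replaced layers. It does not implement spatial cross-convolution itself.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = False) -> int:
    """Parameter count of a standard k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def implied_replaced_share(total_reduction: float, per_layer_reduction: float) -> float:
    """Fraction of the original parameters that sat in the replaced layers.

    Assumption: every saved parameter comes from a replaced layer, so
    total_reduction = share * per_layer_reduction.
    """
    return total_reduction / per_layer_reduction


# Hypothetical layer: 128 -> 128 channels, 3 x 3 kernel, no bias.
standard = conv2d_params(128, 128, 3)

# Parameters left in that layer after the 80% reduction stated in the abstract.
reduced = round(standard * (1 - 0.80))

# Share of the original network implied by the 22% overall reduction.
share = implied_replaced_share(0.22, 0.80)

print(standard, reduced, round(share, 3))
```

Under these assumptions, roughly 27.5% of the original parameters would belong to the layers that were replaced, which is consistent with only some of the standard convolutions in the two branches being swapped out.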