Human Pose Estimation Based on Improved High-Resolution Network

To locate human keypoints more accurately, this paper proposes a human pose estimation model and algorithm that takes the High-Resolution Network (HRNet) as its baseline and adds a waterfall dilated (atrous) spatial convolution module and a Transformer. First, a waterfall dilated spatial convolution module is constructed to replace the fourth stage of HRNet. This mitigates the excessive parameter count caused by fusing features across different scales and extracts multi-scale features more efficiently. Second, a Transformer based on the self-attention mechanism is introduced to process the extracted high-level features. By capturing the non-local interactions among keypoints across the global spatial domain, it obtains global information and thereby enhances the features. Experiments show that, at an input image resolution of 256×192, the proposed model improves AP by 2.4% and 2.3% over the HRNet-W32 and HRNet-W48 baseline models, respectively, while using fewer parameters.
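The waterfall arrangement of dilated convolutions mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It assumes a cascade in which each dilated convolution feeds the next and every branch output is kept for later fusion. The 1-D setting, the kernel, the dilation rates, and the function names are all illustrative assumptions and are not taken from the paper.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation).

    Assumes an odd-length kernel so the output aligns with the input.
    """
    k = len(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def waterfall(signal, kernel, dilations):
    """Cascade dilated convolutions and return the output of every branch.

    Each branch consumes the previous branch's output (the "waterfall"),
    so later branches see a progressively larger context.
    """
    branches = []
    current = signal
    for d in dilations:
        current = dilated_conv1d(current, kernel, d)
        branches.append(current)
    return branches


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of the last branch in the cascade."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


if __name__ == "__main__":
    impulse = [0.0] * 15
    impulse[7] = 1.0  # a single active sample in the middle
    outputs = waterfall(impulse, [1.0, 1.0, 1.0], [1, 2])
    for rate, branch in zip([1, 2], outputs):
        print(rate, branch)
    print("receptive field:", receptive_field(3, [1, 2]))
```

Because each branch reuses the previous branch's output, the receptive field grows with every stage without adding a separate full-resolution branch per scale, which is the general motivation for this kind of module. How the paper fuses the branch outputs is not stated in the abstract, so the sketch simply returns them.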

Keywords: human pose estimation; high-resolution network; waterfall dilated convolution; attention mechanism; multi-scale

Liu Jie (刘洁), Chen Zhi (陈志), Yue Wenjing (岳文静)

School of Computer Science, Nanjing University of Posts and Telecommunications

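School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003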

2024

Software Guide (软件导刊)
Hubei Information Society (湖北省信息学会)

Impact factor: 0.524
ISSN: 1672-7800
Year, volume (issue): 2024, 23(6)