
Review of 2D human pose encoding and decoding methods: from the perspective of ambiguity mitigation

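Human pose estimation provides key technical support for many applications in entertainment, health, security, and other fields. Human pose encoding and decoding aims to extract features from raw input data and build them into an intermediate representation that is easier to process and understand. It then recovers an interpretable human pose from that representation. In real-world scenarios, however, human pose estimation is affected by illumination, motion blur, occlusion, complex poses, camera viewpoint, and image resolution. As a result, it often suffers from distributive ambiguity, scale ambiguity, and associative ambiguity. A well-designed encoding and decoding scheme is therefore the key to resolving the various ambiguity problems in human pose estimation. First, human pose modeling methods are introduced, since they are the prerequisite for human pose encoding and decoding. Then, methods addressing distributive ambiguity are presented from three perspectives: distribution constraints, structural constraints, and iterative constraints. Scale ambiguity is divided into keypoint-scale and pixel-scale ambiguity, and the related methods based on scale representation, unbiased transformation, and integral regression are introduced. For associative ambiguity, four categories of human pose encoding and decoding methods are summarized: graph-optimization-based, limb-vector-based, instance-center-based, and reference-tag-based methods. The performance of each method is also summarized and analyzed. Finally, future research directions for human pose encoding and decoding are discussed.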
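To make the encode/decode round trip and pixel-scale ambiguity concrete, here is a minimal sketch. It assumes the common Gaussian-heatmap encoding. A keypoint at continuous heatmap coordinates is rendered as a 2D Gaussian. It is then decoded in two ways:

* by plain argmax, which snaps to the integer grid;
* by an integral (expected-value) readout, which recovers sub-pixel positions.

The function names, the 32×32 map size, and sigma = 2 are illustrative assumptions, not the surveyed authors' code. Integral regression methods usually apply a softmax to network logits. This sketch simply normalizes a non-negative heatmap by its sum.

```python
import math


def encode_heatmap(x, y, width, height, sigma=2.0):
    """Encode a keypoint at continuous heatmap coordinates (x, y) as a 2D Gaussian.

    Returns a list of rows: heatmap[row][col], where col indexes x and row indexes y.
    """
    return [
        [math.exp(-((col - x) ** 2 + (row - y) ** 2) / (2 * sigma ** 2))
         for col in range(width)]
        for row in range(height)
    ]


def decode_argmax(heatmap):
    """Decode by taking the peak pixel; the result is quantized to the integer grid."""
    _, col, row = max(
        (value, col, row)
        for row, values in enumerate(heatmap)
        for col, value in enumerate(values)
    )
    return float(col), float(row)


def decode_integral(heatmap):
    """Decode as the expected coordinate under the sum-normalized heatmap.

    Simplification (assumption): the heatmap is non-negative and is divided by its
    sum, instead of passing network logits through a softmax.
    """
    total = sum(sum(values) for values in heatmap)
    x = sum(value * col for values in heatmap for col, value in enumerate(values)) / total
    y = sum(value * row for row, values in enumerate(heatmap) for value in values) / total
    return x, y


if __name__ == "__main__":
    hm = encode_heatmap(13.3, 12.6, width=32, height=32)
    print("argmax:  ", decode_argmax(hm))    # (13.0, 13.0): snapped to the grid
    print("integral:", decode_integral(hm))  # approximately (13.3, 12.6)
```

The argmax error can reach half a heatmap pixel per axis. In image space that error is multiplied by the output stride, for example 4 when the heatmap is a quarter of the input resolution. This is the quantization error that unbiased and integral-based decoders aim to reduce.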
Within the various subfields of computer vision, human pose estimation stands out as an active area of research. It aims to precisely localize the body parts or keypoints of each human instance in a given image or video and to reconstruct the skeleton structure of the human body. Human pose estimation provides technical support for various applications, such as human pose tracking, human action recognition, person re-identification, human-object interactions, and person image generation. Its uses span entertainment (such as virtual reality, augmented reality, and animation), health (such as healthcare and sports), and security (such as surveillance). Consequently, high-performance and real-time human pose estimation has become a prominent focus of current computer vision research.

Extensive research on human pose estimation methods has been conducted in recent years. One line of research develops and refines high-performance or lightweight network architectures, such as Hourglass, SimpleBaseline, high-resolution net (HRNet), and Lite-HRNet. These architectures are also widely used in other visual tasks, including object detection and instance segmentation. Another line of research introduces new pose encoding and decoding schemes for building accurate and robust human pose estimation models. Encoding and decoding form a pivotal stage: features are extracted from the input data and translated into interpretable human poses. The encoding process extracts features from the raw input and shapes them into an intermediate representation, such as feature maps or latent vectors, that is easier to process and interpret. The decoding process then recovers the final human pose from this encoded representation.

Despite considerable progress, ambiguity remains a major obstacle for human pose estimation in real-world scenarios. Illumination, motion blur, occlusion, complex poses, viewpoint, and resolution can cause different poses to be mapped to similar or overlapping low-dimensional representations. The predicted poses then become ambiguous and uncertain, which constitutes the ambiguity challenge in human pose estimation. This challenge comprises distributive, scale, and associative ambiguity. First, when a hand is occluded, the precise location of the wrist becomes uncertain, yielding distributive ambiguity. Second, the body appears smaller in the image as the camera moves farther from the person. Without sufficient context, the accurate scale is then often hard to determine, leading to scale ambiguity. Third, when two people occlude each other, assigning the detected keypoints to the correct human instances becomes difficult, introducing associative ambiguity.

Well-designed encoding and decoding methods allow human pose estimation to be modeled and solved appropriately. They provide effective optimization objectives and feature representations for the model, enabling reasonable and robust human pose estimation models. Investigating encoding and decoding for human pose estimation is therefore of substantial research importance. Most previous reviews of human pose estimation have focused on the design of network structures, even though the ambiguity problem can markedly affect pose estimation performance. This paper summarizes and analyzes current research on pose encoding and decoding methods, with a thorough investigation of the ambiguity challenges inherent in human pose estimation.

In this paper, human pose modeling techniques are first introduced, since they directly determine how expressively human poses can be represented. Second, pose encoding and decoding methods are categorized by the type of ambiguity they address: distributive, scale, or associative. Three strategies are explored to address distributive ambiguity: distributive, structural, and iterative constraints. Scale ambiguity is further divided into keypoint-wise and pixel-wise scale ambiguity. The former is mainly addressed by scale-representation-based methods, and the latter by unbiased and integral-based methods. Approaches to associative ambiguity fall into four groups, each offering a different potential solution: graph-, limb-, center-, and embedding-based methods. A summary and performance comparison of these encoding and decoding methods is provided to clarify the strengths and limitations of each approach. Finally, potential directions for future development are discussed. This paper aims to open a new research direction for researchers: addressing the ambiguity problem in human pose estimation through encoding and decoding. Resolving these ambiguity challenges is expected to broaden the range of applications of human pose estimation.
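The abstract only names embedding-based methods for associative ambiguity. The following is a simplified sketch of the idea, in the spirit of associative embedding, assuming each detected keypoint carries one scalar "tag". Keypoints whose tags are close are taken to belong to the same person.

Grouping proceeds one joint type at a time. Each candidate is greedily matched to the person whose mean tag is nearest. If no tag is within a threshold, the candidate starts a new person. The names `Keypoint`, `Person`, `group_by_tags`, the joint indexing, and the threshold 0.5 are hypothetical. Published methods may use vector tags and bipartite matching, and may also weigh detection confidence.

```python
from dataclasses import dataclass, field


@dataclass
class Keypoint:
    joint: int    # joint-type index (hypothetical convention, e.g. 0 = head)
    x: float
    y: float
    tag: float    # scalar embedding ("tag") predicted for this detection


@dataclass
class Person:
    keypoints: dict = field(default_factory=dict)  # joint index -> Keypoint

    def mean_tag(self):
        tags = [kp.tag for kp in self.keypoints.values()]
        return sum(tags) / len(tags)


def group_by_tags(detections, threshold=0.5):
    """Greedily assign keypoints to people by tag similarity, one joint type at a time."""
    people = []
    for joint in sorted({d.joint for d in detections}):
        candidates = [d for d in detections if d.joint == joint]
        # Every (tag distance, candidate index, person index) pair, closest first.
        pairs = sorted(
            (abs(c.tag - p.mean_tag()), ci, pi)
            for ci, c in enumerate(candidates)
            for pi, p in enumerate(people)
        )
        used_candidates, used_people = set(), set()
        for dist, ci, pi in pairs:
            if dist >= threshold:
                break  # all remaining pairs are at least this far apart
            if ci in used_candidates or pi in used_people:
                continue  # each person gets at most one keypoint per joint type
            people[pi].keypoints[joint] = candidates[ci]
            used_candidates.add(ci)
            used_people.add(pi)
        # Candidates that matched no existing person start a new instance.
        for ci, c in enumerate(candidates):
            if ci not in used_candidates:
                people.append(Person({joint: c}))
    return people


if __name__ == "__main__":
    detections = [
        Keypoint(0, 10, 10, 0.02), Keypoint(0, 50, 12, 0.98),
        Keypoint(1, 11, 20, 0.05), Keypoint(1, 49, 22, 1.01),
        Keypoint(2, 12, 30, -0.03),
    ]
    for person in group_by_tags(detections):
        print(sorted(person.keypoints))  # [0, 1, 2], then [0, 1]
```

With this greedy scheme, occlusion-induced ambiguity shows up as tags that drift toward another person's mean. The threshold then decides whether such a keypoint is merged or split off as a new instance.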

Keywords: deep learning; human pose estimation; ambiguity problem; human pose encoding and decoding; human pose modeling

Yu Li, Du Congju, Yan Zengqiang, Zhao Huijuan, He Shuangjiang


School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074


2024

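Journal of Image and Graphics
Institute of Remote Sensing Applications, Chinese Academy of Sciences; China Society of Image and Graphics; Institute of Applied Physics and Computational Mathematics, Beijing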


CSTPCD; Peking University Core Journals
Impact factor: 1.111
ISSN: 1006-8961
Year, volume (issue): 2024, 29(11)