Self-supervised keypoint detection based on affine transformation
Self-supervised learning has emerged as a powerful approach to reducing the cost of data labeling for network training. Nonetheless, a key challenge in self-supervised keypoint detection is ensuring that the detected keypoints carry human-interpretable semantic meaning. This paper addresses this challenge by introducing a novel self-supervised keypoint detection algorithm designed to generate semantically meaningful human keypoints while maintaining detection accuracy. The proposed approach reformulates human keypoint detection as an affine transformation of predefined keypoint templates, distinguishing itself from existing self-supervised techniques. Specifically, a semantically annotated human keypoint template is predefined, and an affine transformation matrix is learned from extracted human pose features. By applying this matrix to the template, the algorithm generates keypoints that are not only accurate but also semantically aligned with the corresponding human poses. Furthermore, a margin loss is introduced to stabilize the affine transformations across different image scales, ensuring robust performance. Experimental evaluations on the Human3.6M and DeepFashion datasets show that the algorithm achieves an average detection error of 2.78 on Human3.6M, a marginal increase of 0.02 over the baseline method, Autolink. On the DeepFashion dataset, the algorithm achieves a keypoint detection accuracy of 65%, one percentage point below Autolink. Importantly, unlike other methods, the proposed algorithm guarantees that all generated keypoints are semantically interpretable, providing a significant advantage in human-centered applications.
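The core idea can be sketched as a small prediction head that regresses the six parameters of a 2D affine matrix from pooled pose features and applies it to a fixed, semantically labelled keypoint template. The sketch below is a minimal illustration of that idea only; the class name, template size, and feature dimension are assumptions for illustration and do not reflect the authors' implementation.

```python
# Minimal sketch of the template-plus-affine idea described in the abstract.
# KeypointAffineHead, NUM_KEYPOINTS, and the template coordinates are
# illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn

NUM_KEYPOINTS = 16  # assumed template size (e.g. a Human3.6M-style skeleton)


class KeypointAffineHead(nn.Module):
    """Predicts a 2x3 affine matrix from pooled pose features and applies it
    to a fixed keypoint template whose rows carry known semantic labels."""

    def __init__(self, feature_dim: int, template: torch.Tensor):
        super().__init__()
        # template: (NUM_KEYPOINTS, 2) normalized (x, y) coordinates,
        # each row corresponding to a named joint (head, left elbow, ...).
        self.register_buffer("template", template)
        self.fc = nn.Linear(feature_dim, 6)  # six affine parameters

    def forward(self, pose_features: torch.Tensor) -> torch.Tensor:
        # pose_features: (B, feature_dim) pooled image/pose features.
        theta = self.fc(pose_features).view(-1, 2, 3)        # (B, 2, 3)
        ones = torch.ones(self.template.shape[0], 1,
                          device=self.template.device)       # homogeneous coord
        pts = torch.cat([self.template, ones], dim=-1)        # (K, 3)
        # Apply the predicted affine map to every template keypoint.
        keypoints = torch.einsum("bij,kj->bki", theta, pts)   # (B, K, 2)
        return keypoints


# Usage: features can come from any backbone; the semantic label of keypoint i
# is inherited from template row i, which is what keeps every prediction
# human-interpretable.
template = torch.rand(NUM_KEYPOINTS, 2)   # placeholder template coordinates
head = KeypointAffineHead(feature_dim=256, template=template)
kps = head(torch.randn(4, 256))            # (4, 16, 2) predicted keypoints
```

Because each output keypoint is a transformed copy of a labelled template row, semantic meaning is preserved by construction; the margin loss mentioned in the abstract would additionally constrain the predicted affine parameters so the transformation stays stable across image scales.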