A Multi-scale Feature Fusion Method for Hand-object Interaction Detection Based on Complex Background
A two-stage multi-scale feature fusion method is proposed for hand-object interaction detection. It addresses the challenges posed by complex backgrounds, such as background noise and lighting variations, as well as occlusion and low resolution. In the first stage, a ResNet-50 residual network built on a feature pyramid serves as the backbone for feature extraction. This fuses deep semantic information with shallow detail features across multiple scales and improves the accuracy of small-object detection. In the second stage, the geometric relationship between each detected hand region and object region is used to decide whether an interaction occurs, so that non-interacting objects are filtered out. Extensive experiments are conducted on a large-scale dataset of video frames showing human interactions in both indoor and outdoor environments, covering 11 object categories that are commonly touched by hands. The experimental results show that, compared with other methods of the same type, the proposed method improves detection accuracy without increasing the complexity of the network model. In addition, detection accuracy remains relatively stable across the categories in the dataset, which indicates improved generalization performance of the network.
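The abstract does not state the exact geometric criterion used in the second stage. The Python sketch below shows one plausible rule, assuming the first stage outputs axis-aligned hand and object boxes in pixel coordinates: an object counts as "interacting" if a hand box, enlarged by a small margin, covers a minimum fraction of the object box. The names `Box`, `is_interacting`, `filter_interacting` and the thresholds `margin` and `min_cover` are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is the top-left corner, (x2, y2) the bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)

    def expand(self, ratio: float) -> "Box":
        """Grow the box on every side by `ratio` times its width/height."""
        dw = (self.x2 - self.x1) * ratio
        dh = (self.y2 - self.y1) * ratio
        return Box(self.x1 - dw, self.y1 - dh, self.x2 + dw, self.y2 + dh)


def intersection_area(a: Box, b: Box) -> float:
    """Area of overlap between two boxes (0 if they are disjoint)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return max(0.0, w) * max(0.0, h)


def is_interacting(hand: Box, obj: Box, margin: float = 0.2,
                   min_cover: float = 0.05) -> bool:
    """Hypothetical rule: the margin-enlarged hand box must cover at least
    `min_cover` of the object box's area."""
    obj_area = obj.area()
    if obj_area == 0:
        return False
    cover = intersection_area(hand.expand(margin), obj) / obj_area
    return cover >= min_cover


def filter_interacting(hands: List[Box], objects: List[Tuple[str, Box]],
                       **kw) -> List[Tuple[str, Box]]:
    """Keep only the detected objects that interact with at least one hand."""
    return [(label, box) for label, box in objects
            if any(is_interacting(h, box, **kw) for h in hands)]


if __name__ == "__main__":
    hands = [Box(100, 100, 150, 160)]
    objects = [("cup", Box(140, 120, 200, 190)),
               ("phone", Box(400, 50, 450, 120))]
    print([label for label, _ in filter_interacting(hands, objects)])  # ['cup']
```

The margin lets an object that touches the hand without overlapping its box still count as interacting. In practice both thresholds would have to be tuned on validation data; a center-distance or IoU test would be an equally reasonable geometric cue.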