A Dynamic Head Gesture Recognition Method that Fuses Attention Mechanism with 3D Two-Stream Convolution
Head gestures are a crucial human-computer interaction modality that conveys important emotional and intentional information. Most wearable-device-based methods achieve satisfactory accuracy but are expensive and inconvenient, while vision-based methods suffer from low accuracy, insufficient generalization, and enormous computational cost. Consequently, existing head gesture recognition methods remain difficult to apply to mobile robots. To address these problems, a dynamic head gesture recognition method that fuses an attention mechanism with 3D two-stream convolution (Fam-3DTSC) is proposed. Fam-3DTSC extracts RGB and optical flow features from videos containing dynamic head gestures and, inspired by the attention mechanism, extracts and strengthens action features in both the channel and spatial domains. After the critical features have been extracted effectively and accurately, they are fused and classified. Experimental results show that the proposed method extracts the essential channel-domain and spatial-domain information of head gestures and improves the accuracy and generalization ability of head gesture recognition, while achieving high accuracy and real-time performance with limited computing resources. The method was further applied to an elderly-assistance robot and validated in a practical demonstration. The results show that the proposed method is well suited to mobile on-board computing platforms with limited computing resources, such as mobile robots.
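The channel- and spatial-domain feature strengthening described above can be illustrated with a simplified attention sketch over 3D (spatio-temporal) feature volumes. The array shapes, the parameter-free sigmoid gating, and the average fusion of the RGB and optical-flow streams below are illustrative assumptions in the spirit of channel/spatial attention, not the exact Fam-3DTSC architecture:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    """Reweight channels of a (C, T, H, W) feature volume.

    Global average- and max-pooling over the spatio-temporal axes give
    one descriptor per channel; a sigmoid turns their sum into a gate.
    (A learned shared MLP, omitted here for brevity, would normally sit
    between the pooling and the sigmoid.)
    """
    c = feat.shape[0]
    avg = feat.reshape(c, -1).mean(axis=1)   # (C,) average-pooled descriptor
    mx = feat.reshape(c, -1).max(axis=1)     # (C,) max-pooled descriptor
    w = sigmoid(avg + mx)                    # per-channel weights in (0, 1)
    return feat * w[:, None, None, None]

def spatial_attention(feat):
    """Gate each spatio-temporal location of a (C, T, H, W) volume.

    Channels are collapsed by mean and max; the resulting (T, H, W) map
    is squashed to (0, 1) and broadcast back over all channels.
    """
    avg = feat.mean(axis=0)                  # (T, H, W)
    mx = feat.max(axis=0)                    # (T, H, W)
    w = sigmoid(avg + mx)
    return feat * w[None]

def fuse_streams(rgb_feat, flow_feat):
    """Apply channel then spatial attention per stream, then average-fuse."""
    a = spatial_attention(channel_attention(rgb_feat))
    b = spatial_attention(channel_attention(flow_feat))
    return 0.5 * (a + b)
```

In a full implementation the two input volumes would come from separate 3D convolutional backbones (one on RGB frames, one on stacked optical flow), and the fused volume would feed a classification head.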
Keywords: mobile robots; human-robot interaction; attention mechanism; dynamic head gesture; action recognition