To address the low accuracy of gaze estimation in unconstrained environments, a gaze estimation method based on a hybrid Transformer model is proposed. First, the MobileNet V3 network is improved by adding a coordinate attention module to enhance the effectiveness of feature extraction. The improved MobileNet V3 network is then used to extract gaze-estimation features from facial images. Next, the feed-forward network layer of the Transformer model is enhanced with a 3×3 depthwise convolution layer to strengthen its overall feature-integration capability. Finally, the extracted features are fed into the improved Transformer model for integrated processing, which outputs the 3D gaze direction. Evaluated on the MPIIFaceGaze dataset, the method achieves a mean gaze angular error of 3.56°, indicating that the model can accurately perform 3D gaze estimation.
Keywords: 3D gaze estimation; coordinate attention; depthwise convolution
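The abstract's key architectural change is inserting a 3×3 depthwise convolution into the Transformer's feed-forward layer so that token features regain local spatial mixing. A minimal PyTorch sketch of one such feed-forward block is below; the layer widths, activation choice, and the token-to-grid reshape are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DWConvFFN(nn.Module):
    """Transformer feed-forward block with a 3x3 depthwise convolution
    between the two linear layers (illustrative sketch; dimensions assumed)."""

    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        # depthwise: groups == channels, so each channel is filtered independently;
        # padding=1 preserves the spatial size for a 3x3 kernel
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, h, w):
        # x: (batch, h*w tokens, dim) -- tokens assumed to come from an h x w feature map
        x = self.fc1(x)
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # token sequence -> 2D grid
        x = self.dwconv(x)                         # local spatial mixing
        x = x.reshape(b, c, n).transpose(1, 2)     # 2D grid -> token sequence
        x = self.act(x)
        return self.fc2(x)

# sanity check on a 7x7 token grid
ffn = DWConvFFN(dim=256, hidden=512)
out = ffn(torch.randn(2, 49, 256), 7, 7)
print(out.shape)
```

Because the depthwise convolution uses `groups == channels`, it adds only 3×3 weights per channel, keeping the extra cost small relative to the two linear layers.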