To address the low accuracy of gaze estimation in unconstrained environments, a gaze estimation method based on a hybrid Transformer model is proposed. First, the MobileNet V3 network is improved by adding a coordinate attention module to enhance the effectiveness of feature extraction. The improved MobileNet V3 network is then used to extract gaze-estimation features from facial images. Next, the feed-forward network layer of the Transformer model is enhanced with a 3×3 depthwise convolution layer to strengthen its overall feature-integration capability. Finally, the extracted features are fed into the improved Transformer model for integrated processing, which outputs the 3D gaze direction. Evaluated on the MPIIFaceGaze dataset, the method achieves a mean gaze angular error of 3.56°, indicating that the model can accurately perform 3D gaze estimation.
Keywords: 3D gaze estimation; coordinate attention; depthwise convolution
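The abstract's key architectural change is inserting a 3×3 depthwise convolution into the Transformer's feed-forward layer so that token features regain local spatial mixing. A minimal PyTorch sketch of one such feed-forward block is below; the layer widths, activation choice, and the token-to-grid reshape are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DWConvFFN(nn.Module):
    """Transformer feed-forward block with a 3x3 depthwise convolution
    between the two linear layers (illustrative sketch; dimensions assumed)."""

    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        # depthwise: groups == channels, so each channel is filtered independently;
        # padding=1 preserves the spatial size for a 3x3 kernel
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x, h, w):
        # x: (batch, h*w tokens, dim) -- tokens assumed to come from an h x w feature map
        x = self.fc1(x)
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)  # token sequence -> 2D grid
        x = self.dwconv(x)                         # local spatial mixing
        x = x.reshape(b, c, n).transpose(1, 2)     # 2D grid -> token sequence
        x = self.act(x)
        return self.fc2(x)

# sanity check on a 7x7 token grid
ffn = DWConvFFN(dim=256, hidden=512)
out = ffn(torch.randn(2, 49, 256), 7, 7)
print(out.shape)
```

Because the depthwise convolution uses `groups == channels`, it adds only 3×3 weights per channel, keeping the extra cost small relative to the two linear layers.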