Driver gaze zone estimation algorithm based on a lightweight spatial feature encoding network

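Real-time monitoring of the driver's gaze zone helps human-machine shared driving vehicles understand and judge the driver's intentions. To address the difficulty of balancing algorithm accuracy and real-time performance in in-vehicle environments, a driver gaze zone estimation algorithm based on a lightweight spatial feature encoding network (LSFENet) is proposed. The captured image sequence of the driver's upper body is preprocessed through face alignment and glasses removal to obtain left- and right-eye images and facial keypoint coordinates. Building on MobileNetV2, the LSFENet feature extraction network is constructed from GCSbottleneck modules, with integrated attention modules that strengthen the weights of key features, to generate features for the left and right eyes. The Kronecker product is used to fuse the eye features with the facial keypoint features, and the fused features of consecutive frames are fed into a recurrent neural network to obtain the gaze zone estimate for the image sequence. The new algorithm is tested on a public dataset and a self-collected dataset. Experimental results show that the gaze zone estimation accuracy of LSFENet reaches 97.08% and that it processes about 103 frames per second, meeting the computational efficiency and accuracy requirements of in-vehicle environments; its estimation accuracy for gaze zones 1, 2, 3, 4, and 9 is above 85% in each case, and it adapts well to different lighting conditions and to eyeglass occlusion. The results are of significant value for recognizing driver visual distraction.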
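The Kronecker-product fusion step can be illustrated with a minimal sketch. The abstract does not give feature dimensions or say how the eye and keypoint features are shaped before fusion, so this example assumes both are plain 1-D vectors; the function name `kronecker_fuse` and the toy values are hypothetical and are not taken from the paper.

```python
def kronecker_fuse(eye_feat, keypoint_feat):
    """Kronecker product of two 1-D feature vectors (illustrative only).

    Element i * len(keypoint_feat) + j of the result equals
    eye_feat[i] * keypoint_feat[j], so every eye feature is paired
    with every keypoint feature.
    """
    return [e * k for e in eye_feat for k in keypoint_feat]


# Toy vectors with made-up values (assumption, not data from the paper).
eye_feat = [1.0, 2.0]
keypoint_feat = [0.5, -1.0, 3.0]

fused = kronecker_fuse(eye_feat, keypoint_feat)
print(len(fused))  # 6: the fused length is the product of the input lengths
print(fused)       # [0.5, -1.0, 3.0, 1.0, -2.0, 6.0]
```

For 1-D inputs this matches what `numpy.kron` computes. Because the fused length is the product of the two input lengths, a real model would likely keep both inputs small before fusing; how LSFENet does this is not stated in the abstract.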
Estimation algorithm of driver's gaze zone based on lightweight spatial feature encoding network
[Objective] Real-time monitoring of a driver's gaze zone is essential for human-machine shared driving vehicles to understand and predict the driver's intentions. Because in-vehicle platforms have limited computational resources and storage capacity, existing gaze zone estimation algorithms often struggle to balance accuracy and real-time performance, and they ignore temporal information. [Methods] This paper therefore proposes a lightweight spatial feature encoding network (LSFENet) for driver gaze zone estimation. First, an image sequence of the driver's upper body is captured by an RGB camera. To handle challenges such as cluttered backgrounds and facial occlusions in the captured images, preprocessing steps including face alignment and glasses removal are performed to obtain left- and right-eye images and facial keypoint coordinates. Face alignment uses the multi-task cascaded convolutional network algorithm, and glasses are removed using the cycle-consistent adversarial network algorithm. Second, because the inverted residual structure in MobileNetV2 requires a significant amount of memory and floating-point operations and ignores the redundancy and correlation among feature maps, we build the LSFENet feature extraction network from GCSbottleneck modules to improve the MobileNetV2 architecture. We embed a ghost module to reduce memory consumption and integrate channel and spatial attention modules to extract cross-channel and spatial information from the feature map. Next, the Kronecker product is used to fuse eye features with facial keypoint features to reduce the impact of the imbalance in information complexity. Then, the fused features from consecutive frames are input into a recurrent neural network to estimate the gaze zone of the image sequence. Finally, the proposed network is evaluated on the public driver gaze in the wild (DGW) dataset and a self-collected dataset. The evaluation metrics include the number of parameters, the floating-point operations (FLOPs), the frames per second (FPS), and the F1 score. [Results] The experimental results showed the following: (1) The gaze zone estimation accuracy of the proposed algorithm was 97.08%, approximately 7% higher than that of the original MobileNetV2. Additionally, both the number of parameters and the FLOPs were reduced by 22.5%, and the FPS was improved by 36.43%. The proposed network ran at approximately 103 FPS and satisfied the computational efficiency and accuracy requirements of in-vehicle environments. (2) The estimation accuracies for gaze zones 1, 2, 3, 4, and 9 were over 85% for the proposed algorithm. The macro-average and micro-average precisions on the DGW dataset reached 74.32% and 76.01%, respectively. (3) The proposed algorithm provided high classification accuracy for fine-grained eye images with small intra-class differences. (4) The class activation mapping visualizations demonstrated that the proposed algorithm adapted well to various lighting conditions and to eyeglass occlusion. [Conclusions] The research results are of great significance for recognizing a driver's visual distraction states.
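The difference between the macro-average and micro-average precision reported above can be shown with a small sketch. It follows the standard definitions for single-label multi-class classification and is not the authors' evaluation code; the function name `macro_micro_precision`, the toy labels, and the choice to score a never-predicted class as 0 are all assumptions made for illustration.

```python
from collections import Counter


def macro_micro_precision(y_true, y_pred):
    """Return (macro, micro) precision for single-label multi-class data."""
    tp = Counter()  # correct predictions, counted per predicted class
    fp = Counter()  # wrong predictions, counted per predicted class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1

    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        predicted = tp[c] + fp[c]
        # Assumption: a class that is never predicted gets precision 0.
        per_class.append(tp[c] / predicted if predicted else 0.0)

    macro = sum(per_class) / len(per_class)  # unweighted mean over classes
    micro = sum(tp.values()) / len(y_pred)   # pooled over all predictions
    return macro, micro


# Toy gaze-zone labels (made up for illustration, not from DGW).
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
macro, micro = macro_micro_precision(y_true, y_pred)
print(round(macro, 4), round(micro, 4))  # 0.7222 0.6667
```

Macro averaging weights every gaze zone equally, so rarely predicted zones influence it as much as common ones, whereas micro averaging pools all predictions and, in the single-label case, equals overall accuracy.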

gaze zone estimation; lightweight spatial feature encoding network; attention mechanism; feature extraction; Kronecker product; recurrent neural network

Zhang Mingfang (张名芳), Li Guilin (李桂林), Wu Chuna (吴初娜), Wang Li (王力), Tong Lianghao (佟良昊)

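Beijing Key Laboratory of Urban Road Traffic Intelligent Control Technology, North China University of Technology, Beijing 100144, China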

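Key Laboratory of Operation Safety Technology on Transport Vehicles, Ministry of Transport, Research Institute of Highway, Ministry of Transport, Beijing 100088, China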

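gaze zone estimation; lightweight spatial feature encoding network; attention mechanism; feature extraction; Kronecker product; recurrent neural network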

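National Natural Science Foundation of China (51905007); Scientific Research Program of Beijing Municipal Education Commission (KM202210009013)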

2024

Journal of Tsinghua University (Science and Technology)
Tsinghua University

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.586
ISSN: 1000-0054
Year, volume (issue): 2024, 64(1)