Attention-guided three-stream convolutional neural network for microexpression recognition
Objective In recent years, microexpression recognition has shown remarkable application value in fields such as psychological counseling, lie detection, and intention analysis. However, unlike macro-expressions, which are produced in conscious states, microexpressions often occur in high-risk scenarios and are generated unconsciously. They are characterized by small action amplitudes and short duration, and they usually affect only local facial areas. These features make microexpression recognition difficult. Traditional methods used in early research mainly include those based on local binary patterns and those based on optical flow. The former can effectively extract the texture features of microexpressions, whereas the latter compute pixel changes in the temporal domain and the relationship between adjacent frames, providing rich, key input information for the network. Although traditional methods based on texture and optical-flow features made good progress in early microexpression recognition, they are often computationally costly and leave room for improvement in recognition accuracy and robustness. Later, with the development of machine learning, microexpression recognition based on deep learning gradually became the mainstream of research in this field. Such methods use neural networks to extract features from input image sequences after a series of preprocessing operations (facial cropping, alignment, and grayscale conversion) and classify them to obtain the final recognition result. The introduction of deep learning has substantially improved microexpression recognition performance. However, given the characteristics of microexpressions themselves, recognition accuracy can still be improved considerably, and the limited scale of existing microexpression datasets also restricts the recognition of such emotional behaviors. To solve these problems, this paper proposes an attention-guided three-stream
convolutional neural network (ATSCNN) for microexpression recognition. Method First, because the motion changes between adjacent frames of a microexpression are very subtle, preprocessing operations such as facial alignment and cropping are performed only on the two key frames of the microexpression (the onset frame and the apex frame). This step reduces redundant information and computation in the image sequence while preserving the important features of the microexpression, yields single-channel grayscale images with a resolution of 128 × 128 pixels, and reduces the influence of nonfacial areas on recognition. Then, because optical flow can capture representative motion features between two frames, it achieves a higher signal-to-noise ratio than the raw data and provides rich, critical input features for the network. Therefore, this paper uses the total variation-L1 (TV-L1) energy functional to extract optical-flow features between the two frames (the horizontal component of optical flow, the vertical component of optical flow, and the optical strain). Next, in the feature extraction stage, to overcome the overfitting caused by the limited sample size, three identical four-layer convolutional neural networks extract features from the optical-flow horizontal component, the optical-flow vertical component, and the optical strain (the input channel numbers of the four convolutional layers are 1, 3, 5, and 8, and the output channel numbers are 3, 5, 8, and 16), thus improving network performance. Afterward, because the image sequences in the microexpression datasets used in this paper inevitably contain some redundant information other than the face, a convolutional block attention module (CBAM), in which channel attention and spatial attention are connected in series, is added after the shallow convolutional neural network in each stream to focus on the important information of the input and suppress
irrelevant information. By attending to both the channel dimension and the spatial dimension, the module enhances the network's ability to obtain effective features and improves recognition performance. Finally, the extracted features are fed into a fully connected layer to classify microexpression emotion (negative, positive, or surprise). In addition, the entire architecture uses the scaled exponential linear unit (SELU) activation function to overcome the potential dying-neuron and vanishing-gradient problems of the commonly used rectified linear unit (ReLU) and to speed up the convergence of the neural network. Result Experiments were conducted on a combined microexpression dataset using the leave-one-subject-out (LOSO) cross-validation strategy, in which each subject in turn serves as the test set and all remaining samples are used for training. This validation method makes full use of the samples, offers a degree of generalization, and is the most commonly used protocol in current microexpression recognition research. The proposed method reached an unweighted average recall (UAR) of 0.735 1 and an unweighted F1-score (UF1) of 0.720 5. Compared with the Dual-Inception model, the best-performing comparison method, UAR and UF1 increased by 0.060 7 and 0.068 3, respectively. To further verify the effectiveness of the proposed ATSCNN architecture, several ablation experiments were also conducted on the combined dataset, and the results confirmed the feasibility of the method. Conclusion The proposed microexpression recognition network can effectively alleviate overfitting, focus on the important information of microexpressions, and achieve state-of-the-art (SOTA) recognition performance on small-scale microexpression datasets under LOSO cross-validation. Compared with other mainstream
models, the proposed method achieves superior recognition performance, and the results of the ablation experiments make it more convincing. In conclusion, the proposed method markedly improves the effectiveness of microexpression recognition.
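The three-stream architecture described in the Method section can be sketched in PyTorch as follows. Only the per-stream channel counts (1→3, 3→5, 5→8, 8→16), the SELU activations, the per-stream CBAM, the 128 × 128 single-channel inputs, and the three-class output come from the abstract; the kernel sizes, pooling layers, and CBAM reduction ratio are illustrative assumptions, not details of the paper.

```python
# Hypothetical sketch of ATSCNN: three identical shallow streams for the
# optical-flow horizontal component, vertical component, and optical strain,
# each followed by a CBAM block, then concatenation and a fully connected
# classifier. Kernel sizes, pooling, and the CBAM reduction are assumptions.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=4, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(            # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.SELU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # channel attention: avg- and max-pooled
        mx = self.mlp(x.amax(dim=(2, 3)))    # descriptors through the shared MLP
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True),          # spatial attention:
                       x.amax(1, keepdim=True)], dim=1)  # channel-wise avg and max
        return x * torch.sigmoid(self.spatial(s))

def make_stream():
    """One four-layer stream with the channel counts stated in the abstract."""
    layers, chans = [], [1, 3, 5, 8, 16]
    for cin, cout in zip(chans, chans[1:]):
        layers += [nn.Conv2d(cin, cout, 3, padding=1), nn.SELU(), nn.MaxPool2d(2)]
    return nn.Sequential(*layers, CBAM(16))

class ATSCNN(nn.Module):
    def __init__(self, num_classes=3):       # negative / positive / surprise
        super().__init__()
        self.streams = nn.ModuleList(make_stream() for _ in range(3))
        self.fc = nn.Linear(3 * 16 * 8 * 8, num_classes)  # 128 / 2**4 = 8

    def forward(self, u, v, strain):         # the three optical-flow input maps
        feats = [s(x) for s, x in zip(self.streams, (u, v, strain))]
        return self.fc(torch.cat(feats, dim=1).flatten(1))
```

With the assumed pooling, each stream maps a (B, 1, 128, 128) input to a (B, 16, 8, 8) feature map, so the concatenated feature vector fed to the classifier has 3 × 16 × 8 × 8 = 3 072 dimensions.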