Attention-guided three-stream convolutional neural network for microexpression recognition
Objective In recent years, microexpression recognition has shown remarkable application value in fields such as psychological counseling, lie detection, and intention analysis. However, unlike macro-expressions, which are produced in conscious states, microexpressions often occur in high-risk scenarios and are generated unconsciously. They are characterized by small action amplitudes and short duration, and they usually affect only local facial areas. These features make microexpression recognition difficult. Traditional methods used in early research mainly include those based on local binary patterns and those based on optical flow. The former can effectively extract the texture features of microexpressions, whereas the latter compute pixel changes in the temporal domain and the relationship between adjacent frames, providing rich, key input information for the network. Although traditional methods based on texture and optical-flow features made good progress in early microexpression recognition, they are often computationally costly and leave room for improvement in recognition accuracy and robustness. Later, with the development of machine learning, microexpression recognition based on deep learning gradually became the mainstream of research in this field. Such methods use neural networks to extract features from input image sequences after a series of preprocessing operations (facial cropping, alignment, and grayscale conversion) and classify them to obtain the final recognition result. The introduction of deep learning has substantially improved microexpression recognition performance. However, given the characteristics of microexpressions themselves, recognition accuracy can still be improved considerably, and the limited scale of existing microexpression datasets also restricts the recognition of such emotional behaviors. To solve these problems, this paper proposes an attention-guided three-stream
convolutional neural network (ATSCNN) for microexpression recognition. Method First, because the motion changes between adjacent frames of a microexpression are very subtle, preprocessing operations such as facial alignment and cropping are performed only on the two key frames of the microexpression (the onset frame and the apex frame). This step reduces redundant information and computation in the image sequence while preserving the important features of the microexpression, yields single-channel grayscale images with a resolution of 128 × 128 pixels, and reduces the influence of nonfacial areas on recognition. Then, because optical flow can capture representative motion features between two frames, it achieves a higher signal-to-noise ratio than the raw data and provides rich, critical input features for the network. Therefore, this paper uses the total variation-L1 (TV-L1) energy functional to extract optical-flow features between the two frames (the horizontal component of optical flow, the vertical component of optical flow, and the optical strain). Next, in the feature extraction stage, to overcome the overfitting caused by the limited sample size, three identical four-layer convolutional neural networks extract features from the optical-flow horizontal component, the optical-flow vertical component, and the optical strain (the input channel numbers of the four convolutional layers are 1, 3, 5, and 8, and the output channel numbers are 3, 5, 8, and 16), thus improving network performance. Afterward, because the image sequences in the microexpression datasets used in this paper inevitably contain some redundant information other than the face, a convolutional block attention module (CBAM), in which channel attention and spatial attention are connected in series, is added after the shallow convolutional neural network in each stream to focus on the important information of the input and suppress
irrelevant information. By attending to both the channel dimension and the spatial dimension, the module enhances the network's ability to obtain effective features and improves recognition performance. Finally, the extracted features are fed into a fully connected layer to classify microexpression emotion (negative, positive, or surprise). In addition, the entire architecture uses the scaled exponential linear unit (SELU) activation function to overcome the potential dying-neuron and vanishing-gradient problems of the commonly used rectified linear unit (ReLU) and to speed up the convergence of the neural network. Result Experiments were conducted on a combined microexpression dataset using the leave-one-subject-out (LOSO) cross-validation strategy, in which each subject in turn serves as the test set and all remaining samples are used for training. This validation method makes full use of the samples, offers a degree of generalization, and is the most commonly used protocol in current microexpression recognition research. The proposed method reached an unweighted average recall (UAR) of 0.735 1 and an unweighted F1-score (UF1) of 0.720 5. Compared with the Dual-Inception model, the best-performing comparison method, UAR and UF1 increased by 0.060 7 and 0.068 3, respectively. To further verify the effectiveness of the proposed ATSCNN architecture, several ablation experiments were also conducted on the combined dataset, and the results confirmed the feasibility of the method. Conclusion The proposed microexpression recognition network can effectively alleviate overfitting, focus on the important information of microexpressions, and achieve state-of-the-art (SOTA) recognition performance on small-scale microexpression datasets under LOSO cross-validation. Compared with other mainstream
models, the proposed method achieves superior recognition performance, and the results of the ablation experiments make it more convincing. In conclusion, the proposed method markedly improves the effectiveness of microexpression recognition.
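The three-stream architecture described in the Method section can be sketched in PyTorch as follows. Only the per-stream channel counts (1→3, 3→5, 5→8, 8→16), the SELU activations, the per-stream CBAM, the 128 × 128 single-channel inputs, and the three-class output come from the abstract; the kernel sizes, pooling layers, and CBAM reduction ratio are illustrative assumptions, not details of the paper.

```python
# Hypothetical sketch of ATSCNN: three identical shallow streams for the
# optical-flow horizontal component, vertical component, and optical strain,
# each followed by a CBAM block, then concatenation and a fully connected
# classifier. Kernel sizes, pooling, and the CBAM reduction are assumptions.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional block attention: channel attention, then spatial attention."""
    def __init__(self, channels, reduction=4, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(            # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.SELU(),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # channel attention: avg- and max-pooled
        mx = self.mlp(x.amax(dim=(2, 3)))    # descriptors through the shared MLP
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(1, keepdim=True),          # spatial attention:
                       x.amax(1, keepdim=True)], dim=1)  # channel-wise avg and max
        return x * torch.sigmoid(self.spatial(s))

def make_stream():
    """One four-layer stream with the channel counts stated in the abstract."""
    layers, chans = [], [1, 3, 5, 8, 16]
    for cin, cout in zip(chans, chans[1:]):
        layers += [nn.Conv2d(cin, cout, 3, padding=1), nn.SELU(), nn.MaxPool2d(2)]
    return nn.Sequential(*layers, CBAM(16))

class ATSCNN(nn.Module):
    def __init__(self, num_classes=3):       # negative / positive / surprise
        super().__init__()
        self.streams = nn.ModuleList(make_stream() for _ in range(3))
        self.fc = nn.Linear(3 * 16 * 8 * 8, num_classes)  # 128 / 2**4 = 8

    def forward(self, u, v, strain):         # the three optical-flow input maps
        feats = [s(x) for s, x in zip(self.streams, (u, v, strain))]
        return self.fc(torch.cat(feats, dim=1).flatten(1))
```

With the assumed pooling, each stream maps a (B, 1, 128, 128) input to a (B, 16, 8, 8) feature map, so the concatenated feature vector fed to the classifier has 3 × 16 × 8 × 8 = 3 072 dimensions.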