Micro-Expression Detection Method Based on Multi-Scale Spatiotemporal Attention Network
Micro-expressions can reveal genuine emotions that people attempt to hide, providing potentially useful information for criminal investigations, psychological counseling, and other settings. Existing micro-expression detection methods typically obtain spatial features first and then extract temporal characteristics from them to construct spatiotemporal features; however, the spatial processing can disrupt the original temporal relationships and distort the temporal features, which diminishes the discriminative ability of the spatiotemporal features of micro-expressions. To address this issue, a micro-expression detection method based on a multi-scale spatiotemporal attention network is proposed. A 3-Dimensional Convolutional Neural Network (3DCNN), which incorporates temporal and spatial relationships jointly, processes the micro-expression sequences to obtain robust features in both the temporal and spatial domains. Multi-scale temporal input sequences are constructed so that the network extracts multi-dimensional temporal features from image sequences of different temporal lengths, and a lightweight 3DCNN extracts the multi-scale spatiotemporal features. A Global Spatiotemporal Attention Module (GSAM) then enhances the overall spatiotemporal correlations of the features: its spatiotemporal restructuring module strengthens the connections between image frames at different moments, whereas its global information attention module models the spatial correlations within a single frame. Finally, weights are assigned to the different temporal features to highlight key temporal information, enabling effective micro-expression detection. Experimental results show that the proposed method accurately detects micro-expression sequence fragments, achieving accuracies of 92.32%, 95.04%, and 89.56% on the publicly available CASME, CASME Ⅱ, and SAMM datasets, respectively. Compared with LGAttNet, the best-performing existing deep learning method, the proposed method improves accuracy by 3.84 percentage points on the CASME Ⅱ dataset and 4.96 percentage points on the SAMM dataset.
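The multi-scale input construction and the final weighting of temporal features can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`multi_scale_clips`, `softmax`, `fuse`), the choice of centered clips, the clip lengths, and the use of softmax weights are all assumptions made for illustration, and a clip mean stands in for the features that a 3DCNN branch would produce.

```python
import math
from typing import List, Sequence


def multi_scale_clips(frames: Sequence, lengths: Sequence[int]) -> List[list]:
    """Cut one clip per temporal length from a frame sequence."""
    clips = []
    for n in lengths:
        if not 0 < n <= len(frames):
            raise ValueError(f"clip length {n} does not fit {len(frames)} frames")
        start = (len(frames) - n) // 2  # assumption: clips are centered
        clips.append(list(frames[start:start + n]))
    return clips


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features: Sequence[Sequence[float]], scores: Sequence[float]) -> List[float]:
    """Weighted sum of per-scale feature vectors (assumed softmax weights)."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


# Toy usage: integers stand in for image frames, and the clip mean
# stands in for the feature vector a 3DCNN branch would output.
frames = list(range(16))
clips = multi_scale_clips(frames, [4, 8, 16])    # hypothetical scales
features = [[sum(c) / len(c)] for c in clips]    # placeholder features
fused = fuse(features, scores=[0.5, 1.0, 0.2])   # hypothetical scores
```

In a trained network the scores would be learned rather than fixed; the sketch only shows how larger weights let one temporal scale dominate the fused representation.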