Multi-scale, multi-level, multi-task network-based micro-expression analysis for long videos
Unlike macro-expressions, micro-expressions are typically characterized by short duration, small movement amplitude, and limited coverage area. In long videos, micro-expressions are intertwined with macro-expressions, making their spotting and recognition difficult and heavily dependent on expert experience. To address this problem, this paper develops a multi-task model for micro-expression analysis in long videos. It adopts a cascaded network structure to accomplish the spotting subtask and the recognition subtask respectively. Because micro-expressions occur only in localized facial regions and their feature distributions vary across individuals, key frames are often spotted inaccurately or missed; the spotting sub-network therefore employs a Dual-CBAM-Inception module, which enlarges the model's spatial receptive field and extracts multi-scale optical-flow features from global and local regions to improve robustness. The uneven distribution of expression categories in long videos and the subtlety of facial movements when micro-expressions occur lead to low accuracy in micro-expression classification and recognition; a depthwise-separable DenseNet model is therefore proposed for the recognition sub-network. This model improves expression-recognition accuracy by extracting shallow and deep semantic features of the optical-flow information at multiple levels while controlling the amount of computation and its cost. The proposed method is validated on the CAS(ME)² long-video dataset as well as the CASME II and SMIC short-video datasets. The results show that it is able to spot micro-expression intervals and recognize expression categories in long videos, and that it outperforms many current state-of-the-art methods.
Keywords: micro-expression analysis; optical flow; multi-task model; multi-scale features; multi-level features
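To make the lightweight design of the recognition sub-network concrete, the following is a minimal PyTorch-style sketch of a depthwise-separable convolution block of the kind such a DenseNet-style extractor could be built from. It illustrates the general technique only; the class and parameter names are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Factorizes a standard k x k convolution into a per-channel (depthwise)
    convolution followed by a 1x1 (pointwise) convolution, which reduces the
    parameter count and FLOPs relative to a full convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: a two-channel optical-flow map (horizontal and vertical components)
# passed through the block; the input size 128x128 is an arbitrary assumption.
if __name__ == "__main__":
    flow = torch.randn(1, 2, 128, 128)      # (batch, flow channels, H, W)
    layer = DepthwiseSeparableConv(2, 32)
    print(layer(flow).shape)                # torch.Size([1, 32, 128, 128])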