Clip-aware expressive feature learning for video-based facial expression recognition
Full-text links: NSTL | Elsevier
Video-based facial expression recognition (FER) has received increased attention as a result of its widespread applications. However, a video often contains many redundant and irrelevant frames; reducing the redundancy and complexity of the available information and extracting the information most relevant to facial expression from video sequences is a challenging task. In this paper, we divide a video into several short clips for processing and propose a clip-aware emotion-rich feature learning network (CEFLNet) for robust video-based FER. Our proposed CEFLNet identifies the emotional intensity expressed in each short clip of a video and obtains clip-aware emotion-rich representations. Specifically, CEFLNet constructs a clip-based feature encoder (CFE) with two-cascaded self-attention and local-global relation learning, aiming to encode clip-based spatio-temporal features from the clips of a video. An emotional intensity activation network (EIAN) is devised to generate emotional activation maps for locating the salient emotion clips and obtaining clip-aware emotion-rich representations, which are used for expression classification. The effectiveness and robustness of the proposed CEFLNet are evaluated on four public facial expression video datasets: BU-3DFE, MMI, AFEW, and DFEW. Extensive experiments demonstrate the improved performance of our proposed CEFLNet in comparison with state-of-the-art methods. © 2022 Elsevier Inc. All rights reserved.
Keywords: Video-based FER; Emotional activation map; Clip-based feature encoder; Clip-aware emotion-rich representation; Network; Attention
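To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of the described flow: a video is split into short clips, a clip-based feature encoder (CFE) with two cascaded self-attention blocks encodes each clip, and an emotional intensity activation network (EIAN) scores clips with an activation map and pools them into a clip-aware emotion-rich representation for classification. This is an illustrative reconstruction based only on the abstract; all module structures, dimensions, and names (e.g., `ClipFeatureEncoder`, `EmotionalIntensityActivation`) are assumptions, not the authors' implementation, and per-frame features are assumed to come from some backbone.

```python
# Hypothetical sketch of the CEFLNet pipeline described in the abstract.
# Module structures and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class ClipFeatureEncoder(nn.Module):
    """CFE sketch: two cascaded self-attention blocks over the frames of a
    clip (standing in for the paper's local-global relation learning),
    pooled into one clip-level spatio-temporal feature."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn1 = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.attn2 = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, frames):                # frames: (B, T, dim) per-frame features
        x = self.attn2(self.attn1(frames))    # cascaded self-attention
        return x.mean(dim=1)                  # (B, dim) clip-level feature


class EmotionalIntensityActivation(nn.Module):
    """EIAN sketch: scores the emotional intensity of each clip (the
    'emotional activation map') and pools clip features by their
    activations into a clip-aware emotion-rich representation."""

    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, clip_feats):            # clip_feats: (B, N_clips, dim)
        act = torch.softmax(self.score(clip_feats), dim=1)   # (B, N_clips, 1)
        rep = (act * clip_feats).sum(dim=1)                  # (B, dim)
        return rep, act


class CEFLNetSketch(nn.Module):
    def __init__(self, dim=512, num_classes=7):
        super().__init__()
        self.cfe = ClipFeatureEncoder(dim)
        self.eian = EmotionalIntensityActivation(dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, video_feats):           # (B, N_clips, T, dim) per-frame features
        b, n, t, d = video_feats.shape
        clips = self.cfe(video_feats.view(b * n, t, d)).view(b, n, d)
        rep, act = self.eian(clips)
        return self.cls(rep), act             # expression logits + activation map


# Toy usage: 2 videos, each divided into 4 clips of 8 frames, 512-d features.
logits, act = CEFLNetSketch()(torch.randn(2, 4, 8, 512))
print(logits.shape, act.shape)                # (2, 7) and (2, 4, 1)
```

The softmax over clips is one simple way to realize "locating the salient emotion clips": clips with higher predicted emotional intensity contribute more to the pooled representation, while redundant or neutral clips are down-weighted.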