Clip-aware expressive feature learning for video-based facial expression recognition
Full-text links: NSTL | Elsevier
Video-based facial expression recognition (FER) has received increased attention as a result of its widespread applications. However, a video often contains many redundant and irrelevant frames; reducing the redundancy and complexity of the available information and extracting the information most relevant to facial expression from video sequences is a challenging task. In this paper, we divide a video into several short clips for processing and propose a clip-aware emotion-rich feature learning network (CEFLNet) for robust video-based FER. Our proposed CEFLNet identifies the emotional intensity expressed in each short clip of a video and obtains clip-aware emotion-rich representations. Specifically, CEFLNet constructs a clip-based feature encoder (CFE) with two-cascaded self-attention and local-global relation learning, aiming to encode clip-based spatio-temporal features from the clips of a video. An emotional intensity activation network (EIAN) is devised to generate emotional activation maps for locating the salient emotion clips and obtaining clip-aware emotion-rich representations, which are used for expression classification. The effectiveness and robustness of the proposed CEFLNet are evaluated on four public facial expression video datasets: BU-3DFE, MMI, AFEW, and DFEW. Extensive experiments demonstrate the improved performance of our proposed CEFLNet in comparison with state-of-the-art methods. © 2022 Elsevier Inc. All rights reserved.
Keywords: Video-based FER; Emotional activation map; Clip-based feature encoder; Clip-aware emotion-rich representation; Network; Attention
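To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of the described flow: a video is split into short clips, a clip-based feature encoder (CFE) with two cascaded self-attention blocks encodes each clip, and an emotional intensity activation network (EIAN) scores clips with an activation map and pools them into a clip-aware emotion-rich representation for classification. This is an illustrative reconstruction based only on the abstract; all module structures, dimensions, and names (e.g., `ClipFeatureEncoder`, `EmotionalIntensityActivation`) are assumptions, not the authors' implementation, and per-frame features are assumed to come from some backbone.

```python
# Hypothetical sketch of the CEFLNet pipeline described in the abstract.
# Module structures and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class ClipFeatureEncoder(nn.Module):
    """CFE sketch: two cascaded self-attention blocks over the frames of a
    clip (standing in for the paper's local-global relation learning),
    pooled into one clip-level spatio-temporal feature."""

    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn1 = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.attn2 = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, frames):                # frames: (B, T, dim) per-frame features
        x = self.attn2(self.attn1(frames))    # cascaded self-attention
        return x.mean(dim=1)                  # (B, dim) clip-level feature


class EmotionalIntensityActivation(nn.Module):
    """EIAN sketch: scores the emotional intensity of each clip (the
    'emotional activation map') and pools clip features by their
    activations into a clip-aware emotion-rich representation."""

    def __init__(self, dim=512):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, clip_feats):            # clip_feats: (B, N_clips, dim)
        act = torch.softmax(self.score(clip_feats), dim=1)   # (B, N_clips, 1)
        rep = (act * clip_feats).sum(dim=1)                  # (B, dim)
        return rep, act


class CEFLNetSketch(nn.Module):
    def __init__(self, dim=512, num_classes=7):
        super().__init__()
        self.cfe = ClipFeatureEncoder(dim)
        self.eian = EmotionalIntensityActivation(dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, video_feats):           # (B, N_clips, T, dim) per-frame features
        b, n, t, d = video_feats.shape
        clips = self.cfe(video_feats.view(b * n, t, d)).view(b, n, d)
        rep, act = self.eian(clips)
        return self.cls(rep), act             # expression logits + activation map


# Toy usage: 2 videos, each divided into 4 clips of 8 frames, 512-d features.
logits, act = CEFLNetSketch()(torch.randn(2, 4, 8, 512))
print(logits.shape, act.shape)                # (2, 7) and (2, 4, 1)
```

The softmax over clips is one simple way to realize "locating the salient emotion clips": clips with higher predicted emotional intensity contribute more to the pooled representation, while redundant or neutral clips are down-weighted.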