PTH-Net: Dynamic Facial Expression Recognition Without Face Detection and Alignment

Pyramid Temporal Hierarchy Network (PTH-Net) is a new paradigm for dynamic facial expression recognition that is applied directly to raw videos, without face detection and alignment. Unlike the traditional paradigm, which focuses only on facial areas and often overlooks valuable information such as body movements, PTH-Net preserves more critical information. It does this by distinguishing between backgrounds and human bodies at the feature level, offering greater flexibility as an end-to-end network. Specifically, PTH-Net uses a pre-trained backbone to extract multiple general video-understanding features at various temporal frequencies, forming a temporal feature pyramid. It then expands this temporal hierarchy through differentiated parameter sharing and downsampling, ultimately refining emotional information under the supervision of expression temporal-frequency invariance. Additionally, PTH-Net features an efficient Scalable Semantic Distinction layer that enhances feature discrimination, helping to better distinguish target expressions from non-target ones in the video. Finally, extensive experiments demonstrate that PTH-Net performs excellently on eight challenging benchmarks, at lower computational cost than previous methods. The source code is available at https://github.com/lm495455/PTH-Net.
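
As a rough illustration of the temporal feature pyramid idea described in the abstract, the following minimal sketch builds progressively coarser temporal views of one feature sequence and scores each with a single shared classifier. It is not the authors' implementation (that lives in the repository linked above). Every name here (`temporal_downsample`, `build_temporal_pyramid`, `shared_head`, `pyramid_logits`) is hypothetical, and average pooling, full weight sharing, and logit averaging are assumptions standing in for the paper's differentiated parameter sharing and invariance supervision.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]  # T feature vectors, one per time step


def temporal_downsample(seq: Sequence, stride: int = 2) -> Sequence:
    """Average-pool along time over non-overlapping windows of `stride` steps."""
    pooled = []
    for start in range(0, len(seq), stride):
        window = seq[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled


def build_temporal_pyramid(seq: Sequence, levels: int) -> List[Sequence]:
    """Return the same clip at successively halved temporal resolutions."""
    pyramid = [seq]
    for _ in range(levels - 1):
        if len(pyramid[-1]) == 1:  # cannot downsample a single step further
            break
        pyramid.append(temporal_downsample(pyramid[-1]))
    return pyramid


def shared_head(seq: Sequence, weights: List[Vector]) -> Vector:
    """Mean-pool over time, then apply one linear layer (classes x dim)."""
    dim = len(seq[0])
    mean = [sum(v[d] for v in seq) / len(seq) for d in range(dim)]
    return [sum(w[d] * mean[d] for d in range(dim)) for w in weights]


def pyramid_logits(seq: Sequence, weights: List[Vector], levels: int = 3) -> Vector:
    """Score every pyramid level with the same head and average the logits."""
    per_level = [shared_head(level, weights)
                 for level in build_temporal_pyramid(seq, levels)]
    return [sum(l[c] for l in per_level) / len(per_level)
            for c in range(len(weights))]


# Toy input (assumed): 8 time steps of 2-dimensional backbone features.
features = [[float(t), 1.0] for t in range(8)]
pyramid = build_temporal_pyramid(features, levels=3)
logits = pyramid_logits(features, weights=[[1.0, 0.0], [0.0, 1.0]], levels=3)
```

In this toy setup the pyramid holds 8, 4, and 2 time steps. Because the same head scores every level, a prediction that holds across temporal resolutions is reinforced, which is loosely the intuition behind supervising for temporal-frequency invariance.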

Feature extraction, Videos, Face recognition, Semantics, Emotion recognition, Market research, Transformers, Visualization, Head, Data mining

Min Li, Xiaoqin Zhang, Tangfei Liao, Sheng Lin, Guobao Xiao

Key Laboratory of Intelligent Informatics for Safety and Emergency of Zhejiang Province, Wenzhou University, Wenzhou, China

2025

IEEE transactions on image processing

Year, volume (issue): 2025, 34(1)