A Survey on Interpretability of Facial Expression Recognition
In recent years, Facial Expression Recognition (FER) has been widely applied in medicine, social robotics, communication, security, and many other fields. A growing number of researchers have shown interest in FER and have proposed effective algorithms. At the same time, the study of FER interpretability has attracted increasing attention, as it can deepen researchers' understanding of the models and help ensure fairness, privacy preservation, and robustness. In this paper, we survey interpretability work in the field of FER under a three-part classification: result interpretability, mechanism interpretability, and model interpretability.

Result interpretability indicates the extent to which people with relevant experience can consistently understand the outputs of a model. Result-interpretable FER mainly includes methods based on textual description and on the basic structure of the face; the latter comprise approaches based on facial Action Units (AUs), topological modeling, caricature images, and interference analysis.

Mechanism interpretability focuses on explaining the internal mechanisms of the models, including attention mechanisms in FER as well as interpretability methods based on feature decoupling and concept learning.

For model interpretability, researchers typically try to uncover the decision principles or rules of the models. This paper illustrates interpretable classification methods in FER that belong to this category, including approaches based on the Multi-Kernel Support Vector Machine (MKSVM) and approaches based on decision trees and deep forests.

We then compare and analyze the surveyed FER interpretability work and identify open problems in this area, including the lack of evaluation metrics for FER interpretability analysis, the difficulty of balancing the accuracy and interpretability of FER models, and the scarcity of interpretability-oriented data for expression recognition.

Finally, we discuss future directions. The first concerns the interpretability of complex expression recognition, focusing on compound expressions and finer-grained expressions. The second is the interpretability of multi-modal emotion recognition: multi-modal models achieve better performance by exploiting the complementary information of each modality, and their interpretability analysis is an important direction worth exploring. Third, we believe the interpretability of expression and emotion recognition with large models is another significant direction, covering Large Vision Models, Vision-Language Models, and Multi-modal Large Models; interpretability studies can help improve the safety and reliability of large models. Last, we address the enhancement of generalization ability based on interpretability. When a model learns "correlation" rather than "causality", it is prone to making wrong judgments when encountering new data or when affected by other factors; that is, the model does not generalize well. Interpretability analysis helps deepen our understanding of the nature of the models and explain the causal relationship between input and output, and can therefore improve generalization performance.

This paper intends to provide interested researchers with a comprehensive review and analysis of the current state of research on the interpretability of facial expression recognition, thereby promoting further advancements in this field.
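
To make the model-interpretability category above concrete, the following is a minimal sketch, not drawn from any specific surveyed paper, of a rule-based expression classifier: a shallow decision tree trained on facial Action Unit (AU) activations whose learned if-then rules can be printed and inspected. The AU feature set, the synthetic data, and the toy labeling rule (strong AU6 + AU12 suggesting happiness, strong AU4 + AU15 suggesting sadness) are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not from the surveyed papers): an
# interpretable expression classifier that maps facial Action Unit (AU)
# activations to expression labels with a shallow decision tree, so the
# learned decision rules can be read directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical AU intensity features: rows are faces, columns are AUs
# (e.g., AU6 "cheek raiser" and AU12 "lip corner puller" for happiness).
au_names = ["AU1", "AU4", "AU6", "AU12", "AU15"]
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 5.0, size=(200, len(au_names)))
# Toy labeling rule for illustration only.
y = np.where(X[:, 2] + X[:, 3] > 6.0, "happy",
             np.where(X[:, 1] + X[:, 4] > 6.0, "sad", "neutral"))

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The tree reads as if-then rules over named AUs, which is the sense of
# "model interpretability" used in this survey.
print(export_text(tree, feature_names=au_names))
```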
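
For the mechanism-interpretability category, a common device is a spatial attention layer whose weights can be read out as a heat map over facial regions. The sketch below is an assumed, generic architecture, not the design of any particular surveyed model; the module name and feature shapes are hypothetical.

```python
# Minimal sketch (assumed architecture): a spatial-attention layer whose
# weights form an inspectable map over a CNN feature grid, illustrating
# how attention supports mechanism interpretability in FER models.
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produces a per-location attention map over a backbone feature grid."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)  # 1x1 scoring conv

    def forward(self, feats: torch.Tensor):
        # feats: (batch, channels, H, W) feature map from a face backbone.
        attn = torch.softmax(self.score(feats).flatten(2), dim=-1)
        attn = attn.view(feats.size(0), 1, *feats.shape[2:])   # (B, 1, H, W)
        pooled = (feats * attn).sum(dim=(2, 3))  # attention-weighted pooling
        return pooled, attn  # attn can be upsampled and overlaid on the face

# Usage: inspect which facial regions the model attends to.
layer = SpatialAttention(channels=64)
feats = torch.randn(1, 64, 7, 7)   # stand-in backbone features
pooled, attn = layer(feats)
print(pooled.shape, attn.shape)    # (1, 64) and (1, 1, 7, 7)
```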
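
For the MKSVM-based methods, one simplified reading is that the weight assigned to each feature-specific kernel indicates how much that facial cue contributes to the decision. The sketch below assumes fixed kernel weights rather than the learned multiple-kernel optimization used in actual MKSVM work, and the geometric/texture feature views and their weights are hypothetical.

```python
# Minimal sketch (assumption: fixed kernel weights, not learned MKL): an
# SVM over a weighted combination of per-view kernels, in the spirit of
# MKSVM-based interpretable FER, where the per-kernel weight reflects the
# contribution of each facial cue.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(0)
X_geo = rng.normal(size=(120, 10))   # hypothetical geometric (landmark) view
X_tex = rng.normal(size=(120, 32))   # hypothetical texture (appearance) view
y = rng.integers(0, 2, size=120)     # toy binary expression labels

# One kernel per feature view; the weights are the interpretable part.
w_geo, w_tex = 0.6, 0.4
K = w_geo * rbf_kernel(X_geo) + w_tex * linear_kernel(X_tex)

clf = SVC(kernel="precomputed").fit(K, y)
print("train accuracy:", clf.score(K, y))
```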