Facial expression recognition with a Swin Transformer embedding a hybrid attention mechanism
Facial expression recognition is an important research domain in psychology, with applications in fields such as transportation, medical care, security, and criminal investigation. Because convolutional neural networks (CNNs) are limited in their ability to extract global features of facial expressions, this paper proposes a Swin Transformer method with an embedded hybrid attention mechanism for facial expression recognition. With the Swin Transformer as the backbone network, a hybrid attention module is embedded in the fusion layer (Patch Merging) of Stage 3 of the model, allowing the network to extract both global and local features from facial expressions. First, the hierarchical Swin Transformer model captures deep global features. Second, the embedded hybrid attention module combines channel and spatial attention mechanisms to extract features along the channel and spatial dimensions, which yields better local features. In addition, transfer learning is used to initialize the network weights, improving recognition performance and generalization ability. The proposed method achieves recognition accuracies of 73.63%, 87.01%, and 98.28% on three public datasets (FER2013, RAF-DB, and JAFFE), respectively.
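To make the idea of combining channel and spatial attention concrete, the following is a minimal NumPy sketch of one possible hybrid attention block applied to a feature map of the kind passed between Swin Transformer stages. It is not the paper's implementation. The abstract does not specify the module's internals, so the CBAM-style design used here (average and max pooling, a shared two-layer MLP for channels, and a simplified 1x1 mixing step for the spatial map) is an assumption. The function names, the reduction ratio `r`, and the tensor sizes are likewise hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """Per-channel weights in (0, 1) for a feature map x of shape (H, W, C)."""
    avg = x.mean(axis=(0, 1))  # (C,) average over all spatial positions
    mx = x.max(axis=(0, 1))    # (C,) maximum over all spatial positions

    def mlp(v):
        # Shared bottleneck MLP: C -> C // r -> C, with ReLU in between.
        return np.maximum(v @ w1, 0.0) @ w2

    return sigmoid(mlp(avg) + mlp(mx))


def spatial_attention(x, w_s):
    """Per-position weights in (0, 1), shape (H, W, 1)."""
    avg = x.mean(axis=2, keepdims=True)         # (H, W, 1)
    mx = x.max(axis=2, keepdims=True)           # (H, W, 1)
    pooled = np.concatenate([avg, mx], axis=2)  # (H, W, 2)
    # Assumption: a 1x1 mixing of the two pooled maps stands in for the
    # larger convolution that CBAM-style modules normally use here.
    return sigmoid(pooled @ w_s)                # (H, W, 2) @ (2, 1) -> (H, W, 1)


def hybrid_attention(x, w1, w2, w_s):
    """Apply channel attention first, then spatial attention."""
    x = x * channel_attention(x, w1, w2)  # (C,) broadcasts over H and W
    x = x * spatial_attention(x, w_s)     # (H, W, 1) broadcasts over C
    return x


# Illustrative sizes only; random weights stand in for learned parameters.
rng = np.random.default_rng(0)
H, W, C, r = 14, 14, 64, 8
feat = rng.standard_normal((H, W, C))
w1 = rng.standard_normal((C, C // r)) * 0.1
w2 = rng.standard_normal((C // r, C)) * 0.1
w_s = rng.standard_normal((2, 1))

out = hybrid_attention(feat, w1, w2, w_s)
print(out.shape)
```

Because both attention maps lie in (0, 1), the block only rescales the input features and preserves their shape, so in principle it can be inserted next to a Patch Merging layer without changing the dimensions expected by later layers. In a trained model the weights would be learned jointly with the backbone rather than drawn at random.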