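This paper proposes a facial expression generation method based on facial Action Units (AUs) and the Transformer, with the aim of improving the quality of generated facial expressions. Facial AUs guide expression generation, and a Transformer structure is introduced into the generator of a Generative Adversarial Network (GAN). Its self-attention mechanism attends to different parts of the facial input sequence, which aids the extraction of low-level features and improves interpretability. A Spatial Pyramid Pooling-Fast (SPPF) structure is added to the discriminator to extract features at different receptive fields and fuse them across scales, strengthening the discriminator. The model is trained on the public CelebA dataset. Experimental results show that the proposed algorithm improves both the Peak Signal to Noise Ratio (PSNR) and the Fréchet Inception Distance (FID), a measure of generated image quality, and that the generated expressions are comparatively realistic and accurate. The proposed algorithm yields high-quality face images with the target expression and enhances expression editing capability.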
Facial Expression Generation Method Based on Facial Action Units and Transformer
A facial expression generation method based on facial action units (AUs) and the Transformer is proposed to improve the quality of generated facial expressions. Facial AUs guide the generation of expressions, and a Transformer structure is introduced into the generator of a Generative Adversarial Network (GAN). The self-attention mechanism attends to different parts of the facial input sequence, which facilitates the extraction of low-level features and improves interpretability. A Spatial Pyramid Pooling-Fast (SPPF) structure is added to the discriminator to extract features from different receptive fields, and multi-scale feature fusion is performed to enhance the discriminator's performance. The model is trained on the public CelebA dataset, and the experimental results show that the algorithm improves both the Peak Signal to Noise Ratio (PSNR) and the Fréchet Inception Distance (FID), a measure of generated image quality, and generates more realistic and accurate expressions. The algorithm produces high-quality face images with the target expression and enhances expression editing capability.
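For readers unfamiliar with the PSNR metric named above, the following is a minimal sketch of its standard definition, PSNR = 10 · log10(MAX² / MSE), not the authors' evaluation code. The function name `psnr`, the flat pixel sequences, and the default peak value of 255 (8-bit images) are assumptions made for illustration.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    `reference` and `generated` are flat sequences of pixel intensities;
    `max_value` is the largest possible intensity (255 for 8-bit images).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the generated image is closer, pixel by pixel, to the reference. FID behaves the opposite way: it measures the distance between feature distributions of real and generated images, so a lower value is better.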