
Text2Face: Text-to-Face Synthesis Based on BERT-DCGAN

With the advent of generative adversarial networks (GANs), automatically synthesizing realistic images from text has become feasible. However, most existing work is limited to generating simple images, such as flowers and birds, from captions. As a subfield of text-to-image synthesis (T2I), text-to-face synthesis (T2F) has great application potential in public safety, for example in reconstructing the faces of criminal suspects. Because existing datasets for this task are either very small or lack captions, suitable paired data are scarce and the area has seen little research. This paper applies the algorithm proposed in Text2FaceGAN to convert the attribute list of the CelebA dataset into a set of captions and crops the face images in CelebA, producing the paired <caption, face image> dataset ImprovedCelebA and thereby addressing the shortage of data. In addition, because the quality of T2F results depends on the quality of the text encoding, and the coarse-grained text encodings used by traditional T2I methods cannot produce realistic faces, this paper proposes a method that generates faces from fine-grained textual face descriptions and uses a BERT-DCGAN with the GAN-CLS loss to learn this conditional multimodal problem. To avoid the vanishing-gradient problem that conventional GANs encounter early in training, the real and fake image labels are flipped every four training iterations. Experiments show that, compared with other text-to-face methods, the proposed algorithm not only generates realistic face images but also greatly reduces training time.
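The dataset step described above, which turns CelebA attribute flags into captions to form the <caption, face image> pairs of ImprovedCelebA, can be pictured with a toy example. The sketch below is not the Text2FaceGAN conversion algorithm itself: the sentence template and the handful of attributes covered are assumptions for illustration, and only the attribute names come from CelebA's standard annotation file.

```python
# Toy illustration (not the Text2FaceGAN algorithm) of mapping a few CelebA
# attribute flags to a caption. Attribute names follow CelebA's
# list_attr_celeba.txt; the sentence template is an assumption.
def attributes_to_caption(attrs):
    """attrs: dict mapping CelebA attribute name -> 1 (present) / -1 (absent)."""
    subject = "The man" if attrs.get("Male", -1) == 1 else "The woman"
    parts = []
    if attrs.get("Black_Hair", -1) == 1:
        parts.append("has black hair")
    if attrs.get("Smiling", -1) == 1:
        parts.append("is smiling")
    if attrs.get("Eyeglasses", -1) == 1:
        parts.append("wears eyeglasses")
    if not parts:
        return subject + " has an ordinary appearance."
    return subject + " " + " and ".join(parts) + "."

# attributes_to_caption({"Male": 1, "Black_Hair": 1, "Smiling": 1})
# -> "The man has black hair and is smiling."
```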
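The training procedure the abstract outlines, with BERT sentence embeddings conditioning a DCGAN, the GAN-CLS matching-aware loss over (real image, matching caption), (real image, mismatched caption) and (generated image, matching caption) pairs, and a label flip every fourth iteration, can be sketched as follows. This is a minimal PyTorch-style illustration, not the authors' released implementation: the network interfaces `G(z, text)` and `D(image, text)`, the frozen BERT encoder, and applying the flip only to the discriminator targets are assumptions; only the 0.5 weighting of the mismatch terms follows the original GAN-CLS formulation.

```python
# Illustrative GAN-CLS training step with a BERT text encoder and label flipping
# every 4th iteration. G and D are assumed DCGAN-style networks:
#   G(z, text_emb) -> image batch
#   D(image, text_emb) -> matching probability in (0, 1), shape (batch,)
import torch
import torch.nn.functional as F
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()  # frozen text encoder

@torch.no_grad()
def encode_captions(captions):
    """One fixed-size sentence embedding per caption (pooled [CLS] output)."""
    tokens = tokenizer(captions, padding=True, truncation=True, return_tensors="pt")
    return bert(**tokens).pooler_output  # (batch, 768)

def train_step(G, D, opt_g, opt_d, images, captions, wrong_captions, step, z_dim=100):
    text = encode_captions(captions)              # captions that match `images`
    wrong_text = encode_captions(wrong_captions)  # captions of other faces
    batch = images.size(0)

    # Flip the real/fake targets on every 4th iteration, as the abstract
    # describes, to keep early discriminator gradients from vanishing.
    flip = (step % 4 == 0)
    real_target = torch.zeros(batch) if flip else torch.ones(batch)
    fake_target = torch.ones(batch) if flip else torch.zeros(batch)

    # Discriminator: GAN-CLS scores (real, right text), (real, wrong text),
    # and (fake, right text); the mismatch terms are weighted by 0.5.
    z = torch.randn(batch, z_dim)
    fake = G(z, text)
    d_loss = (
        F.binary_cross_entropy(D(images, text), real_target)
        + 0.5 * F.binary_cross_entropy(D(images, wrong_text), fake_target)
        + 0.5 * F.binary_cross_entropy(D(fake.detach(), text), fake_target)
    )
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: make D accept (generated image, matching caption) as real.
    g_loss = F.binary_cross_entropy(D(fake, text), torch.ones(batch))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```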

generative adversarial networks; text-to-face; multimodal

余松森、陈新、苏海


School of Software, South China Normal University, Guangzhou 510631


2024

计算机与数字工程 (Computer and Digital Engineering)
No. 709 Research Institute of China Shipbuilding Industry Corporation


CSTPCD
Impact factor: 0.355
ISSN:1672-9722
Year, Volume (Issue): 2024, 52(3)