With the introduction of Generative Adversarial Networks (GANs), techniques for automatically synthesizing realistic images from text were first implemented. However, most existing work is limited to generating simple images, such as paintings and birds, from annotations (captions). As a subfield of text-to-image generation (T2I), text-to-face generation (T2F) has great potential for applications in public safety, such as reconstructing the faces of criminal suspects. However, because the datasets currently available for this task are either very small or lack annotations, relevant datasets are scarce and little research has been done in this area. In this paper, the algorithm proposed in Text2FaceGAN is used to convert the attribute list of the CelebA dataset into a set of annotations, and the face images in CelebA are cropped, yielding ImprovedCelebA, a dataset of paired <annotation, face image> samples that addresses this deficiency. In addition, since the effectiveness of T2F image generation depends on the quality of the text encoding, and traditional T2I methods with coarse-grained text encodings cannot generate realistic face images, this paper proposes a method for generating faces from fine-grained textual face descriptions, using a BERT-DCGAN with the GAN-CLS loss to learn this conditional multimodal problem. To avoid vanishing gradients when pre-training with conventional GANs, the real/fake labels of the images are flipped every four training iterations. Experimental validation shows that, compared with other text-to-face methods, the algorithm not only generates more realistic face images but also greatly reduces training time.
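To make the dataset-construction step concrete, the following is a minimal, hypothetical Python sketch of how one CelebA attribute row might be turned into a caption, in the spirit of Text2FaceGAN's list-to-annotation step; the attribute subset, the ATTR_PHRASES templates, and the attrs_to_caption helper are illustrative assumptions, not the paper's actual conversion rules.

```python
# Hypothetical attribute-to-caption conversion, for illustration only.
# The phrase templates below are assumptions; Text2FaceGAN defines its
# own mapping from CelebA's 40 binary attributes to sentences.
ATTR_PHRASES = {
    "Male": ("a man", "a woman"),        # (attribute = +1, attribute = -1)
    "Smiling": ("smiling", None),
    "Eyeglasses": ("wearing eyeglasses", None),
    "Black_Hair": ("with black hair", None),
}

def attrs_to_caption(attrs: dict) -> str:
    """Compose a short annotation from binary attributes.

    `attrs` maps CelebA attribute names to +1/-1, e.g. one row of
    list_attr_celeba.txt parsed into a dict."""
    subject = ATTR_PHRASES["Male"][0] if attrs.get("Male", -1) == 1 else ATTR_PHRASES["Male"][1]
    details = [phrase for name, (phrase, _) in ATTR_PHRASES.items()
               if name != "Male" and attrs.get(name, -1) == 1]
    return " ".join(["The image shows", subject] + details) + "."

print(attrs_to_caption({"Male": 1, "Smiling": 1, "Black_Hair": 1}))
# -> "The image shows a man smiling with black hair."
```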
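For the training objective, here is a minimal PyTorch sketch of the matching-aware GAN-CLS losses combined with the flip of the real/fake labels every four iterations mentioned above; the discriminator interface D(images, text_embedding) -> logits, the BERT sentence embeddings txt_match / txt_mismatch, and the exact step % flip_every schedule are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def gan_cls_d_loss(D, real_imgs, fake_imgs, txt_match, txt_mismatch,
                   step, flip_every=4):
    """Matching-aware GAN-CLS discriminator loss with a periodic
    real/fake label flip. D(images, text_emb) is assumed to return
    raw logits of shape (batch, 1)."""
    batch = real_imgs.size(0)
    real_lbl = torch.ones(batch, 1, device=real_imgs.device)
    fake_lbl = torch.zeros(batch, 1, device=real_imgs.device)

    # Assumed schedule: swap the real/fake targets every `flip_every`
    # discriminator steps so D does not saturate (the paper's stated
    # mitigation for vanishing gradients during pre-training).
    if step % flip_every == 0:
        real_lbl, fake_lbl = fake_lbl, real_lbl

    bce = F.binary_cross_entropy_with_logits
    loss_real = bce(D(real_imgs, txt_match), real_lbl)           # real image, matching text
    loss_mismatch = bce(D(real_imgs, txt_mismatch), fake_lbl)    # real image, mismatched text
    loss_fake = bce(D(fake_imgs.detach(), txt_match), fake_lbl)  # generated image, matching text
    return loss_real + 0.5 * (loss_mismatch + loss_fake)

def gan_cls_g_loss(D, fake_imgs, txt_match):
    """Generator loss: make D score (generated image, matching text) as real."""
    real_lbl = torch.ones(fake_imgs.size(0), 1, device=fake_imgs.device)
    return F.binary_cross_entropy_with_logits(D(fake_imgs, txt_match), real_lbl)
```

The mismatched-text term is what makes the discriminator matching-aware: it must reject not only poorly generated images but also real images paired with the wrong description, which forces the generator to respect the fine-grained text conditioning.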