Text-to-image synthesis method based on spatial attention and conditional augmentation
To address problems such as semantic inconsistency between text and generated images, unstable training, and lack of diversity in generated images, a text-to-image model based on spatial attention and conditional augmentation was proposed on top of a simple and effective text-to-image baseline model. To improve the stability of the training process and increase the diversity of generated images, a conditional augmentation module was added to the original model: starting from the text distribution to fit the image distribution, it increases the diversity of visual features and expands the expression space, and an Affine block was added to the original DF-Block module. A spatial attention module was added to the discriminator to improve the semantic consistency between the text and the synthesized image. Experimental results show that on the CUB and Oxford-102 datasets, the Inception Score increased by 2.05% and 2.63% respectively, and on the CUB and COCO datasets, the Fréchet Inception Distance decreased by 20.73% and 9.25% respectively. These results demonstrate that the images generated by the proposed model are more diverse and closer to real images.
Keywords: text-to-image; DF-GAN; conditional augmentation model; Affine block; spatial attention model
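The conditional augmentation module mentioned in the abstract maps a text embedding to a Gaussian distribution and samples the conditioning vector from it, so the same sentence yields different latents across samples. A minimal sketch of this idea is shown below, assuming a linear projection to the mean and log-variance and the standard reparameterization trick; the dimensions and weight shapes are hypothetical, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditioning_augmentation(text_emb, w_mu, w_logvar):
    """Sample a conditioning vector from a Gaussian predicted
    from the text embedding (reparameterization trick).
    Weights and sizes here are illustrative only."""
    mu = text_emb @ w_mu                 # mean of the latent Gaussian
    logvar = text_emb @ w_logvar         # log-variance of the Gaussian
    eps = rng.standard_normal(mu.shape)  # fresh noise for each sample
    return mu + np.exp(0.5 * logvar) * eps

# toy example: 256-d sentence embedding -> 128-d conditioning vector
emb = rng.standard_normal(256)
w_mu = rng.standard_normal((256, 128)) * 0.01
w_logvar = rng.standard_normal((256, 128)) * 0.01

c1 = conditioning_augmentation(emb, w_mu, w_logvar)
c2 = conditioning_augmentation(emb, w_mu, w_logvar)
# c1 and c2 differ even though the text embedding is identical,
# which is the source of the extra diversity in generated images.
```

Because the sampled vector varies while staying close to the text's semantic content, this smooths the conditioning manifold and stabilizes GAN training, which is the effect the abstract attributes to the module.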