Research on scene text detection and recognition for the Uyghur language shows that manually collecting labeled natural scene text images is time-consuming and labor-intensive, so synthetic data serves as the main source of training data. To obtain more realistic data, a scene text modification network for Uyghur based on a generative adversarial network is proposed. The network is built from efficient Transformer modules, which extract both global and local image features to carry out the Uyghur text modification, and an added fine-tuning module refines the final result. The model is trained with a WGAN strategy, which effectively mitigates mode collapse and exploding gradients. The generalization ability and robustness of the model are verified by English-to-English and English-to-Uyghur text modification experiments. The model achieves good results on the objective metrics (SSIM and PSNR) and in visual quality, and is further validated on the real scene datasets SVT and ICDAR 2013.
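For readers unfamiliar with the two objective metrics named in the abstract, the following is a minimal sketch of how PSNR and a simplified SSIM can be computed between a reference image and an edited image. It is not the authors' evaluation code. The function names `psnr` and `global_ssim` are hypothetical, images are assumed to be flat sequences of grayscale intensities in the range 0 to 255, and the SSIM variant here uses a single window over the whole image, whereas standard SSIM averages over local sliding windows.

```python
import math


def psnr(reference, edited, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(edited) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, edited)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(reference, edited, max_value=255.0):
    """Simplified SSIM using one window over the whole image (an assumption)."""
    if len(reference) != len(edited) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    # Stabilizing constants with the commonly used K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * max_value) ** 2
    c2 = (0.03 * max_value) ** 2
    mu_x = sum(reference) / n
    mu_y = sum(edited) / n
    var_x = sum((r - mu_x) ** 2 for r in reference) / n
    var_y = sum((e - mu_y) ** 2 for e in edited) / n
    cov_xy = sum((r - mu_x) * (e - mu_y) for r, e in zip(reference, edited)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Higher values of both metrics indicate that the edited image is closer to the reference. In practice an established implementation with local windows would normally be used for reporting SSIM.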
Key words
generative adversarial networks/scene text editing/Uyghur scene text image/efficient Transformer/WGAN