Pre-trained Visual Translation Technology Based on Cross-modal Translation Rendering Model
Replacing foreign-language text in an image with Chinese while preserving its original style is an interesting and challenging problem. To this end, a pre-trained visual translation technique is proposed that converts the text in an image across languages while maintaining the original text style and layout. A cross-modal adaptive translation rendering model is built by combining text detection, font recognition, OCR, image restoration, machine translation, and image rendering. First, the EAST algorithm locates and extracts the text regions. Then, ResNet recognizes the font style, while CTC-OCR extracts the text content, which is translated with GPT. Finally, after the text regions are repaired with the LaMa algorithm, a region-coordinate rendering algorithm integrates the translated text into the repaired image, achieving high-quality visual translation. In a quantitative evaluation by human evaluators, the method received a subjective score of 7.90, indicating high accuracy.
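The order of the stages described above can be sketched as a small orchestration function. This is a minimal illustration and not the authors' implementation: every name in it (`TextRegion`, `fit_font_size`, `translate_image`, and the six stage callables) is hypothetical, the stages are passed in as plain functions so that EAST, ResNet, CTC-OCR, GPT, and LaMa wrappers could be plugged in, and the font-size rule is an assumed stand-in for the region-coordinate rendering step, whose details the abstract does not give.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class TextRegion:
    """One detected text region and what each stage learned about it."""
    box: Box
    font: str = ""
    source_text: str = ""
    translated_text: str = ""


def fit_font_size(text: str, box: Box, char_aspect: float = 1.0,
                  max_size: int = 200) -> int:
    """Largest font size at which `text` fits on one line inside `box`.

    Assumption: each glyph is about `char_aspect * size` pixels wide,
    which is a rough approximation for square CJK characters.
    """
    _, _, width, height = box
    if not text:
        return max(1, min(height, max_size))
    by_width = int(width / (len(text) * char_aspect))
    return max(1, min(height, by_width, max_size))


def translate_image(image, detect, classify_font, recognize,
                    translate, inpaint, render):
    """Run the stages in the order given in the abstract.

    detect(image) -> list of boxes                  (e.g. an EAST wrapper)
    classify_font(image, box) -> font name          (e.g. a ResNet classifier)
    recognize(image, box) -> source text            (e.g. a CTC-based OCR model)
    translate(text) -> translated text              (e.g. a GPT call)
    inpaint(image, boxes) -> image without text     (e.g. a LaMa wrapper)
    render(image, region, font_size) -> image with the translated text drawn
    """
    regions = [TextRegion(box=b) for b in detect(image)]
    for region in regions:
        region.font = classify_font(image, region.box)
        region.source_text = recognize(image, region.box)
        region.translated_text = translate(region.source_text)

    # Erase the original text once, then draw each translation at the
    # coordinates of the region it replaces.
    result = inpaint(image, [r.box for r in regions])
    for region in regions:
        size = fit_font_size(region.translated_text, region.box)
        result = render(result, region, size)
    return result, regions
```

Keeping each stage behind a plain callable means any one model can be swapped or stubbed out without touching the others, which also makes the ordering easy to test with dummy functions.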