Pre-trained Visual Translation Technology Based on a Cross-modal Translation Rendering Model

Qu Mengnan (屈梦楠)¹, Jin Yuhao (靳宇浩)¹, Hu Boning (胡勃宁)¹

Author information

1. School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
Abstract

Replacing foreign-language text in an image with Chinese while preserving its visual style is an interesting and challenging problem. To address it, this paper proposes a pre-trained visual translation technique for cross-language conversion of text in images. The technique combines text detection, font recognition, OCR, image restoration, machine translation, and image rendering into a cross-modal adaptive translation rendering model that preserves the style and layout of the original text. First, the EAST algorithm locates and extracts the text regions. Next, ResNet recognizes the font style, CTC-OCR extracts the text content, and a GPT model translates it. Finally, the LaMa algorithm repairs the text regions, and a region coordinate rendering algorithm blends the translated text into the repaired image, yielding high-quality visual translation. Human evaluators quantitatively assessed the translation results; the method achieved a subjective evaluation score of 7.90, indicating high accuracy.
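The stage order described in the abstract could be wired together roughly as follows. This is only a minimal sketch, not the authors' implementation: the names `translate_image`, `TextRegion`, and `Box`, the `(x, y, width, height)` box format, and the callable signatures are all hypothetical. The actual models (EAST, ResNet, CTC-OCR, GPT, LaMa, and the region coordinate renderer) would be supplied as the callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed box format: (x, y, width, height) of one detected text region.
Box = Tuple[int, int, int, int]


@dataclass
class TextRegion:
    """Everything the renderer needs for one text region (hypothetical type)."""
    box: Box
    font: str          # label produced by the font recognizer
    source_text: str   # text read by OCR
    translated: str    # output of the translation model


def translate_image(
    image,
    detect: Callable[[object], List[Box]],            # e.g. an EAST wrapper
    recognize_font: Callable[[object, Box], str],     # e.g. a ResNet classifier
    read_text: Callable[[object, Box], str],          # e.g. a CTC-based OCR model
    translate: Callable[[str], str],                  # e.g. a GPT call
    inpaint: Callable[[object, List[Box]], object],   # e.g. a LaMa wrapper
    render: Callable[[object, TextRegion], object],   # region coordinate rendering
):
    """Run the stages in the order given in the abstract."""
    # 1. Locate the text regions.
    boxes = detect(image)

    # 2. For each region: recognize the font, read the text, translate it.
    regions = []
    for box in boxes:
        text = read_text(image, box)
        regions.append(
            TextRegion(box, recognize_font(image, box), text, translate(text))
        )

    # 3. Erase the original text, then draw the translations at the same coordinates.
    result = inpaint(image, boxes)
    for region in regions:
        result = render(result, region)
    return result, regions
```

Passing the stages in as callables keeps the sketch independent of any particular model library, so each stage can be replaced or stubbed out separately.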

Key words

visual translation / multi-modal / GPT / Chinese translation / neural network

Publication year

2024

Software Guide (软件导刊)

Hubei Information Society (湖北省信息学会)

Impact factor: 0.524

ISSN: 1672-7800
