Scene Text Image Super-Resolution Network Based on Multi-head Attention
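Zhu Jianan (朱佳楠)1, Xing Shuli (邢树礼)1
Author information
- 1. Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian 350118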
Abstract
Scene text image super-resolution (STISR) aims to improve the resolution and readability of low-resolution text images and serves as a foundational step for downstream text recognition. Existing STISR methods built on deep convolutional neural networks give little consideration to the global information in text images, which leads to unstable restoration results, especially for visually similar low-resolution text images. To address this problem, a novel STISR network, MASRN, is proposed. It comprises a text prior (TP) module and a hybrid backbone network. The TP module first generates text prior information by extracting semantic features from the text image; the hybrid backbone network, which consists of convolutional modules and multi-head attention fusion modules, then fuses this text prior information with multi-scale image features. Experimental results on the TextZoom dataset show that MASRN restores higher-quality text images and effectively improves the accuracy of downstream text recognition.
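The abstract does not spell out the internals of the multi-head attention fusion module, so the following is only a minimal, dependency-free sketch of one plausible reading: image features act as queries and text prior features act as keys and values in a standard multi-head cross-attention, followed by a residual connection. The function names (`multi_head_fusion`, `random_matrix`), the tensor shapes, the random stand-in weights, and the residual connection are all assumptions made for illustration and are not taken from MASRN.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random values standing in for features or learned weights (hypothetical)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_fusion(image_feats, text_prior, num_heads, rng):
    """Fuse text prior into image features with multi-head cross-attention.

    image_feats: N x d list (one row per spatial position), used as queries.
    text_prior:  T x d list (one row per character slot), used as keys/values.
    Returns the N x d fused features and the per-head attention weights.
    """
    d = len(image_feats[0])
    assert d % num_heads == 0, "feature size must divide evenly across heads"
    d_head = d // num_heads

    # Assumption: untrained random projections replace learned W_q, W_k, W_v, W_o.
    w_q, w_k, w_v, w_o = (random_matrix(d, d, rng) for _ in range(4))
    q = matmul(image_feats, w_q)
    k = matmul(text_prior, w_k)
    v = matmul(text_prior, w_v)

    head_outputs, head_weights = [], []
    for h in range(num_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        q_h = [row[lo:hi] for row in q]
        k_h = [row[lo:hi] for row in k]
        v_h = [row[lo:hi] for row in v]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V
        k_h_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_h_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v_h))
        head_weights.append(weights)

    # Concatenate heads along the feature axis, then apply the output projection.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(q))]
    attended = matmul(concat, w_o)
    # Assumed residual connection: keep the original image features in the result.
    fused = [[x + a for x, a in zip(xr, ar)] for xr, ar in zip(image_feats, attended)]
    return fused, head_weights


rng = random.Random(0)
image_feats = random_matrix(6, 8, rng)  # 6 spatial positions, 8 channels
text_prior = random_matrix(4, 8, rng)   # 4 character slots, 8 channels
fused, weights = multi_head_fusion(image_feats, text_prior, num_heads=2, rng=rng)
print(len(fused), len(fused[0]), len(weights))
```

Each image position attends over all text prior slots, so the fused output keeps the spatial layout of the image features while drawing on sequence-level text information. This is one way to supply the global context that the abstract says purely convolutional methods lack.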
Keywords
scene text/image super-resolution/text recognition/text prior/convolutional neural networks/multi-head attention
Publication year
2025