Scene Text Image Super-Resolution Network Based on Multi-head Attention

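Abstract (translated from the Chinese): Scene text image super-resolution (STISR) aims to improve the resolution and readability of low-resolution text images and is a foundational step for downstream text recognition. Existing STISR methods built on deep convolutional neural networks give little consideration to the global information in text images, which makes their restoration results unstable, particularly on low-resolution text images that look visually similar. To address this problem, a new scene text image super-resolution network (MASRN) is proposed, consisting of a text prior (TP) module and a hybrid backbone network. The TP module first generates text prior information by extracting semantic features from the text image; the hybrid backbone, composed of convolutional modules and multi-head attention fusion modules, then fuses this text prior information with multi-scale image features. Experiments on the TextZoom dataset show that the proposed MASRN restores higher-quality text images and effectively improves the accuracy of downstream text recognition.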

English title: Scene Text Image Super-Resolution Network Based on Multi-head Attention

English abstract: Scene text image super-resolution (STISR) aims to enhance the resolution and readability of low-resolution text images, and serves as a foundational step for downstream text recognition tasks. Existing STISR methods based on deep convolutional neural networks often lack consideration of the global information of text images, leading to unstable restoration results, especially for visually similar low-resolution text images. To address this problem, a novel STISR network (MASRN) is proposed, which includes a text prior (TP) module and a hybrid backbone network. The TP module generates text prior information by extracting semantic features from text images, while the hybrid backbone network, consisting of convolutional modules and multi-head attention fusion modules, fuses the text prior information with multi-scale image features. Experimental results on the TextZoom dataset show that the proposed MASRN can restore higher-quality text images and effectively improve recognition accuracy in downstream text recognition tasks.
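The abstract does not specify how the multi-head attention fusion module is built, so the following is only a minimal NumPy sketch of one plausible reading: flattened image feature tokens act as queries and attend to text prior tokens (keys and values), and the attended result is added back to the image features. The function name `multi_head_fusion`, the parameter names, the tensor sizes, the random weights, and the residual connection are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_fusion(image_feat, text_prior, params, num_heads):
    """Fuse text prior tokens into image tokens with multi-head cross-attention.

    image_feat: (N, d) flattened image feature tokens (queries).
    text_prior: (T, d) text prior tokens (keys and values).
    params:     dict of (d, d) projection matrices "wq", "wk", "wv", "wo".
    Returns the fused features (N, d) and the attention weights (heads, N, T).
    """
    n, d = image_feat.shape
    head_dim = d // num_heads

    q = image_feat @ params["wq"]
    k = text_prior @ params["wk"]
    v = text_prior @ params["wv"]

    def split_heads(x):
        # (tokens, d) -> (heads, tokens, head_dim)
        return x.reshape(x.shape[0], num_heads, head_dim).transpose(1, 0, 2)

    qh, kh, vh = split_heads(q), split_heads(k), split_heads(v)

    # Scaled dot-product attention, computed independently per head.
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    context = weights @ vh  # (heads, N, head_dim)

    # Merge heads back to (N, d), project, and add a residual connection
    # (the residual is an assumption of this sketch).
    merged = context.transpose(1, 0, 2).reshape(n, d)
    return image_feat + merged @ params["wo"], weights


# Illustrative sizes only: a 4 x 16 feature map, 26 text prior positions.
rng = np.random.default_rng(0)
d, num_heads = 32, 4
image_feat = rng.standard_normal((4 * 16, d))
text_prior = rng.standard_normal((26, d))
params = {name: rng.standard_normal((d, d)) / np.sqrt(d)
          for name in ("wq", "wk", "wv", "wo")}

fused, weights = multi_head_fusion(image_feat, text_prior, params, num_heads)
print(fused.shape, weights.shape)
```

Because every image token can attend to every text prior token, this kind of fusion gives each spatial position access to sequence-level information, which is the global context the abstract says purely convolutional STISR methods lack.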

English keywords:

scene text; image super-resolution; text recognition; text prior; convolutional neural networks; multi-head attention

Authors:

朱佳楠 (Zhu Jianan), 邢树礼 (Xing Shuli)

Affiliation:

Fujian Provincial Key Laboratory of Big Data Mining and Applications, Fujian University of Technology, Fuzhou, Fujian 350118

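Keywords (translated from the Chinese):

scene text; image super-resolution; text recognition; text prior; convolutional networks; multi-head attention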

Publication year:

2025

DOI:

10.12046/j.issn.1000-5277.2023110029

Journal of Fujian Normal University (Natural Science Edition)

Fujian Normal University

Indexed in: Peking University Core Journals (北大核心)

Impact factor: 0.353

ISSN: 1000-5277

Year, volume (issue): 2025, 41(1)