Scene Text Image Super-Resolution Network Based on Multi-head Attention
Scene text image super-resolution (STISR) aims to enhance the resolution and readability of low-resolution text images, and serves as a foundational step for downstream text recognition tasks. Existing STISR methods based on deep convolutional neural networks often fail to consider the global information of text images, which leads to unstable restoration results, especially for visually similar low-resolution text images. To address this problem, a novel STISR network, MASRN, is proposed, which includes a text prior (TP) module and a hybrid backbone network. The TP module generates text prior information by extracting semantic features from text images, while the hybrid backbone network, which consists of convolutional modules and multi-head attention fusion modules, fuses the text prior information with multi-scale image features. Experimental results on the TextZoom dataset show that the proposed MASRN restores higher-quality text images and effectively improves recognition accuracy in downstream text recognition tasks.
scene text; image super-resolution; text recognition; text prior; convolutional neural networks; multi-head attention
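To make the fusion idea in the abstract concrete, the following is a minimal NumPy sketch of one possible multi-head attention fusion step, in which flattened image features act as queries and text prior features act as keys and values. It is an illustration under stated assumptions, not the MASRN implementation: the abstract does not specify the attention direction, the residual connection, the feature dimensions, or the weight initialization, and the names `multi_head_fusion`, `init_params`, `image_feats`, and `text_prior` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d, seed=0):
    """Random projection matrices (hypothetical; a trained model would learn these)."""
    rng = np.random.default_rng(seed)
    return {name: rng.normal(scale=d ** -0.5, size=(d, d))
            for name in ("wq", "wk", "wv", "wo")}


def multi_head_fusion(image_feats, text_prior, params, num_heads):
    """Fuse text prior features into image features with multi-head cross-attention.

    image_feats: (N, d) array, a flattened H*W feature map (assumed layout).
    text_prior:  (T, d) array, one prior vector per text position (assumed layout).
    Returns the fused (N, d) features and the (num_heads, N, T) attention weights.
    """
    n, d = image_feats.shape
    t = text_prior.shape[0]
    if d % num_heads != 0:
        raise ValueError("feature dimension must be divisible by num_heads")
    d_head = d // num_heads

    # Linear projections: queries from the image, keys and values from the text prior.
    q = image_feats @ params["wq"]
    k = text_prior @ params["wk"]
    v = text_prior @ params["wv"]

    # Split the channel dimension into heads: (num_heads, length, d_head).
    q = q.reshape(n, num_heads, d_head).transpose(1, 0, 2)
    k = k.reshape(t, num_heads, d_head).transpose(1, 0, 2)
    v = v.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    # Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)
    heads = weights @ v

    # Concatenate the heads, project, and add a residual connection (an assumption).
    merged = heads.transpose(1, 0, 2).reshape(n, d)
    fused = image_feats + merged @ params["wo"]
    return fused, weights


# Toy usage with made-up sizes: an 8x32 feature map with 32 channels
# and a text prior of 26 positions.
rng = np.random.default_rng(1)
image_feats = rng.normal(size=(8 * 32, 32))
text_prior = rng.normal(size=(26, 32))
fused, weights = multi_head_fusion(image_feats, text_prior, init_params(32), num_heads=4)
print(fused.shape, weights.shape)
```

Because every image position attends over all text prior positions, each output feature can draw on sequence-level semantic context rather than only a local receptive field, which is the kind of global information the abstract says purely convolutional STISR methods lack.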