Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition

Despite recent advances, scene text recognition remains a challenging problem due to the significant variability, irregularity and distortion in text appearance and localization. Attention-based methods have become mainstream due to their superior vocabulary learning and observation ability. Nonetheless, they are susceptible to attention drift, which can lead to word recognition errors. Most works focus on correcting attention drift during decoding but completely ignore the error accumulated during the encoding process. In this paper, we propose a novel scheme, called Attention Guidance by Cross-Domain Supervision Signals for Scene Text Recognition (ACDS-STR), which mitigates attention drift at the feature encoding stage. At the heart of the proposed scheme is the cross-domain attention guidance and feature encoding fusion module (CAFM), which uses the core areas of characters to recursively guide attention learning in the encoding process. With precise attention information sourced from CAFM, we propose a non-attention-based adaptive transformation decoder (ATD) to guarantee decoding performance and improve decoding speed. In the training stage, we fuse manual guidance and subjective learning to learn the core areas of characters, which notably improves the recognition performance of the model. Experiments on public benchmarks show state-of-the-art performance. The source code will be available at https://github.com/xuefanfu/ACDS-STR.

Feature extraction; Text recognition; Decoding; Encoding; Image segmentation; Image coding; Transformers; Iron; Electronic mail; Computer vision

Fanfu Xue, Jiande Sun, Yaqi Xue, Qiang Wu, Lei Zhu, Xiaojun Chang, Sen-Ching Cheung

School of Information Science and Engineering, Shandong Normal University, Jinan, China

College of Intelligence and Information Engineering, Shandong University of Traditional Chinese Medicine, Jinan, China

School of Information Science and Engineering, Shandong University, Qingdao, China

School of Information Science and Engineering, Shandong Normal University, Jinan, China; School of Electronic and Information Engineering, Tongji University, Shanghai, China

Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, NSW, Australia

Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY, USA

2025

IEEE Transactions on Image Processing

Year, Volume (Issue): 2025, 34(1)
  • 56