Unpaired document image denoising for OCR using BiLSTM enhanced CycleGAN

The recognition performance of optical character recognition (OCR) models can be sub-optimal when document images suffer from various degradations. Supervised deep learning methods for image enhancement can generate high-quality enhanced images, but they require corresponding clean images or ground truth text, a requirement that is often difficult to fulfill for real-world noisy documents. For instance, it can be challenging to create paired noisy/clean training datasets or to obtain ground truth text for noisy point-of-sale receipts and invoices. Unsupervised methods have been explored in recent years to enhance images in the absence of ground truth images or text, but these methods focus on enhancing natural scene images. For document images, preserving the readability of text in the enhanced images is of utmost importance for improved OCR performance. In this work, we propose a modified CycleGAN architecture that enhances document images while better preserving their text. Inspired by the success of CNN-BiLSTM combination networks in text recognition models, we replace the discriminator network in the CycleGAN model with a combined CNN-BiLSTM network for better feature extraction from document images during discrimination. The results demonstrate that the proposed model significantly improves text preservation and OCR performance compared to CycleGAN with its standard discriminator network. Specifically, when assessing the Tesseract engine's word accuracy on real-world noisy receipt images from the POS dataset, the proposed model achieved an improvement of up to 61.66% over the original CycleGAN model and 23.32% over the original noisy receipt images. Additionally, the proposed model consistently outperformed other unsupervised classical techniques across all OCR engines evaluated.
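To make the described architectural change concrete, below is a minimal sketch (assuming PyTorch and a PatchGAN-style convolutional front end) of a discriminator whose CNN feature map is read column by column by a BiLSTM, as in CRNN-style text recognizers. The layer counts, channel widths, height pooling, and per-column scoring head are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CNNBiLSTMDiscriminator(nn.Module):
    """CycleGAN-style discriminator with a BiLSTM over the horizontal
    axis of the CNN feature map. Sizes are illustrative only."""

    def __init__(self, in_channels=1, base_channels=64, lstm_hidden=128):
        super().__init__()
        # Convolutional feature extractor (typical PatchGAN front end)
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, base_channels, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base_channels, base_channels * 2, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base_channels * 2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base_channels * 2, base_channels * 4, 4, stride=2, padding=1),
            nn.InstanceNorm2d(base_channels * 4),
            nn.LeakyReLU(0.2, inplace=True),
        )
        # BiLSTM reads the feature map column by column to capture
        # sequential structure along text lines.
        self.bilstm = nn.LSTM(
            input_size=base_channels * 4,
            hidden_size=lstm_hidden,
            bidirectional=True,
            batch_first=True,
        )
        # Per-column real/fake score (patch-level output)
        self.head = nn.Linear(2 * lstm_hidden, 1)

    def forward(self, x):
        feat = self.cnn(x)            # (B, C, H, W)
        feat = feat.mean(dim=2)       # pool height -> (B, C, W)
        feat = feat.permute(0, 2, 1)  # (B, W, C): one time step per column
        seq, _ = self.bilstm(feat)    # (B, W, 2 * lstm_hidden)
        return self.head(seq)         # (B, W, 1) patch scores


if __name__ == "__main__":
    d = CNNBiLSTMDiscriminator()
    scores = d(torch.randn(2, 1, 256, 256))  # grayscale document crops
    print(scores.shape)                      # torch.Size([2, 32, 1])
```

In a CycleGAN training loop this module would stand in for the standard discriminator; the per-column scores can be averaged (or used directly) in the usual adversarial loss.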

Document enhancement · Unsupervised learning · Image processing · Generative adversarial networks

Katyani Singh, Ganesh Tata, Eric Van Oeveren, Nilanjan Ray

Department of Computing Science, University of Alberta, 116 St & 85 Ave, Edmonton T6G 2R3, Alberta, Canada

Intuit Inc., 2700 Coast Ave, Mountain View 94043, California, USA

2025

International Journal on Document Analysis and Recognition (IJDAR)