
Robust page object detection network for heterogeneous document images

Document Layout Analysis (DLA) has emerged as a challenging problem in computer vision. Its primary goal is to identify page objects, including tables, figures, images, and equations, in document images. In this paper, we propose a Lightweight and Robust Page Object Detection Network (LR-PODNet) for page object detection (POD) in heterogeneous document images. The proposed network improves the object detection capabilities of the YOLOv5 model by integrating two components: the Convolutional Global Attention Block (C3-AB) and the Hybrid Dilated Atrous spatial pyramid pooling Block (HDAB). The C3-AB is an enhanced version of the C3 module of YOLOv5 that uses a global attention block in place of the bottleneck-CSP block. It enhances the model's ability to capture global dimensional features and suppresses redundant content. The output of the C3-AB is passed to the HDAB, which extracts both local and contextual features. The HDAB replaces SPPF within the YOLOv5 architecture to enhance multiple feature extraction capabilities. Experimental results show that the proposed LR-PODNet outperforms existing methods, achieving mAP@0.5:0.95 of 77.5% and 76.2% on the IIIT-AR-13K and NCERT5K-IITRPR datasets, respectively. We also evaluate the robustness of the proposed model on these two datasets by varying the IoU threshold.
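The mAP@0.5:0.95 figure and the IoU-threshold robustness test both rest on the same primitive: the Intersection over Union (IoU) between a predicted box and a ground-truth box, checked against a range of thresholds. The sketch below illustrates that primitive only. It is not the authors' evaluation code; the function names, the `(x1, y1, x2, y2)` box format, and the example boxes are assumptions made for illustration. The threshold grid assumes the usual COCO-style convention of 0.50 to 0.95 in steps of 0.05. A full mAP computation would also need confidence-ranked predictions and precision-recall curves per class, which are omitted here.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Overlap area is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style threshold grid: 0.50, 0.55, ..., 0.95.
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matches_per_threshold(pred_box, truth_box):
    """Report, for each IoU threshold, whether the prediction counts as a hit."""
    score = iou(pred_box, truth_box)
    return {t: score >= t for t in THRESHOLDS}


# Hypothetical example: a detected table region slightly shorter than the
# annotated one.
truth = (0, 0, 100, 100)
pred = (0, 0, 100, 82)
hits = matches_per_threshold(pred, truth)
print(f"IoU = {iou(pred, truth):.2f}")
print("thresholds passed:", [t for t, ok in hits.items() if ok])
```

A detection that counts as correct at a loose threshold can fail at a strict one, which is why averaging over the whole grid rewards tight localization and why sweeping the threshold is a reasonable robustness check.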

Hybrid dilations; Attention mechanism; Page object detection; Intra-class variation; Inter-class similarity; Document layout analysis

Hadia Showkat Kawoosa, Muhammad Suhaib Kanroo, Kapil Rana, Puneet Goyal


CSE, Indian Institute of Technology Ropar, Rupnagar 140001, Punjab, India

CSE, Indian Institute of Technology Ropar, Rupnagar 140001, Punjab, India; CSE, Thapar Institute of Engineering and Technology, Patiala 147004, Punjab, India

CSE, Indian Institute of Technology Ropar, Rupnagar 140001, Punjab, India; NIET, NIMS University, Jaipur 303121, Rajasthan, India

2025

International journal on document analysis and recognition: IJDAR
  • 64