Scene Text Detection Model Based on Pixel Aggregation
To address the challenges of natural scene text detection, such as large variations in text shape and multiple sources of interference in complex scenes, a natural scene text detection model based on pixel aggregation is proposed. First, a feature fusion module with nested concatenation of upsampling and long-short skip connections is designed; it strengthens the network's feature extraction capability by fusing the multi-scale, multi-stage features extracted by a ResNet18 residual network. Second, inspired by clustering, a pixel aggregation constraint is introduced to minimize the distance between peripheral pixels and the central area of the text they belong to, which allows text of arbitrary shape to be described in complex natural scenes. Finally, a lightweight text detection head performs pixel-level character segmentation, improving the model's efficiency. The proposed model is validated on ICDAR2015, CTW1500, and a self-constructed industrial character dataset. The results show that the model can handle text detection in complex natural environments and outperforms existing state-of-the-art text detectors in both detection accuracy and efficiency.
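The pixel aggregation constraint can be illustrated with a minimal sketch. The abstract does not give the exact loss, so the code below assumes one common clustering-style formulation: each text pixel is pulled toward the mean embedding of its instance's central area, with no penalty inside a margin. The function name `aggregation_loss`, the margin `delta_agg`, the log-squared-hinge form, and the flat per-pixel input layout are all assumptions made for illustration, not the paper's definition.

```python
import math


def aggregation_loss(embeddings, text_labels, kernel_labels, delta_agg=0.5):
    """Hypothetical pixel-aggregation penalty (illustrative, not the paper's loss).

    embeddings:    per-pixel feature vectors (list of lists of floats)
    text_labels:   per-pixel text instance id (0 = background)
    kernel_labels: per-pixel central-area id (0 = outside every central area)
    delta_agg:     assumed margin; pixels closer than this are not penalized
    """
    instance_ids = sorted({t for t in text_labels if t != 0})
    per_instance = []
    for inst in instance_ids:
        kernel_pixels = [e for e, k in zip(embeddings, kernel_labels) if k == inst]
        text_pixels = [e for e, t in zip(embeddings, text_labels) if t == inst]
        if not kernel_pixels:
            continue  # no central area for this instance, nothing to pull toward

        # Mean embedding of the central area acts as the cluster center.
        dim = len(kernel_pixels[0])
        center = [sum(p[d] for p in kernel_pixels) / len(kernel_pixels)
                  for d in range(dim)]

        total = 0.0
        for p in text_pixels:
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, center)))
            hinge = max(dist - delta_agg, 0.0)
            total += math.log(hinge ** 2 + 1.0)
        per_instance.append(total / len(text_pixels))

    # Average over instances; zero when the image contains no usable instance.
    return sum(per_instance) / len(per_instance) if per_instance else 0.0
```

In this sketch, a pixel whose embedding already lies within `delta_agg` of its central-area mean contributes nothing, so only stray peripheral pixels drive the penalty. At inference time, such a constraint would let pixels be grouped by assigning each to the nearest central area in embedding space.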