Scene Text Detection Model Based on Pixel Aggregation

Abstract: Natural scene text detection is challenged by large variations in text shape and by heavy interference from complex scenes. To address these challenges, a natural scene text detection model based on pixel aggregation is proposed. First, a feature fusion module is designed with nested connections that combine upsampling with long and short skip connections; it fuses the multi-scale, multi-stage features extracted by the residual network ResNet18 to strengthen the network's feature extraction capability. Second, inspired by clustering, pixel aggregation is introduced to constrain the distance between peripheral pixels and the central region of the text, enabling text of arbitrary shape to be described in complex natural scenes. Finally, a lightweight text detection head performs pixel-level character segmentation, improving the model's efficiency. The proposed model is validated on ICDAR2015, CTW1500, and a purpose-built industrial character dataset. The results show that the model can handle text detection in complex natural environments and outperforms existing state-of-the-art text detectors in both detection accuracy and detection efficiency.
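
The abstract does not give the formula behind the pixel aggregation constraint. As a rough illustration only, the sketch below assumes a penalty of the kind used in the Pixel Aggregation Network (PAN) family of detectors: each peripheral pixel of a text instance is pulled toward the mean embedding of that instance's central region, and distances already inside a margin are not penalized. The function name `aggregation_loss`, its parameters, and the margin value `delta_agg=0.5` are hypothetical and are not taken from the paper.

```python
import math


def aggregation_loss(embeddings, text_ids, in_center, delta_agg=0.5):
    """Hypothetical pixel aggregation penalty (assumed form, not from the paper).

    embeddings: one feature vector (list of floats) per pixel
    text_ids:   text instance id per pixel, 0 meaning background
    in_center:  True if the pixel lies in the central region of its instance
    delta_agg:  margin below which a peripheral pixel is not penalized
    """
    instance_ids = sorted({t for t in text_ids if t != 0})
    losses = []
    for inst in instance_ids:
        rows = list(zip(embeddings, text_ids, in_center))
        center_px = [e for e, t, c in rows if t == inst and c]
        outer_px = [e for e, t, c in rows if t == inst and not c]
        if not center_px or not outer_px:
            continue  # nothing to aggregate for this instance
        dim = len(center_px[0])
        # Mean embedding of the central region acts as the cluster center.
        center = [sum(e[d] for e in center_px) / len(center_px) for d in range(dim)]
        penalty = 0.0
        for e in outer_px:
            # Only the part of the distance beyond the margin is penalized.
            excess = max(math.dist(e, center) - delta_agg, 0.0)
            penalty += math.log(excess ** 2 + 1.0)
        losses.append(penalty / len(outer_px))
    # Average over the instances that contributed a term.
    return sum(losses) / len(losses) if losses else 0.0


# One text instance: two central pixels at the origin, two peripheral pixels.
embeddings = [[0.0, 0.0], [0.0, 0.0], [3.0, 4.0], [0.1, 0.0]]
loss = aggregation_loss(embeddings, [1, 1, 1, 1], [True, True, False, False])
```

In this toy input the peripheral pixel at distance 0.1 falls inside the margin and contributes nothing, while the pixel at distance 5 is penalized, which is the clustering behavior the abstract describes in words.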

Keywords:

feature fusion; pixel aggregation; text detection; character segmentation

Authors:

Zhang Huadong (张华东), Zhong Yuzhong (钟羽中), Tu Haiyan (涂海燕), Dian Songyi (佃松宜)

Affiliation:

College of Electrical Engineering, Sichuan University, Chengdu 610065, China

Year of publication:

2024

DOI:

10.13462/j.cnki.mmtamt.2024.11.003

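Modular Machine Tool & Automatic Manufacturing Technique (组合机床与自动化加工技术)

Dalian Modular Machine Tool Research Institute; Production Engineering Branch of the Chinese Mechanical Engineering Society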

Indexed in: CSTPCD; Peking University Core Journals

Impact factor: 0.671

ISSN: 1001-2265

Year, volume (issue): 2024, (11)