Scene Text Detection Model Based on Pixel Aggregation
To address the challenges of natural scene text detection, such as large variations in text shape and heavy interference from complex backgrounds, a scene text detection model based on pixel aggregation is proposed. First, a feature fusion module is designed that nests upsampling with long and short skip connections; by fusing the multi-scale, multi-stage features extracted by a ResNet18 backbone, it strengthens the network's feature extraction capability. Second, drawing on the idea of clustering, a pixel aggregation constraint is introduced on the distance between peripheral pixels and the central text region, which allows text of arbitrary shape to be described in complex natural scenes. Finally, a lightweight text detection head performs pixel-level character segmentation, improving the model's efficiency. The proposed model is validated on ICDAR2015, CTW1500, and a self-constructed industrial character dataset. The results show that the model handles text detection in complex natural environments and outperforms existing state-of-the-art text detectors in both detection accuracy and detection efficiency.
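The abstract does not give the form of the pixel aggregation constraint. As a rough illustration only, the sketch below assumes a clustering-style loss of the kind used in pixel aggregation detectors: each text pixel is pulled toward the mean embedding of its instance's central region, and pixels already within a margin are not penalized. The function name `aggregation_loss`, the flat per-pixel lists, and the `margin` value of 0.5 are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import defaultdict


def aggregation_loss(embeddings, text_ids, kernel_ids, margin=0.5):
    """Hypothetical clustering-style aggregation loss (not the paper's code).

    embeddings: one feature vector (list of floats) per pixel.
    text_ids:   per-pixel id of the full text instance, 0 = background.
    kernel_ids: per-pixel id of the central text region, 0 = outside it.
    margin:     assumed distance below which a pixel is not penalized.
    """
    text_pixels = defaultdict(list)
    kernel_pixels = defaultdict(list)
    for vec, t_id, k_id in zip(embeddings, text_ids, kernel_ids):
        if t_id != 0:
            text_pixels[t_id].append(vec)
        if k_id != 0:
            kernel_pixels[k_id].append(vec)

    per_instance = []
    for inst, pixels in text_pixels.items():
        kernel = kernel_pixels.get(inst)
        if not kernel:
            continue  # no central region to aggregate towards
        dim = len(kernel[0])
        # Cluster centre: mean embedding of the central-region pixels.
        center = [sum(v[d] for v in kernel) / len(kernel) for d in range(dim)]
        total = 0.0
        for vec in pixels:
            dist = math.dist(vec, center)
            # Penalize only the part of the distance beyond the margin.
            total += math.log(max(dist - margin, 0.0) ** 2 + 1.0)
        per_instance.append(total / len(pixels))

    # Average over text instances; zero when there is nothing to aggregate.
    return sum(per_instance) / len(per_instance) if per_instance else 0.0


# Four pixels of one text instance; only the first lies in the central region.
embeddings = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [1.5, 0.0]]
loss = aggregation_loss(embeddings, [1, 1, 1, 1], [1, 0, 0, 0])
```

In this toy input only the last pixel lies farther than the margin from the centre, so it alone contributes to the loss. In a real detector the same quantity would be computed on the network's embedding map with a tensor library and minimized during training.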

feature fusion; pixel aggregation; text detection; character segmentation

Zhang Huadong, Zhong Yuzhong, Tu Haiyan, Dian Songyi

College of Electrical Engineering, Sichuan University, Chengdu 610065, China

2024

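Modular Machine Tool & Automatic Manufacturing Technique
Dalian Modular Machine Tool Research Institute; Production Engineering Branch of the Chinese Mechanical Engineering Society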

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 0.671
ISSN: 1001-2265
Year, volume (issue): 2024, (11)