Scene Text Detection Based on Multi-scale Feature Extraction and Bidirectional Feature Fusion
Lian Zhe¹, Yin Yanjun¹, Zhi Min¹, Xu Qiaozhi¹
1. College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot 010022, China
Abstract
Natural scene text detection is a fundamental research task in image processing with a wide range of applications. Current methods typically capture the semantic features of scene text with single-scale convolution and multi-scale feature fusion. However, single-scale convolution struggles to represent text instances that vary in shape and scale. Meanwhile, simple upsampling-based multi-scale feature fusion only aligns the spatial sizes of the feature maps and ignores the relative importance of features at different scales. To address these problems, this paper proposes a scene text detection algorithm based on multi-scale feature extraction and bidirectional feature fusion. The algorithm builds a multi-scale feature extraction module from convolution kernels of different sizes, so that it can extract features of text instances with different scales and shapes while capturing contextual dependencies over different distances. During feature fusion, a bidirectional feature fusion module is constructed by adding a bottom-up fusion path, which enables information exchange across scales. Coordinate attention is applied after fusion to enhance the detail information in high-level features and compensate for the detail lost during fusion. Extensive experiments on the ICDAR2015, MSRA-TD500, and CTW1500 datasets yield F1 scores of 87.8%, 87.1%, and 83.2% and detection speeds of 17.2, 31.1, and 22.3 frames/s, respectively, showing good robustness compared with other state-of-the-art detection methods.
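The abstract describes adding a bottom-up path after the usual top-down (FPN-style) fusion. The NumPy sketch below illustrates only that data flow, under simplifying assumptions that are not from the paper: all pyramid levels share one channel count and halve in resolution per level, nearest-neighbour upsampling and 2×2 average pooling stand in for learned resampling layers, and fusion is plain element-wise addition with no convolutions or coordinate attention. The function names are hypothetical; this is not the authors' module.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of a (C, H, W) array by a factor of 2."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2x(x):
    """2x2 average pooling with stride 2 on a (C, H, W) array (H, W even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def bidirectional_fusion(features):
    """Hypothetical sketch: fuse a feature pyramid top-down, then bottom-up.

    features: list of (C, H, W) arrays ordered from finest to coarsest,
    each level half the spatial size of the previous one.
    """
    n = len(features)
    # Top-down path (FPN-style): coarse semantics flow to finer levels.
    td = list(features)
    for i in range(n - 2, -1, -1):
        td[i] = features[i] + upsample2x(td[i + 1])
    # Added bottom-up path: fine details flow back to coarser levels.
    out = list(td)
    for i in range(1, n):
        out[i] = td[i] + downsample2x(out[i - 1])
    return out

# Four constant pyramid levels: 16x16, 8x8, 4x4, 2x2, each with 4 channels.
feats = [np.ones((4, 16 // 2**i, 16 // 2**i)) for i in range(4)]
fused = bidirectional_fusion(feats)
print([f.shape for f in fused])            # [(4, 16, 16), (4, 8, 8), (4, 4, 4), (4, 2, 2)]
print([float(f[0, 0, 0]) for f in fused])  # [4.0, 7.0, 9.0, 10.0]
```

Because each bottom-up step adds the already-fused finer level, the coarsest output receives contributions from every input level, which is the kind of cross-scale interaction a bidirectional module is meant to provide; a one-way top-down path leaves the coarsest level untouched.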
Key words
text detection/multi-scale feature extraction/bidirectional feature fusion/coordinate attention/differentiable binarization
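The keyword list includes differentiable binarization (DB), the technique introduced with DBNet (Liao et al., AAAI 2020). DB replaces the hard threshold on the text probability map with a steep sigmoid so that binarization can be trained end to end. The abstract does not describe the detection head, so the following assumes the standard DBNet formulation: the approximate binary map is B = 1 / (1 + e^(-k(P - T))), where P is the probability map, T is the learned threshold map, and k is an amplifying factor that DBNet sets to 50. The NumPy sketch below uses hypothetical function names and shows the forward map and its derivative.

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))).

    prob_map (P) and thresh_map (T) have the same shape with values in [0, 1];
    k is the amplifying factor (assumed 50, as in DBNet).
    """
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))

def db_gradient(prob_map, thresh_map, k=50.0):
    """Analytic derivative dB/dP = k * B * (1 - B); nonzero everywhere."""
    b = differentiable_binarization(prob_map, thresh_map, k)
    return k * b * (1.0 - b)

P = np.array([0.2, 0.5, 0.8])  # probability map values
T = np.full(3, 0.5)            # threshold map values
print(differentiable_binarization(P, T))  # near 0, exactly 0.5, near 1
print(db_gradient(P, T))                   # largest (k / 4 = 12.5) where P equals T
```

Unlike a hard step function, whose gradient is zero almost everywhere, the sigmoid keeps a usable gradient near the threshold, which is what lets the threshold map be learned jointly with the probability map.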