Scene text detection based on dual attention and multi-scale feature fusion

强观臣¹, 杨茜¹, 张丽真¹, 熊炜², 李利荣³

Author information

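1. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068
2. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068; Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan, Hubei 430068; Hubei Engineering Research Center for Safety Monitoring of New Energy and Power Grid Equipment, Hubei University of Technology, Wuhan, Hubei 430068; Department of Computer Science and Engineering, University of South Carolina, South Carolina 29201, USA
3. School of Electrical and Electronic Engineering, Hubei University of Technology, Wuhan, Hubei 430068; Hubei Key Laboratory for High-efficiency Utilization of Solar Energy and Operation Control of Energy Storage System, Hubei University of Technology, Wuhan, Hubei 430068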

Abstract

To address the challenges of text detection in complex natural scenes, this paper proposes a scene text detection method built on dual attention and multi-scale feature fusion. A dual-attention fusion mechanism strengthens the correlation between text feature channels and improves overall detection performance. Because upsampling and downsampling deep feature maps can lose semantic information, a dilated convolution multi-scale feature fusion pyramid structure (MFPN) is proposed; it uses a dual fusion mechanism to enhance semantic features and overcome the effect of scale variation. To resolve the semantic conflicts and limited multi-scale feature representation caused by fusing information of different densities, a multi-scale feature fusion module (MFFM) is introduced. In addition, a feature refinement module (FRM) is introduced for small text that is easily masked by conflicting information. Experiments show that the method is effective for text detection in complex scenes, reaching F-values of 85.6%, 87.1%, and 86.3% on the CTW1500, ICDAR2015, and Total-Text datasets, respectively.
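The abstract does not specify how the pyramid is built, so the following is only a minimal sketch of the general idea behind dilated (atrous) convolution and multi-scale fusion, not the authors' MFPN. It works on a 1-D signal in plain Python; the function names, the dilation rates (1, 2, 4), and fusion by simple averaging are all assumptions made for illustration.

```python
def receptive_field(kernel_size, dilation):
    """Span of input positions covered by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded 1-D cross-correlation with an odd-length kernel.

    The output has the same length as the input, so no resolution is
    lost even as the dilation rate (and receptive field) grows.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch assumes an odd-length kernel")
    half = (len(kernel) // 2) * dilation
    padded = [0.0] * half + list(signal) + [0.0] * half
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Taps are spaced `dilation` samples apart.
            acc += w * padded[i + j * dilation]
        out.append(acc)
    return out


def multi_scale_fuse(signal, kernel, dilations=(1, 2, 4)):
    """Hypothetical fusion: average the branches computed at each dilation."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [sum(vals) / len(branches) for vals in zip(*branches)]


if __name__ == "__main__":
    impulse = [0, 0, 0, 0, 1, 0, 0, 0, 0]
    kernel = [1, 1, 1]
    for d in (1, 2, 4):
        print(d, receptive_field(len(kernel), d), dilated_conv1d(impulse, kernel, d))
    print(multi_scale_fuse(impulse, kernel))
```

With a 3-tap kernel, raising the dilation from 1 to 4 widens the receptive field from 3 to 9 samples while the output length stays fixed. That is the property that lets a network gather wider context without further downsampling.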

Key words

text detection/attention fusion/multi-scale/feature fusion pyramid

Publication year

2024

Journal of Optoelectronics·Laser (光电子·激光)

Tianjin University of Technology; Chinese Optical Society

CSCD; Peking University Core Journals

Impact factor: 1.437

ISSN: 1005-0086