Detection and Recognition Algorithms for Chinese and English Scene Text Images

Wang Yanyuan (王艳媛)¹, Mao Zhengchong (茅正冲)¹

Author information

1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China

Abstract

Scene text images have complex backgrounds, which makes it difficult for detection algorithms to locate text regions accurately and in turn makes recognition harder. To detect and recognize both Chinese and English scene text while improving detection and recognition accuracy, this paper proposes TD-ABCNetv2, an improved model based on the ABCNetv2 network. To address variation in text features such as shape, arrangement, and font, the model adopts SKNet as its backbone and introduces the Selective Kernel (SK) module, which helps the network learn features at different scales and adapt to text of varying scale, shape, and orientation. Because character size and spacing differ between Chinese and English scene text, an ECA attention module is added to the FPN structure to integrate channel information more effectively, increase the network's sensitivity to different features, and make feature fusion more targeted. In addition, the CIoU loss function is introduced to measure the overlap between bounding boxes more accurately, adapt to changes in text shape, and improve the model's generalization ability. Experiments on several public datasets demonstrate the effectiveness of the proposed model.
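For readers unfamiliar with the CIoU loss mentioned in the abstract, the following is a minimal sketch of its commonly published form for two axis-aligned boxes. It is an illustration only, not the authors' implementation: the abstract does not say which boxes in TD-ABCNetv2 the loss is applied to, and the function name `ciou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are all assumptions made here.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """CIoU loss for two axis-aligned boxes given as (x1, y1, x2, y2).

    Hypothetical helper for illustration; box layout and eps are assumptions.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1

    # Plain IoU: intersection area over union area.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Distance term: squared distance between box centres, divided by the
    # squared diagonal of the smallest box enclosing both.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    c2 = enclose_w ** 2 + enclose_h ** 2 + eps

    # Aspect-ratio term: penalizes a mismatch in width/height ratio.
    v = (4 / math.pi ** 2) * (
        math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + rho2 / c2 + alpha * v
```

Under these assumptions, two identical boxes give a loss close to 0, while two disjoint boxes give a loss above 1, because the centre-distance term still contributes when the IoU is zero. That non-zero signal for non-overlapping boxes is the usual motivation for preferring CIoU over a plain IoU loss.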

Key words

scene text/Chinese text detection/SKNet/attention mechanism/IoU

Publication year

2024

Computer and Modernization (计算机与现代化)

Jiangxi Computer Society; Jiangxi Institute of Computing Technology

CSTPCD

Impact factor: 0.472

ISSN: 1006-2475