Chinese Street-View Text Recognition Based on an Improved CRNN

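Chinese abstract (translated): Text in real-world scenes is often irregular: images are distorted, backgrounds are cluttered, and the text itself is curved or tilted. Extracting that text enriches an image's semantic information and supports contextual analysis, which in turn aids scene understanding. To address these difficulties, the authors propose an end-to-end scene text recognition method based on an improved CRNN (convolutional recurrent neural network). In the convolutional layers, features are extracted with an improved Inception structure derived from GoogLeNet, in which multi-branch convolutional layers fuse multi-scale features. An attention mechanism is then added to strengthen feature relationships along the channel and spatial dimensions, giving local features global context. In the recurrent layers, a Bi-LSTM (bidirectional long short-term memory network) strengthens the contextual links between characters for sequence prediction. Finally, the predicted sequence is passed to a CTC (connectionist temporal classification) layer, which transcribes it into the output sequence. In experiments on the IIIT5K dataset and the Baidu Chinese street-view dataset, the method reaches accuracies of 95.3% and 91.1%, respectively, which the authors take as evidence of its reliability.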

English title: Improved Chinese Street View Text Recognition Technology Based on CRNN

English abstract: In real-world scenarios, image distortion, background clutter, bending, and tilting can produce irregular text shapes. Extracting textual information from these images can enhance their semantic content and help analyze the context, thus better facilitating understanding of the scene. To address these challenges in scene text recognition, an end-to-end text recognition technique based on CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional network layer, an improved Inception structure based on GoogLeNet is used to extract features. This structure incorporates multi-branch convolutional layers for the fusion of multi-scale features. Additionally, an attention mechanism is incorporated to enhance feature correlation in both the channel and spatial dimensions, giving local features a global perspective. In the recurrent network layer, Bi-LSTM (Bidirectional Long Short-Term Memory) is employed to strengthen the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is fed into CTC (Connectionist Temporal Classification), which transcribes it into the output sequence. Experimental results on the IIIT5K dataset and Baidu's Chinese Street View dataset demonstrate the reliability of this approach, with accuracy rates of 95.3% and 91.1%, respectively.
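The final transcription step can be illustrated with a minimal sketch of CTC best-path (greedy) decoding: take the top class at each time step, merge consecutive repeats, then drop the blank label. The abstract does not say which decoding strategy the authors use, so greedy decoding is an assumption here. The function name `ctc_greedy_decode`, the blank index `0`, and the toy two-character charset are also hypothetical, chosen only for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (not specified in the abstract)


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Best-path CTC decoding over per-frame class scores.

    frame_scores: list of lists, one list of class scores per time step.
    charset: string whose i-th character maps to class index i + 1,
             because index 0 is reserved for the blank in this sketch.
    """
    # Step 1: pick the highest-scoring class at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Step 2: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Step 3: drop blanks and map the remaining indices to characters.
    return "".join(charset[label - 1] for label in collapsed if label != blank)


# Toy scores for 6 time steps over 3 classes: blank, 'a', 'b'.
charset = "ab"
scores = [
    [0.1, 0.8, 0.1],  # 'a'
    [0.2, 0.7, 0.1],  # 'a' again (merged with the previous frame)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],  # 'b'
    [0.6, 0.2, 0.2],  # blank
    [0.2, 0.1, 0.7],  # 'b' (kept, because a blank separates it from the last 'b')
]
print(ctc_greedy_decode(scores, charset))  # prints "abb"
```

The blank label is what lets CTC emit a genuinely doubled character: two identical labels survive the merge step only when a blank sits between them.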

English keywords:

text recognition; convolutional neural network; attention mechanism; bi-directional long short-term memory

Authors:

Ren Rui (任锐), Wang Xiaoya (王晓娅), Wen Chengyu (文成玉)

Affiliation:

College of Communication Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225, China

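Keywords (translated from Chinese):

text recognition; convolutional neural network; attention mechanism; bidirectional long short-term memory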

Publication year:

2025

DOI: