Journal of Chengdu University of Information Technology, 2025, Vol. 40, Issue 1: 1-6. DOI: 10.16836/j.cnki.jcuit.2025.01.001

Improved Chinese Street View Text Recognition Technology based on CRNN

Ren Rui¹, Wang Xiaoya¹, Wen Chengyu¹

Author information

  • 1. School of Communication Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225

Abstract

In real-world scenarios, image distortion, cluttered backgrounds, bending, and tilting produce irregular text shapes. Extracting the text in such images enriches their semantic information and helps analyze the context, leading to a better understanding of the scene. To address these challenges in scene text recognition, an end-to-end text recognition technique based on an improved CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional network layer, features are extracted with an improved inception structure based on GoogLeNet, which adds multi-branch convolutional layers to fuse multi-scale features. An attention mechanism is then incorporated to strengthen feature correlations in both the channel and spatial dimensions, giving local features a global perspective. In the recurrent network layer, a Bi-LSTM (Bidirectional Long Short-Term Memory) network strengthens the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is fed into a CTC (Connectionist Temporal Classification) layer, which transcribes it into the output sequence. Experimental results on the IIIT5K dataset and Baidu's Chinese Street View dataset show accuracy rates of 95.3% and 91.1%, respectively, demonstrating the reliability of the approach.
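The abstract states that the Bi-LSTM's per-frame predictions are passed to a CTC layer for transcription. As a minimal sketch of what that transcription step does at inference time, the Python below implements standard greedy (best-path) CTC decoding: take the most likely class at each frame, merge consecutive repeats, then drop the blank symbol. The paper does not say which decoding strategy it uses (greedy or beam search). The blank index 0, the function name `ctc_greedy_decode`, the three-character alphabet, and the frame scores are all hypothetical illustrations, not values from the paper.

```python
from typing import List, Optional, Sequence

# Assumption: the CTC blank symbol is class 0 (a common convention; the paper does not say).
BLANK = 0


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]],
                      alphabet: Sequence[str]) -> str:
    """Greedy (best-path) CTC decoding (hypothetical helper for illustration).

    frame_probs: one score vector per time step (frame), each of length
                 len(alphabet) + 1, where index 0 is the blank.
    alphabet:    the characters for classes 1..len(alphabet).
    """
    # Step 1: most likely class at every frame (ties go to the lower index).
    best_path = [max(range(len(row)), key=lambda k: row[k]) for row in frame_probs]

    # Steps 2 and 3: merge consecutive repeats, then drop blanks.
    chars: List[str] = []
    prev: Optional[int] = None
    for k in best_path:
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])  # class k maps to alphabet[k - 1]
        prev = k  # a blank also updates prev, so "a, blank, a" yields "aa"
    return "".join(chars)


# Hypothetical example: 6 frames over {blank, 人, 民, 路}; the scores are made up.
alphabet = ["人", "民", "路"]
frames = [
    [0.10, 0.80, 0.05, 0.05],  # 人
    [0.20, 0.70, 0.05, 0.05],  # 人 again: merged with the previous frame
    [0.90, 0.03, 0.03, 0.04],  # blank
    [0.10, 0.10, 0.70, 0.10],  # 民
    [0.60, 0.10, 0.10, 0.20],  # blank
    [0.10, 0.10, 0.10, 0.70],  # 路
]

print(ctc_greedy_decode(frames, alphabet))  # → 人民路
```

Merging repeats before dropping blanks is what lets CTC output doubled characters: a best path of 人, blank, 人 decodes to "人人", whereas 人, 人 decodes to a single "人". Beam search over the same per-frame scores is a common alternative to this greedy step.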

Key words

text recognition/convolutional neural network/attention mechanism/bi-directional long short-term memory


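Publication year: 2025
Journal of Chengdu University of Information Technology
Chengdu University of Information Technology (成都信息工程学院)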


Impact factor: 0.329
ISSN: 2096-1618