Improved Chinese Street View Text Recognition Technology based on CRNN
In real-world scenes, image distortion, background clutter, bending, and tilting can give text irregular shapes. Extracting textual information from such images enriches their semantic content and helps analyze the context, thereby facilitating scene understanding. To address these challenges in scene text recognition, an end-to-end text recognition technique based on CRNN (Convolutional Recurrent Neural Network) is proposed. In the convolutional layers, an improved Inception structure based on GoogLeNet is used to extract features; it uses multi-branch convolutional layers to fuse multi-scale features. Additionally, an attention mechanism is incorporated to enhance feature correlation in both the channel and spatial dimensions, giving local features a global perspective. In the recurrent layers, Bi-LSTM (Bidirectional Long Short-Term Memory) is employed to strengthen the contextual relationships between characters for sequence prediction. Finally, the predicted sequence is passed to CTC (Connectionist Temporal Classification), which transcribes it into the output text sequence. Experimental results on the IIIT5K dataset and Baidu's Chinese Street View dataset demonstrate the reliability of this approach, with accuracy rates of 95.3% and 91.1%, respectively.
text recognition; convolutional neural network; attention mechanism; bidirectional long short-term memory
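
The final transcription step named in the abstract can be illustrated with a minimal sketch of greedy (best-path) CTC decoding: take the most likely class at each time step, merge consecutive repeats, then drop the blank symbol. The abstract does not say which decoding strategy the authors use, so this is only one common choice; the function name `ctc_greedy_decode`, the blank index 0, the toy character set, and the per-frame scores are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores, charset, blank=BLANK):
    """Greedy (best-path) CTC decoding for a single image.

    frame_scores: list of per-time-step score lists, one score per class,
                  e.g. the outputs of the recurrent layers.
    charset:      sequence mapping class index -> character; the entry at
                  `blank` is never emitted.
    """
    # 1. Pick the highest-scoring class at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # 2. Merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining labels to characters.
    return "".join(charset[label] for label in collapsed if label != blank)


# Toy example: 3 classes (blank plus two characters), 5 time steps.
toy_charset = ["", "街", "景"]
toy_scores = [
    [0.10, 0.80, 0.10],  # "街"
    [0.20, 0.70, 0.10],  # "街" again (repeat, merged in step 2)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.80],  # "景"
    [0.10, 0.20, 0.70],  # "景" again (repeat, merged in step 2)
]
decoded = ctc_greedy_decode(toy_scores, toy_charset)
print(decoded)
```

Because repeats are merged before blanks are removed, a blank between two identical labels keeps them as two separate characters, which is how CTC represents doubled characters in the output.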