Character recognition for government affairs based on capsule network and language model
Character recognition is an important research topic in computer vision and lays the foundation for building intelligent government services. However, the uneven quality of government images and their diverse font styles lead to low recognition accuracy. To address these problems, a CNLM model that combines a capsule network with a language model is proposed, in which character segmentation is combined with the capsule network. First, a government image dataset is constructed, comprising character recognition images and sentence samples for the language model, and training proceeds in stages. In the first stage, the visual model is pre-trained on a public character-segmentation dataset, and the language model is pre-trained on the sentence samples and existing structured data. In the second stage, the visual model and the language model are trained jointly; their outputs are selected and iterated to obtain the text sequence contained in each image. The method is tested on both the government image dataset and the GA-HWDB dataset, where it improves accuracy over VisionLAN by 2.12% and 2.69%, respectively.
intelligent government affairs; character recognition; capsule network; language model
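The abstract does not specify how the outputs of the visual model and the language model are "selected and iterated". The following is a minimal sketch of one plausible reading, not the CNLM implementation: at each character position the higher-confidence prediction is kept, the resulting text is fed back to the language model, and the loop stops when the text no longer changes. The names `select`, `iterative_decode` and `toy_language_model`, the confidence values, and the one-word vocabulary are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# One prediction per character position: (character, confidence in [0, 1]).
Pred = Tuple[str, float]


def select(visual: List[Pred], language: List[Pred]) -> List[Pred]:
    """Keep, per position, the prediction with higher confidence (ties favour visual)."""
    if len(visual) != len(language):
        raise ValueError("prediction sequences must have the same length")
    return [v if v[1] >= l[1] else l for v, l in zip(visual, language)]


def iterative_decode(
    visual: List[Pred],
    language_model: Callable[[str], List[Pred]],
    max_iters: int = 3,
) -> str:
    """Alternate language-model refinement and selection until the text is stable."""
    current = list(visual)
    for _ in range(max_iters):
        text = "".join(ch for ch, _ in current)
        refined = language_model(text)       # language model re-scores the text
        updated = select(visual, refined)    # assumed rule: confidence-based choice
        converged = [c for c, _ in updated] == [c for c, _ in current]
        current = updated
        if converged:
            break
    return "".join(ch for ch, _ in current)


def toy_language_model(text: str) -> List[Pred]:
    """Hypothetical stand-in: snap to a vocabulary word at most one character away."""
    vocabulary = ["code"]  # assumption: a tiny illustrative vocabulary
    for word in vocabulary:
        if len(word) == len(text):
            mismatches = sum(a != b for a, b in zip(word, text))
            if mismatches <= 1:
                # Corrected positions get high confidence, the rest a neutral one.
                return [(w, 0.9 if w != t else 0.5) for w, t in zip(word, text)]
    return [(t, 0.5) for t in text]


# The visual model is unsure about the second character ("0" vs. "o").
visual_output = [("c", 0.99), ("0", 0.40), ("d", 0.95), ("e", 0.97)]
result = iterative_decode(visual_output, toy_language_model)
```

In this toy run the language model overrides only the low-confidence position, so confident visual predictions are preserved while an ambiguous glyph is corrected from context. A real system would replace the toy function with a trained language model and might use a learned fusion rule rather than a simple confidence comparison.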