This paper surveys text recognition, from the foundations of optical character recognition (OCR), through the application of natural language processing, to recent progress in visual language models. It describes in detail the steps of text recognition, including image preprocessing, feature extraction, and character segmentation and recognition, and discusses a range of advanced techniques and models, such as contrastive learning, multimodal fusion, and visual language models. In addition, it compares the performance of different methods on multiple datasets and discusses the challenges and limitations of the field.
Key words
optical character recognition/natural language processing/contrastive learning/multimodal fusion/visual language models