A Review of Visual Language Based Text Recognition Methods

Chen Xi¹, Lu Likun¹, Wang Tong¹, Zeng Qingtao¹

Author information

1. Beijing Institute of Graphic Communication, Beijing 102600

Abstract

This paper reviews text recognition from the foundations of optical character recognition (OCR), through the application of natural language processing to text recognition, to the latest progress of visual language models in the field. It describes in detail the steps of text recognition, including image preprocessing, feature extraction, character segmentation, and recognition, and discusses a range of advanced techniques and models, such as contrastive learning, multimodal fusion, and other text recognition methods that incorporate visual language models. In addition, it compares the performance of different methods on multiple datasets and discusses the challenges and limitations facing the field of text recognition.

Key words

optical character recognition / natural language processing / contrastive learning / multimodal fusion / visual language model

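Funding

Beijing Municipal Education Commission Emerging Interdisciplinary Platform for Publishing Science: R&D Platform for Key Technologies of Digital Inkjet Printing and Multifunctional Rotary Offset Presses (04190123001/003)

Key Project of Beijing Digital Education Research (BDEC2022619027)

General Project of the Beijing Association of Higher Education, 2023 (MS2023168)

Beijing Institute of Graphic Communication University-Level Research Project (20190122019)

Beijing Institute of Graphic Communication University-Level Research Project (Ec202303)

Beijing Institute of Graphic Communication University-Level Research Project (Ea202301)

Beijing Institute of Graphic Communication University-Level Research Project (E6202405)

Beijing Institute of Graphic Communication Special Fund for Discipline Construction and Graduate Education (2109012201221090323009)

Beijing Natural Science Foundation (1212010)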

Publication year

2024

Journal of Beijing Institute of Graphic Communication

Beijing Institute of Graphic Communication

Impact factor: 0.247

ISSN: 1004-8626
