A Study on Tibetan Handwritten Uchen Script Based on Tesseract_OCR
唐梦坤¹, 陈汝真¹, 陈柏霖¹, 贾裕民¹, 马柯研¹
Author information
- 1. School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000
Abstract
Research on recognizing handwritten Tibetan Uchen script is currently limited, which motivates the development of a dedicated recognition system. This paper uses the Tesseract_OCR engine to recognize handwritten Tibetan Uchen script automatically and trains the associated character library. The datasets are processed in Matlab, and the recognition system itself is built in Python. Experimental results show that the proposed method significantly improves both the recognition accuracy of Tesseract_OCR on handwritten Tibetan Uchen script and the quality of the trained character library. The work advances handwritten Tibetan script recognition and provides a practical tool for preserving and passing on the Uchen script as cultural heritage. Future work will continue to explore Tesseract_OCR-based recognition of handwritten Tibetan and further optimize the system to improve recognition performance.
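The abstract reports improved recognition accuracy but does not state the metric at this point. One common choice for OCR is character accuracy derived from edit distance, sketched below. This is an assumption, not necessarily the metric used in the paper; the function names are hypothetical, and comparing Unicode code points (rather than whole stacked Tibetan syllables) is also an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute = 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    # 1 - (edits / reference length), clipped at 0; operates on code points.
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))
```

For example, `char_accuracy("བོད", "བད")` is about 0.667: the reference has three code points and the hypothesis drops the vowel sign, costing one edit.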
Key words
Tibetan handwritten character recognition/Tesseract_OCR/image preprocessing/grayscale binarization/morphological processing
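The keywords name grayscale binarization and morphological processing as preprocessing steps, but the exact Matlab pipeline is not described here. The dependency-free Python sketch below illustrates those steps under stated assumptions: BT.601 luma weights for grayscale conversion, Otsu's method for the threshold, dark ink on light paper, and a 3x3 square structuring element for morphological opening (erosion followed by dilation), which removes specks smaller than the element. All function names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

RGB = Tuple[int, int, int]
Image = List[List[int]]


def to_gray(rgb: List[List[RGB]]) -> Image:
    # BT.601 luma weights, the same weights Matlab's rgb2gray uses.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row] for row in rgb]


def otsu_threshold(gray: Image) -> int:
    # Build a histogram of 8-bit intensities.
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        # Class 0 holds intensities <= t, class 1 holds the rest.
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        # Between-class variance, up to the constant factor 1 / total**2.
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(gray: Image, t: int) -> Image:
    # Dark strokes on light paper: pixels at or below t become ink (1).
    return [[1 if v <= t else 0 for v in row] for row in gray]


def _morph(img: Image, op: Callable[[Iterable[int]], int]) -> Image:
    # Apply min (erosion) or max (dilation) over each 3x3 neighbourhood.
    # Pixels outside the image count as background (0), so erosion also
    # shrinks strokes that touch the border.
    h, w = len(img), len(img[0])
    return [
        [
            op(
                img[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def opening(img: Image) -> Image:
    # Erosion then dilation with the same 3x3 element.
    return _morph(_morph(img, min), max)


def preprocess(rgb: List[List[RGB]]) -> Image:
    gray = to_gray(rgb)
    return opening(binarize(gray, otsu_threshold(gray)))
```

In Matlab the same steps map onto `rgb2gray`, `graythresh` (which implements Otsu's method), and `imopen`; the explicit loops here only make each step visible.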
Funding
College Students' Innovation and Entrepreneurship Training Program (202310694035)
Year of publication
2024