Study on Image Preprocessing Methods for Low-Quality Ancient Books

Dunhuang Tibetan documents are precious sources for studying the social history of Tubo during the Tang Dynasty. In digitizing them, their great age, poor-quality writing media, and poor preservation conditions leave the document images with cluttered backgrounds and blurred, incomplete text, which seriously affects the accuracy and robustness of text recognition systems. To study how preprocessing of low-quality ancient-book images affects character recognition, this paper takes Dunhuang Tibetan documents, whose image quality is extremely poor, as the research object. The images are enhanced with traditional methods (logarithmic transformation, gamma transformation, median filtering, Gaussian filtering, and manual PS batch processing), with binarization using global, adaptive, and custom thresholds, and with an image enhancement method based on the ViT neural network. Comparative experiments show that preprocessing low-quality ancient-book images has little overall effect on the recognition rate, although Gaussian filtering, custom-threshold binarization, and neural-network-based image data enhancement improve it to some extent.
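
The traditional operations named in the abstract can be illustrated with a minimal sketch. The abstract gives no parameters or implementation details, so everything below is an assumption made for illustration: the function names, the 3x3 window size, the Gaussian kernel weights, the local-mean rule used for the adaptive threshold, and the `offset` value are all hypothetical and are not the authors' settings. Images are plain nested lists of 8-bit grayscale values so that the example needs only the standard library; the ViT-based enhancement and the manual PS batch processing are not sketched.

```python
import math
from statistics import median_low

# A grayscale image is a list of rows; each pixel is an int in 0..255.

def log_transform(img):
    """s = c * log(1 + r), with c chosen so that 255 maps to 255."""
    c = 255 / math.log(256)
    return [[round(c * math.log(1 + p)) for p in row] for row in img]

def gamma_transform(img, gamma):
    """s = 255 * (r / 255) ** gamma; gamma < 1 brightens, gamma > 1 darkens."""
    return [[round(255 * (p / 255) ** gamma) for p in row] for row in img]

def _window(img, y, x):
    """Pixels in the 3x3 neighbourhood of (y, x), cropped at the borders."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]

def median_filter3(img):
    """Replace each pixel by the (low) median of its 3x3 neighbourhood."""
    return [[median_low(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]

def gaussian_filter3(img):
    """Smooth with the 3x3 kernel [[1,2,1],[2,4,2],[1,2,1]] / 16.

    Border pixels are handled by replicating the nearest edge pixel.
    """
    kernel = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = (acc + 8) // 16  # integer division, rounding half up
    return out

def global_threshold(img, t):
    """Binarize with one threshold t for the whole image."""
    return [[255 if p > t else 0 for p in row] for row in img]

def adaptive_threshold(img, offset=5):
    """Binarize each pixel against its local 3x3 mean minus an offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            win = _window(img, y, x)
            local_mean = sum(win) / len(win)
            out[y][x] = 255 if img[y][x] > local_mean - offset else 0
    return out
```

A "custom threshold" in this sketch would simply be `global_threshold(img, t)` with a hand-picked `t`; how the authors chose theirs is not stated in the abstract. In practice these operations are usually run through an image-processing library rather than nested lists.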

Keywords: ancient books; Dunhuang literature; low-quality documents; preprocessing

Authors: Gao Dingguo (高定国), Li Jingyi (李婧怡), Suolang Quzhen (索朗曲珍)

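School of Information Science and Technology, Tibet University, Lhasa, Tibet 850000

Tibetan Information Technology Innovative Talent Training Demonstration Base, Tibet University, Lhasa, Tibet 850000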

Funding: National Natural Science Foundation of China (62166038); Sichuan Science and Technology Program (2023YFQ0044)

Journal: Plateau Science Research (高原科学研究)

Year, volume (issue): 2024, 8(1)
  • 19