Study on Image Preprocessing Methods for Low-Quality Ancient Books
Dunhuang Tibetan literature is a precious source for the study of the social history of Tubo in the Tang Dynasty. In current digital research on this literature, the age of the documents, their writing carriers, and their preservation conditions leave the document images with cluttered backgrounds and blurred, incomplete text, which seriously affects the accuracy and robustness of text recognition systems. To study how image preprocessing of low-quality ancient books influences character recognition, this paper takes Dunhuang Tibetan documents with extremely poor image quality as the research object. The images are enhanced with traditional methods (logarithmic transformation, gamma transformation, median filtering, Gaussian filtering, and PS manual batch processing), binarized with global, adaptive, and custom thresholds, and enhanced with the neural network ViT. Comparative experiments show that preprocessing low-quality ancient book images has little overall impact on the recognition rate; however, Gaussian filtering, custom-threshold image binarization, and neural network-based image data enhancement each improve the recognition rate to some extent.
ancient books; Dunhuang literature; low-quality documents; preprocessing
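To make the traditional enhancement steps named in the abstract concrete, the following is a minimal sketch of four of them (logarithmic transformation, gamma transformation, median filtering, and global-threshold binarization) on an 8-bit grayscale image stored as a list of rows. It is not the authors' implementation: the function names, the 3x3 window size, the border handling, and the gamma and threshold values are all illustrative assumptions, and the paper's actual parameters are not given in the abstract.

```python
import math
from statistics import median_low

# Illustrative sketch only: names and parameters are assumptions,
# not taken from the paper. Images are lists of rows of ints in 0..255.

def log_transform(img):
    """s = c * log(1 + r), with c chosen so that r = 255 maps to s = 255."""
    c = 255.0 / math.log(256.0)
    return [[round(c * math.log1p(p)) for p in row] for row in img]

def gamma_transform(img, gamma):
    """s = 255 * (r / 255) ** gamma; gamma < 1 brightens, gamma > 1 darkens."""
    return [[round(255.0 * (p / 255.0) ** gamma) for p in row] for row in img]

def median_filter3(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            # median_low always returns a value present in the window
            row.append(median_low(window))
        out.append(row)
    return out

def global_threshold(img, t):
    """Binarize with one threshold: pixels brighter than t become 255, others 0."""
    return [[255 if p > t else 0 for p in row] for row in img]

if __name__ == "__main__":
    # A flat dark patch with one bright noise pixel in the middle.
    noisy = [[10, 10, 10],
             [10, 255, 10],
             [10, 10, 10]]
    print(median_filter3(noisy))
    print(global_threshold(gamma_transform(noisy, 0.5), 128))
```

An adaptive threshold would replace the single value `t` with a value computed per pixel from its local neighbourhood, and a Gaussian filter would replace the median in `median_filter3` with a weighted average; both are omitted here to keep the sketch short.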