Character recognition method for ancient Korean books based on unified ideographic description sequence coding of Chinese and Korean characters
To address the recognition of mixed Chinese and Korean characters in ancient Korean books, this paper proposes a unified ideographic description sequence (IDS) encoding scheme for Chinese and Korean characters. The scheme allows ancient Korean books to be recognized with CCR-CLIP (Chinese character recognition via contrastive language-image pre-training), a recognition model based on radical decomposition. First, exploiting the structural similarity between Chinese and Korean characters, the radicals of Chinese characters, the letters (jamo) of Korean characters, and 12 basic structures are encoded in a single unified scheme. Second, the IDS file of Chinese characters provided with the original CCR-CLIP model is extended with the IDS sequences of Korean characters. Finally, to cope with the scarcity of samples from ancient Korean books, printed characters are used in the training stage. The results show that, compared with the CCR-SLD method, the proposed method improves character recognition accuracy by 13.8% in experiments on ancient Korean books and by 5.38% in experiments on printed text. The proposed method outperforms the other compared methods on ancient Korean text recognition and can serve as a reference for further work on this problem.
Keywords: ancient Korean books; zero-shot; character recognition; character coding; ideographic description sequences
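The abstract does not state how a Korean syllable is written as an IDS. The sketch below shows one plausible scheme and is not necessarily the authors' encoding. It assumes that the "12 basic structures" correspond to the 12 Unicode ideographic description characters U+2FF0 to U+2FFB. It decomposes a precomposed Hangul syllable into jamo with the standard Unicode arithmetic (index = (L × 21 + V) × 28 + T). It then lays the jamo out with ⿰ (left to right) for vertical vowels and ⿱ (above to below) for horizontal vowels and final consonants, and splits compound vowels such as ㅘ into their two parts. The helper names `hangul_ids` and `build_vocab` are hypothetical. The format of CCR-CLIP's own IDS file is not reproduced here.

```python
# Hypothetical sketch of a unified IDS encoding for Chinese and Korean characters.
# Assumption: the 12 "basic structures" are the Unicode IDCs U+2FF0..U+2FFB.
IDC = [chr(c) for c in range(0x2FF0, 0x2FFC)]  # 12 structure operators
LR, TB = "\u2ff0", "\u2ff1"  # left-to-right and above-to-below

# Hangul compatibility jamo in Unicode composition order.
INITIALS = "ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ".split()
MEDIALS = "ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ".split()
FINALS = [""] + "ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ".split()

VERTICAL = set("ㅏㅐㅑㅒㅓㅔㅕㅖㅣ")   # written to the right of the initial
HORIZONTAL = set("ㅗㅛㅜㅠㅡ")        # written below the initial
COMPOUND = {"ㅘ": "ㅗㅏ", "ㅙ": "ㅗㅐ", "ㅚ": "ㅗㅣ", "ㅝ": "ㅜㅓ",
            "ㅞ": "ㅜㅔ", "ㅟ": "ㅜㅣ", "ㅢ": "ㅡㅣ"}  # horizontal part + vertical part


def hangul_ids(ch: str) -> str:
    """Return a prefix-notation IDS for one precomposed Hangul syllable."""
    code = ord(ch) - 0xAC00
    if not 0 <= code < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    l, rem = divmod(code, 21 * 28)
    v, t = divmod(rem, 28)
    init, vowel, final = INITIALS[l], MEDIALS[v], FINALS[t]
    if vowel in VERTICAL:
        body = LR + init + vowel
    elif vowel in HORIZONTAL:
        body = TB + init + vowel
    else:
        below, right = COMPOUND[vowel]
        body = LR + TB + init + below + right
    # A final consonant (batchim) sits under the whole initial+vowel block.
    return TB + body + final if final else body


def build_vocab(entries: dict) -> list:
    """Collect the unified token set (structures, radicals, jamo) from char -> IDS entries."""
    return sorted({tok for ids in entries.values() for tok in ids})


# Chinese IDS entries (as in a Chinese IDS file) extended with Korean entries.
entries = {"好": LR + "女子", "明": LR + "日月"}
entries.update({ch: hangul_ids(ch) for ch in "가고과한"})

if __name__ == "__main__":
    for ch, ids in entries.items():
        print(ch, ids)
    print(build_vocab(entries))
```

In this design, Chinese radicals, Korean jamo and the structure operators share one token vocabulary. A model that matches character images to IDS sequences can therefore, in principle, score Korean syllables that never appeared in training, which is consistent with the zero-shot keyword.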