Character recognition method for ancient Korean books based on unified Chinese-Korean IDS encoding

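Chinese abstract (translated): To address the difficulty of recognizing mixed Chinese and Korean script in ancient Korean-language books, a unified ideographic description sequence (IDS) encoding scheme for Chinese and Korean characters is proposed, with the aim of recognizing the characters of ancient Korean books using the radical-decomposition character recognition model CCR-CLIP. First, based on the structural similarity of Chinese and Korean characters, the Chinese radicals, Korean letters, and 12 basic structures that occur in the characters are encoded uniformly. Second, the Chinese-character IDS file provided with the original CCR-CLIP model is extended by adding IDS sequences for Korean characters. Finally, the scarcity of ancient Korean book samples is addressed by training on printed characters.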

English title: Korean ancient books character recognition method based on unified Chinese and Korean characters ideographic description sequences coding

English abstract: To address the difficulty of recognizing mixed Chinese and Korean characters in ancient Korean books, this paper proposes a unified ideographic description sequence (IDS) encoding scheme for Chinese and Korean characters, with the aim of recognizing the characters in ancient Korean books using a radical-decomposition character recognition model, Chinese character recognition with contrastive language-image pre-training (CCR-CLIP). First, based on the structural similarity of Chinese and Korean characters, Chinese radicals, Korean letters, and 12 basic structures are encoded uniformly. Second, the Chinese-character IDS file provided with the original CCR-CLIP model is extended with IDS sequences for Korean characters. Finally, the scarcity of ancient Korean book samples is addressed by training on printed characters. The results show that, compared with the CCR-SLD method, the character recognition accuracy of the proposed method improves by 13.8% in the experiment on ancient Korean books and by 5.38% in the experiment on printed text. The proposed method outperforms the other methods compared on ancient Korean text recognition and can serve as a reference for this problem.
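To make the idea of a shared IDS vocabulary concrete, the following is a minimal sketch, not the paper's actual encoding table. It assumes that the structure symbols are Unicode Ideographic Description Characters such as ⿰ and ⿱ (the abstract does not say which 12 structures are used), that Korean syllables are split into letters with the standard Unicode Hangul arithmetic, and that the layout rule for vowels is a simplification chosen for illustration. `HANZI_IDS`, `hangul_to_ids`, and `to_ids` are hypothetical names.

```python
# Illustrative sketch only: the layout rules and the tiny Hanzi table below
# are assumptions, not the encoding published with the paper or CCR-CLIP.

HANGUL_BASE = 0xAC00          # first precomposed Hangul syllable (가)
HANGUL_COUNT = 11172          # 19 leads * 21 vowels * 28 tails

LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

# Vowels written below the leading consonant (assumed to use ⿱).
HORIZONTAL = set("ㅗㅛㅜㅠㅡ")
# Compound vowels split into a lower part and a right-hand part (assumption).
COMPOUND = {"ㅘ": "ㅗㅏ", "ㅙ": "ㅗㅐ", "ㅚ": "ㅗㅣ", "ㅝ": "ㅜㅓ",
            "ㅞ": "ㅜㅔ", "ㅟ": "ㅜㅣ", "ㅢ": "ㅡㅣ"}

# Hypothetical two-entry stand-in for a Chinese-character IDS file.
HANZI_IDS = {"明": "⿰日月", "好": "⿰女子"}


def hangul_to_ids(ch: str) -> str:
    """Return a prefix-notation IDS string for one precomposed Hangul syllable."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < HANGUL_COUNT:
        raise ValueError(f"{ch!r} is not a precomposed Hangul syllable")
    lead = LEADS[code // 588]            # 588 = 21 vowels * 28 tails
    vowel = VOWELS[(code % 588) // 28]
    tail = TAILS[code % 28]              # "" when there is no final consonant
    if vowel in COMPOUND:
        low, right = COMPOUND[vowel]
        body = "⿰⿱" + lead + low + right
    elif vowel in HORIZONTAL:
        body = "⿱" + lead + vowel
    else:
        body = "⿰" + lead + vowel
    return "⿱" + body + tail if tail else body


def to_ids(ch: str) -> str:
    """Look up a Chinese character first, then fall back to Hangul decomposition."""
    if ch in HANZI_IDS:
        return HANZI_IDS[ch]
    return hangul_to_ids(ch)


if __name__ == "__main__":
    for ch in "明한국과":
        print(ch, to_ids(ch))
```

Under these assumptions, `to_ids("한")` gives `⿱⿰ㅎㅏㄴ` and `to_ids("明")` gives `⿰日月`, so both scripts draw on the same structure symbols and a single sequence vocabulary can cover Chinese radicals and Korean letters.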

English keywords:

Korean ancient books; zero-shot; character recognition; character coding; ideographic description sequences

Authors:

Zhao Mengling (赵梦玲), Jin Xiaofeng (金小峰)


Author affiliation:

Convergence College (融合学院), Yanbian University, Yanji 133002, Jilin, China

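Chinese keywords:

Korean ancient books (朝鲜语古籍); zero-shot (零样本); character recognition (文字识别); character coding (文字编码); ideographic description sequences (表意文字描述序列)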

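Funding:

Basic Research Project in Humanities and Social Sciences of the Education Department of Jilin Province

Project number:

JJKH20230608SK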

Publication year:

2024

Journal of Yanbian University (Natural Science Edition)

Yanbian University

Impact factor: 0.388

ISSN: 1004-4353

Year, volume (issue): 2024, 50(2)