Semantic-Enhanced Zero-Shot Oracle Character Recognition

Oracle bone character recognition is valuable for understanding Chinese history and passing on Chinese culture. Manual recognition requires extensive expert experience and a great deal of time, while most automatic recognition methods rely on a closed-set assumption, which limits their applicability because new oracle characters continue to be discovered. To address this, earlier work proposed zero-shot oracle character recognition from a visual matching perspective: handprinted template images serve as character category references, and a rubbing (scanned) image is recognized by matching its similarity against those templates. However, that approach overlooks the large intra-class variance of rubbing images, so variable glyph shapes still cause mismatches. This paper proposes a two-stage semantic-enhanced zero-shot oracle character recognition method. The first stage is domain-independent character semantic learning, in which the contrastive vision-language pre-training model CLIP extracts character semantics from rubbing and template images through prompt learning, addressing the lack of semantic information in oracle characters. To cope with the domain gap between rubbings and templates, we set separate learnable domain prompts and character category prompts and decouple their semantics to achieve more accurate feature extraction. The second stage is semantic-enhanced visual matching of character images, in which the model extracts intra-class shared features and inter-class distinctive features through two branches. The first branch uses contrastive learning to align the visual features of different glyphs of the same character category with the character semantics, guiding the model to focus on intra-class shared features. The second branch uses the N-pair loss to strengthen the learning of features that distinguish different character categories. At test time, the model does not need semantic features; it relies on the intra-class similarity and inter-class distinctiveness learned during training to match rubbings to templates more accurately, improving zero-shot recognition performance. Experiments on the rubbing dataset OBC306 and the template dataset SOC5519 show that the proposed method improves zero-shot oracle character recognition accuracy over the baseline method by more than 25%.
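
The N-pair loss and the test-time rubbing-to-template matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how pairs are formed or which similarity measure is used, so the sketch assumes one (anchor, positive) pair per character class, for example a rubbing feature paired with its template feature, and cosine similarity for matching. The function names `n_pair_loss` and `match_to_template` are hypothetical, and features are plain Python lists rather than the outputs of a trained encoder.

```python
import math


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length so that dot() gives cosine similarity."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def n_pair_loss(anchors, positives):
    """Standard N-pair loss over N (anchor, positive) pairs, one per class.

    For anchor i, positives[i] is its positive and every positives[j]
    with j != i is treated as a negative (assumed pairing, see lead-in):
        L = mean_i log(1 + sum_{j != i} exp(a_i . p_j - a_i . p_i))
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        pos_score = dot(anchor, positives[i])
        neg_sum = sum(
            math.exp(dot(anchor, p) - pos_score)
            for j, p in enumerate(positives)
            if j != i
        )
        total += math.log1p(neg_sum)
    return total / len(anchors)


def match_to_template(rubbing_feat, template_feats):
    """Return the label whose template feature is most cosine-similar.

    template_feats maps a character label to its template feature vector.
    No semantic (text) features are involved at this step.
    """
    query = l2_normalize(rubbing_feat)
    return max(
        template_feats,
        key=lambda label: dot(query, l2_normalize(template_feats[label])),
    )


# Toy example with two character classes and 2-D features.
aligned = n_pair_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = n_pair_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
predicted = match_to_template([0.9, 0.2], {"A": [1.0, 0.0], "B": [0.0, 1.0]})
```

In the toy example, the loss is lower when each anchor is closest to its own positive (`aligned`) than when the positives are swapped (`swapped`), which is the inter-class separation the second branch is meant to encourage. Because matching only compares a rubbing feature with template features, a character unseen during training can be recognized by adding its template to `template_feats`.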

oracle character recognition; zero-shot recognition; visual matching; semantic-enhanced; vision-language model; contrastive learning

Liu Zonghao, Peng Wenjie, Dai Gang, Huang Shuangping, Liu Yongge

School of Electronic and Information Engineering, South China University of Technology, Guangzhou, Guangdong 510641

Anyang Normal University, Anyang, Henan 455099

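National Key Research and Development Program of China (2023YFC3502900); National Natural Science Foundation of China (62176093, 61673182); Guangzhou Key-Area Research and Development Program (202206030001); Guangdong-Hong Kong-Macao Joint Innovation Field Project (2023A0505030016)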

2024

Acta Electronica Sinica
Chinese Institute of Electronics

Indexed in: CSTPCD; Peking University Core Journals
Impact factor: 1.237
ISSN: 0372-2112
Year, volume (issue): 2024, 52(10)