Manchu Script Recognition with a Swin Transformer Based on Multi-scale Feature Fusion

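Abstract: To address inherent challenges in Manchu character recognition, such as non-standard glyph variants and a single sound written in multiple forms, this paper proposes a multi-scale feature fusion model based on the Swin Transformer architecture (Multi-scale feature fusion based Swin Transformer, MR-SwinT). By introducing a multi-resolution parallel input mechanism, the model jointly captures fine-grained local character features and broader contextual information. Its core strength is that it makes full use of the Swin Transformer's hierarchical window-based self-attention, which provides excellent representational capacity for modeling large-scale features. In addition, the SMTBlocks module designed in this paper dynamically fuses multi-resolution features through an adaptive weighting strategy, markedly improving the model's ability to discriminate complex characters and its generalization performance. Experiments show that MR-SwinT reaches 96.59% whole-word recognition accuracy and 99.46% single-character recognition accuracy.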
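
The two reported figures measure different things: a word counts as correct only when every character in it is right, so word-level accuracy is normally the lower number. The sketch below illustrates that relationship under stated assumptions. The abstract does not say how the paper scores characters, so the position-by-position comparison, the function names `word_accuracy` and `char_accuracy`, and the sample strings are all hypothetical placeholders rather than the authors' evaluation code or data.

```python
def word_accuracy(predictions, references):
    """Fraction of words whose predicted string matches the reference exactly."""
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    exact = sum(p == r for p, r in zip(predictions, references))
    return exact / len(references)


def char_accuracy(predictions, references):
    """Fraction of character positions that match (assumed scoring rule).

    Each word contributes max(len(pred), len(ref)) positions, so both
    missing and extra predicted characters count as errors.
    """
    if len(predictions) != len(references) or not references:
        raise ValueError("need two non-empty lists of equal length")
    matched = 0
    total = 0
    for pred, ref in zip(predictions, references):
        matched += sum(a == b for a, b in zip(pred, ref))
        total += max(len(pred), len(ref))
    if total == 0:
        raise ValueError("all strings are empty")
    return matched / total


# Placeholder romanized strings; the last prediction has one wrong character.
references = ["abka", "gurun", "niyalma"]
predictions = ["abka", "gurun", "niyalme"]

word_acc = word_accuracy(predictions, references)
char_acc = char_accuracy(predictions, references)
print(f"word accuracy: {word_acc:.4f}, character accuracy: {char_acc:.4f}")
```

On this toy input, one wrong character out of 16 leaves character accuracy at 15/16 but costs a whole word, dropping word accuracy to 2/3.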

English title: The Swin Transformer-based Manchu character recognition model with multi-scale feature fusion

English abstract: To address the inherent challenges of non-standard morphological variants and multiple graphemic representations of the same phoneme in Manchu character recognition, this paper proposes MR-SwinT, a multi-scale feature fusion model based on the Swin Transformer architecture. The model enables synchronized capture of fine-grained local character features and macro-contextual information via a multi-resolution parallel input mechanism. A core advantage of the model is that it fully leverages the Swin Transformer's hierarchical, window-based self-attention mechanism, which offers exceptional representational capacity for large-scale feature modeling. Additionally, the SMT Blocks module, specifically designed in this study, achieves effective dynamic fusion of multi-resolution features through an adaptive weighting adjustment strategy, significantly enhancing the model's discriminative power and generalization ability for complex characters. Experimental results indicate that the MR-SwinT model attains 96.59% accuracy for whole-word recognition and 99.46% accuracy for single-character recognition.
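
The adaptive weighted fusion of multi-resolution features can be pictured with a small sketch. The abstract does not describe the internals of the fusion module, so everything below is an assumption made for illustration: the function names `softmax` and `fuse_branches`, the use of one scalar score per resolution branch, and the softmax normalization are hypothetical and not taken from the paper. The sketch also assumes the branch features have already been brought to a common length.

```python
import math


def softmax(scores):
    """Turn raw branch scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_features, branch_scores):
    """Weighted sum of per-branch feature vectors (hypothetical fusion rule).

    branch_features: equal-length feature vectors, one per input resolution.
    branch_scores: one raw score per branch. In a trained model these would
        be learned or predicted from the input; here they are plain numbers.
    """
    if len(branch_features) != len(branch_scores):
        raise ValueError("need exactly one score per branch")
    length = len(branch_features[0])
    if any(len(f) != length for f in branch_features):
        raise ValueError("branch features must share one length")
    weights = softmax(branch_scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, branch_features))
        for i in range(length)
    ]
    return fused, weights


# Toy example: three resolution branches, four feature channels each.
features = [
    [1.0, 0.0, 2.0, 1.0],  # fine-resolution branch (local stroke detail)
    [0.0, 1.0, 2.0, 3.0],  # medium-resolution branch
    [2.0, 2.0, 2.0, 2.0],  # coarse-resolution branch (whole-word context)
]
fused, weights = fuse_branches(features, [0.0, 0.0, 0.0])
print(weights, fused)
```

With equal scores the fusion reduces to a plain average of the branches. Raising one branch's score shifts weight toward that resolution, which is the sense in which such a weighting is adaptive.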

English keywords:

Manchu recognition; Swin Transformer; deep learning; multi-scale feature fusion

Authors:

谭振江、李明焱、王大东

Affiliation:

School of Mathematics and Computer Science, Jilin Normal University, Siping, Jilin 136000

Keywords:

Manchu script recognition; Swin Transformer; deep learning; multi-scale feature fusion

Year of publication:

2025

DOI: