The Swin Transformer-based Manchu character recognition model with multi-scale feature fusion

To address two inherent challenges in Manchu character recognition, non-standard morphological variants and multiple graphemic representations of the same phoneme, this paper proposes MR-SwinT (Multi-scale feature fusion based Swin Transformer), a multi-scale feature fusion model built on the Swin Transformer architecture. A multi-resolution parallel input mechanism lets the model capture fine-grained local character features and macro-level contextual information together. The model's core advantage is that it makes full use of the Swin Transformer's hierarchical, window-based self-attention mechanism, which provides strong representational capacity for large-scale feature modeling. In addition, the SMT Blocks module designed in this study dynamically fuses multi-resolution features through an adaptive weighting adjustment strategy, which significantly improves the model's ability to discriminate complex characters and its generalization performance. Experimental results show that MR-SwinT achieves 96.59% accuracy for whole-word recognition and 99.46% accuracy for single-character recognition.
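The abstract does not say what form the adaptive weighting takes, so the following is only a minimal sketch of one common way to fuse features from several input resolutions: a softmax turns one raw score per resolution branch into normalized weights, and the fused feature is the weighted sum. The function names, the softmax choice, and the toy vectors are all assumptions made for illustration; none of this is the SMT Blocks implementation from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_multiresolution(features, scores):
    """Weighted sum of equal-length feature vectors, one per resolution.

    Hypothetical helper: `features` is a list of float lists (one per
    resolution branch) and `scores` holds one raw score per branch. In a
    trained model these scores would be learned; here they are given.
    """
    if len(features) != len(scores):
        raise ValueError("need exactly one score per resolution branch")
    width = len(features[0])
    if any(len(f) != width for f in features):
        raise ValueError("all branches must share one feature width")
    weights = softmax(scores)
    fused = [
        sum(w * f[i] for w, f in zip(weights, features))
        for i in range(width)
    ]
    return fused, weights


# Toy feature vectors standing in for a low- and a high-resolution branch.
low_res = [1.0, 0.0, 2.0]
high_res = [3.0, 4.0, 0.0]

# Equal scores give equal weights, so the fusion is a plain average.
fused, weights = fuse_multiresolution([low_res, high_res], scores=[0.0, 0.0])
```

Raising one branch's score shifts weight toward that resolution, which is the sense in which such a fusion is "adaptive"; how the paper computes or learns its weights is not stated in the abstract.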

Manchu recognition; Swin Transformer; deep learning; multi-scale feature fusion

Tan Zhenjiang (谭振江), Li Mingyan (李明焱), Wang Dadong (王大东)

School of Mathematics and Computer Science, Jilin Normal University, Siping, Jilin 136000, China

2025

Journal of Jilin Normal University (Natural Science Edition)
Jilin Normal University

Impact factor: 0.397
ISSN: 1674-3873
Year, volume (issue): 2025, 46(1)