Polyphone Disambiguation Based on Pre-trained Model

Grapheme-to-phoneme (G2P) conversion is an important component of Chinese text-to-speech (TTS) systems. Its key problem is polyphone disambiguation, i.e., selecting the correct pronunciation for a polyphonic character from several candidates. Existing methods usually fail to fully grasp the semantics of the word containing the polyphonic character, and polyphone datasets suffer from an imbalanced distribution. To address these problems, this paper proposes a polyphone disambiguation method based on the pre-trained model RoBERTa, called cross-lingual translation RoBERTa (CLTRoBERTa). First, a cross-lingual translation module produces a translation, in another language, of the word containing the polyphonic character, and feeds it to the model as an additional input feature to improve semantic understanding of the word. Second, the hierarchical learning-rate strategy from discriminative fine-tuning is employed to match the learning characteristics of different layers of the neural network. Finally, a sample-weight module is incorporated to address the imbalanced distribution of polyphone datasets. Experimental results show that CLTRoBERTa mitigates the performance differences caused by the uneven dataset distribution and achieves 99.08% accuracy on the Chinese Polyphone with Pinyin (CPP) benchmark dataset, outperforming other baseline models.
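To make the first step concrete, the sketch below shows one plausible way to feed the cross-lingual translation to the model: encode the sentence and the translated word as a RoBERTa-style sentence pair, so the translation serves as an extra semantic feature. This is a minimal Python illustration assuming the Hugging Face transformers library; the checkpoint name, the example, and the pairing scheme are assumptions, not the paper's published implementation.

    # Minimal sketch: pack the sentence and the translated word into one
    # RoBERTa-style sequence pair. Checkpoint and pairing scheme are assumed.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")  # assumed checkpoint

    def build_input(sentence: str, translation: str):
        # [CLS] sentence [SEP] translation [SEP]: the translation of the word
        # containing the polyphone acts as an additional semantic feature.
        return tokenizer(sentence, translation, return_tensors="pt")

    # e.g. for a sentence containing the polyphonic word "行长" translated
    # as "bank president":
    # inputs = build_input("银行行长发言", "bank president")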
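The second step, hierarchical learning rates from discriminative fine-tuning, gives each encoder layer its own rate, typically decaying from the top (task-specific) layers down to the bottom (general) ones. A minimal PyTorch sketch, assuming a RobertaForSequenceClassification-style model with .roberta.encoder.layer and .classifier attributes; the base rate and decay factor are chosen for illustration only:

    import torch

    def layerwise_param_groups(model, base_lr=2e-5, decay=0.95):
        # Top encoder layers keep the full base rate; each layer below is
        # scaled down by `decay`, and the embeddings get the smallest rate.
        layers = model.roberta.encoder.layer
        n = len(layers)
        groups = [{"params": model.roberta.embeddings.parameters(),
                   "lr": base_lr * decay ** n}]
        for i, layer in enumerate(layers):
            groups.append({"params": layer.parameters(),
                           "lr": base_lr * decay ** (n - 1 - i)})
        groups.append({"params": model.classifier.parameters(), "lr": base_lr})
        return groups

    # optimizer = torch.optim.AdamW(layerwise_param_groups(model))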
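For the sample-weight module, one standard realization of the idea (an assumption here; the abstract does not give the paper's exact formula) is to weight each pronunciation class inversely to its frequency and pass the weights to the cross-entropy loss:

    import torch
    from collections import Counter

    def class_weights(labels, num_classes):
        # Rare readings get larger weights so they contribute to the loss
        # as much as the dominant pronunciation of a polyphonic character.
        counts = Counter(labels)
        total = len(labels)
        return torch.tensor(
            [total / (num_classes * counts.get(c, 1)) for c in range(num_classes)],
            dtype=torch.float)

    # loss_fn = torch.nn.CrossEntropyLoss(weight=class_weights(train_labels, num_classes))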

Keywords: Polyphone disambiguation; Pre-trained model; Grapheme-to-phoneme conversion; Cross-lingual translation; Hierarchical learning rate; Sample weight

GAO Beibei (高贝贝), ZHANG Yangsen (张仰森)


Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100192, China


Funding: National Natural Science Foundation of China (Grant No. 62176023)

Journal: Computer Science (计算机科学)
Publisher: Chongqing Southwest Information Co., Ltd. (formerly the Southwest Information Center of the Ministry of Science and Technology)

Indexed in: CSTPCD; Peking University Core Journal List (北大核心)
Impact factor: 0.944
ISSN: 1002-137X
Year, Volume (Issue): 2024, 51(11)