Language identification based on joint decision of nonlinear spectrograms

Duan Yun (段云)¹, Shao Yubin (邵玉斌)¹, Long Hua (龙华)¹, Du Qingzhi (杜庆治)¹

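Author information

1. Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504, Yunnan, China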

Abstract

The grayscale logarithmic spectrogram stretches the fundamental-frequency region excessively, which limits improvement of the recognition rate for short-duration speech. To address this problem, a language identification method based on joint decision of nonlinear spectrograms is proposed. First, the speech is energy-normalized and its logarithmic power spectrum is extracted; the frequency scale is then mapped nonlinearly according to human auditory perception to obtain the nonlinear spectrogram. Next, the nonlinear spectrogram is split at equal intervals according to word-association characteristics, and a joint decision layer is added at the back end of the ResNet network to output the language of the speech. Experimental results show that the proposed method effectively mitigates the shortcomings of the grayscale logarithmic spectrogram, and that its recognition performance is higher than that of the spectrogram and its improved features. Joint decision achieves the best recognition results for sample speech with a segment duration of 1.0 s: the recognition rate reaches 94.25% on the broadcast audio dataset and 98.94% on the VoxForge public corpus.
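The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the nonlinear mapping or the form of the joint decision layer, so the mel scale is assumed here as one perceptual frequency mapping, and averaging per-segment posteriors is assumed as one possible joint decision rule. The ResNet classifier is omitted, and all function names and parameter values (`n_fft`, `hop`, `n_bins`) are hypothetical.

```python
import numpy as np

def energy_normalize(x):
    """Scale the waveform to unit average power (one possible energy normalization)."""
    return x / np.sqrt(np.mean(x ** 2) + 1e-12)

def log_power_spectrogram(x, n_fft=512, hop=160):
    """Hann-windowed STFT, returned as log power with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10).T

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def warp_frequency_axis(spec, sr, n_bins=128):
    """Resample the linear frequency axis at points equally spaced on the mel scale.

    Assumption: the mel scale stands in for the paper's perceptual mapping.
    """
    linear_hz = np.linspace(0.0, sr / 2.0, spec.shape[0])
    target_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_bins))
    columns = [np.interp(target_hz, linear_hz, spec[:, t]) for t in range(spec.shape[1])]
    return np.stack(columns, axis=1)

def split_equal(spec, frames_per_segment):
    """Cut the spectrogram into equal-length segments along time, dropping the remainder."""
    n_segments = spec.shape[1] // frames_per_segment
    return [spec[:, i * frames_per_segment:(i + 1) * frames_per_segment]
            for i in range(n_segments)]

def joint_decision(segment_posteriors):
    """Average per-segment class posteriors and return (argmax label, mean posterior).

    Assumption: simple averaging stands in for the paper's joint decision layer.
    """
    mean_posterior = np.mean(np.asarray(segment_posteriors), axis=0)
    return int(np.argmax(mean_posterior)), mean_posterior

# Demo on synthetic noise: 4 s at 16 kHz, 1.0 s segments (100 frames at hop=160).
rng = np.random.default_rng(0)
signal = energy_normalize(rng.standard_normal(16000 * 4))
nonlinear_spec = warp_frequency_axis(log_power_spectrogram(signal), sr=16000)
segments = split_equal(nonlinear_spec, frames_per_segment=100)

# Stand-in posteriors; in practice each segment would be scored by the classifier.
dummy_posteriors = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.7, 0.2, 0.1]]
label, mean_posterior = joint_decision(dummy_posteriors)
```

In this sketch a single segment may favor a different class than its neighbors, and pooling the per-segment scores before taking the argmax is what lets the utterance-level decision rest on all segments together.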

Key words

language identification/spectrogram/nonlinearity/joint judgment/neural networks

Funding

National Natural Science Foundation of China (61761025)

Publication year

2024

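Microelectronics & Computer (微电子学与计算机)

771st Research Institute, Ninth Academy of China Aerospace Science and Technology Corporation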

CSTPCD

Impact factor: 0.431

ISSN: 1000-7180

Number of references: 20
