Research on improved acoustic feature extraction for language identification

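Acoustic feature matrices used in language identification research often have very high dimensionality. To address this problem, this paper proposes an improved acoustic feature extraction procedure. The statistical characteristics of several commonly used acoustic features are analyzed and, together with their extraction pipelines and supporting arguments from the literature, motivate the method: the mean of each feature dimension is computed across frames, and the resulting vector is normalized to remove the effect of differing scales, reducing the traditional feature matrix to a one-dimensional feature vector. Finally, based on the properties of the improved features, language identification experiments are carried out on two different datasets with a BP neural network and a support vector machine as baseline systems. The results show that, for five commonly used acoustic features, the proposed method reduces the data volume by 99.8% relative to the traditional approach while still achieving an average identification rate of 95.6% on Dataset 1 and 90.2% on Dataset 2 across the two models. In addition, because the method removes most of the computation, the algorithm is better suited to embedded environments with relatively weak hardware, which broadens its range of application.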
Optimization of acoustic feature extraction for language identification
The dimensionality of the acoustic feature matrix used in language identification studies is often very high. To address the issue of excessive dimensions in acoustic features for language identification, an improved method for acoustic feature extraction is proposed. By analyzing the statistical characteristics of some commonly used acoustic features and combining this analysis with their extraction process and supporting arguments from the literature, the improved features are obtained by calculating the mean value of each feature dimension over the frames and then normalizing the resulting vector to eliminate the influence of differing scales. This reduces the traditional feature matrix to a one-dimensional feature vector. Finally, based on the characteristics of the improved features, language identification experiments are conducted on two distinct datasets using a BP neural network and a Support Vector Machine as the baseline systems. The experimental results show that, for the five commonly used acoustic features, the proposed method still achieves an average identification rate of 95.6% on Dataset 1 and 90.2% on Dataset 2 under the two models, even with a 99.8% reduction in data volume compared to the traditional approach. In addition, the significant reduction in computational workload achieved by the proposed method makes the algorithm better suited to embedded environments with relatively weak hardware, thereby expanding its applicability.
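The pooling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `pooled_feature`, the choice of L2 (unit-length) normalization, and the 500-frame, 13-dimension input are all assumptions made here, since the abstract does not specify the normalization formula or the frame count.

```python
import numpy as np


def pooled_feature(frames: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Collapse a (num_frames, num_dims) feature matrix into one vector.

    Hypothetical helper: averages each dimension over frames, then scales
    the result to unit length (L2 normalization is an assumption).
    """
    frames = np.asarray(frames, dtype=np.float64)
    if frames.ndim != 2 or frames.shape[0] == 0:
        raise ValueError("expected a non-empty (num_frames, num_dims) matrix")
    mean_vec = frames.mean(axis=0)       # mean of each dimension over frames
    norm = np.linalg.norm(mean_vec)      # Euclidean length of the mean vector
    return mean_vec / max(norm, eps)     # eps guards against division by zero


# Illustrative input: 500 frames of a 13-dimensional feature (made-up sizes).
rng = np.random.default_rng(0)
mfcc_like = rng.normal(size=(500, 13))

vec = pooled_feature(mfcc_like)

# Fraction of stored values removed by replacing the matrix with one vector.
reduction = 1.0 - vec.size / mfcc_like.size
```

With T frames the stored values shrink by a factor of T, so the reduction is 1 - 1/T. For the illustrative 500 frames used here that comes to 99.8%, which happens to match the figure in the abstract, although the paper's actual frame counts are not given on this page.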

Language identification; Acoustic features; Statistical features; Feature extraction

周大春、邵玉斌、张昊阁、杜庆治

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650504

Language identification; acoustic features; statistical characteristics; feature extraction

Project of the Yunnan Provincial Key Laboratory of Media Convergence

320225403

2024

Journal of Sichuan University (Natural Science Edition)
Sichuan University

CSTPCD; Peking University Core Journals
Impact factor: 0.358
ISSN: 0490-6756
Year, volume (issue): 2024, 61(3)