Dialect Identification Model Based on Feature Selection


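Abstract (Chinese, translated): To select the smallest set of relevant feature variables from speech samples while enabling a random forest (RF)-based Guizhou Chinese dialect identification model to reach the required accuracy, this study used Python 3.6 and applied a random forest-based sort difference backward elimination (SDBE) method to perform feature selection on Chinese dialect speech samples from three city-and-county groups in Guizhou. SDBE was compared with other advanced feature selection methods, and the random forest classification model was then improved. The results show that the method selected the 8 most relevant Mel-frequency cepstral coefficients (MFCCs) out of 39 feature variables, significantly outperforming the compared feature selection methods. The improved random forest model achieved a classification accuracy of 96.64%. The feature selection algorithm and the improved random forest model significantly improved the performance of the dialect identification model.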

English title: Dialect identification model based on feature selection

English abstract: To select the smallest number of relevant feature variables from speech samples while enabling a Random Forest (RF)-based Guizhou dialect identification model to achieve the required accuracy, this study used Python 3.6 and applied the Random Forest-based Sort Difference Backward Elimination (SDBE) algorithm to select the most relevant feature variables from Chinese dialect speech samples collected from three city groups in Guizhou Province. SDBE was then compared with other state-of-the-art feature selection algorithms, and finally the Random Forest classifier was improved. The results show that SDBE selected the eight most relevant MFCCs out of 39 feature variables, significantly outperforming the compared feature selection algorithms. The improved Random Forest model reached a classification accuracy of 96.64%. The SDBE algorithm and the improved Random Forest model significantly improved the performance of the dialect recognition model.
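The abstract names SDBE but does not spell out its "sort difference" criterion, so the loop below is only a rough illustration of the general family it belongs to: a generic importance-ranked backward elimination, not the authors' SDBE. At each step it re-ranks the remaining features, drops the weakest one, and finally keeps the smallest subset whose score stays within a tolerance of the best score observed. In a random forest setting, `importance` would typically return the forest's feature importances (for example scikit-learn's `RandomForestClassifier.feature_importances_`) and `evaluate` a cross-validated or out-of-bag accuracy. The function name `backward_eliminate`, both callables, and the `tol` stopping rule are hypothetical choices made for this sketch.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callables (not from the paper):
#   evaluate(subset)   -> accuracy of a classifier trained on that feature subset
#   importance(subset) -> {feature: importance score} from a model trained on that subset
Evaluate = Callable[[Sequence[str]], float]
Importance = Callable[[Sequence[str]], Dict[str, float]]


def backward_eliminate(
    features: Sequence[str],
    evaluate: Evaluate,
    importance: Importance,
    min_features: int = 1,
    tol: float = 0.0,
) -> Tuple[List[str], List[Tuple[List[str], float]]]:
    """Generic importance-ranked backward elimination (illustrative, not SDBE).

    Removes the least important feature one at a time, re-ranking after each
    removal. Returns the smallest subset whose score is within `tol` of the
    best score seen, together with the full elimination history.
    """
    if not features:
        raise ValueError("need at least one feature")
    current = list(features)
    history = [(list(current), evaluate(current))]
    while len(current) > min_features:
        scores = importance(current)
        # Re-rank on the current subset and drop the lowest-scoring feature.
        weakest = min(current, key=lambda f: scores[f])
        current = [f for f in current if f != weakest]
        history.append((list(current), evaluate(current)))
    best = max(score for _, score in history)
    # Among subsets that are "good enough", prefer the one with the fewest features.
    chosen = min(
        (subset for subset, score in history if score >= best - tol),
        key=len,
    )
    return chosen, history
```

For example, with six features named `mfcc_0` to `mfcc_5` and a toy scorer that only rewards `mfcc_1` and `mfcc_3`, the loop returns `['mfcc_1', 'mfcc_3']`: accuracy stays flat while the irrelevant features are removed and drops once a relevant one goes. Re-ranking after every removal (rather than ranking once up front) matters with random forests, because importances shift when correlated features such as neighboring MFCCs are removed.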

English keywords:

Chinese dialect identification; Mel Frequency Cepstrum Coefficient; feature selection; Random Forest; backward elimination

Authors:

艾虎 (Ai Hu), 李菲 (Li Fei)

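Affiliations:

Department of Criminal Technology, Guizhou Police College, Guiyang 550005

School of Foreign Languages, Guizhou Normal University, Guiyang 550025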

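Keywords:

Chinese dialect identification; Mel-frequency cepstral coefficients; feature selection; random forest; backward elimination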

Funding:

Innovation Group Project of the Education Department of Guizhou Province

Project number:

黔教合KY字[2021]023

Publication year:

2024

DOI:

10.13274/j.cnki.hdzj.2024.10.016

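Journal:

Information Technology (信息技术)

Sponsors: Heilongjiang Information Technology Society; China Center for Information Industry Development; Electronic Information Center, Ministry of Information Industry of China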

Indexed in: CSTPCD

Impact factor: 0.413

ISSN: 1009-2552

Year, volume (issue): 2024 (10)