
Dialect Identification Model Based on Feature Selection

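To select the smallest number of relevant feature variables from speech samples while keeping a Random Forest (RF)-based Guizhou Chinese dialect identification model at the required accuracy, this study applies an RF-based Sort Difference Backward Elimination (SDBE) method, implemented in Python 3.6, to perform feature selection on Chinese dialect speech samples from three city-county groups in Guizhou Province. SDBE is compared with other advanced feature selection methods, and the RF classification model is then improved. The results show that SDBE selects the 8 most relevant Mel-frequency cepstral coefficients (MFCCs) out of 39 feature variables, significantly outperforming the compared feature selection methods. The improved RF model achieves a classification accuracy of 96.64%. Together, the feature selection algorithm and the improved RF model used in this study significantly improve the performance of the dialect identification model.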
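The abstract does not spell out how SDBE sorts and differences feature scores, so the following is only a generic sketch of random-forest-driven backward elimination, not the authors' SDBE. It assumes scikit-learn, impurity-based `feature_importances_` as the ranking, 5-fold cross-validated accuracy as the stopping criterion, and a tolerance `tol` on accuracy loss; the function name `rf_backward_elimination` and its parameters are hypothetical, and a small synthetic dataset stands in for the Guizhou speech features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rf_backward_elimination(X, y, min_features=1, tol=0.02, n_trees=50, seed=0):
    """Hypothetical helper: drop the least important feature each round
    while cross-validated accuracy stays within `tol` of the full set.

    Returns (kept column indices, 5-fold CV accuracy of the kept subset).
    """
    def cv_accuracy(cols):
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        return cross_val_score(rf, X[:, cols], y, cv=5).mean()

    selected = list(range(X.shape[1]))
    baseline = cv_accuracy(selected)  # accuracy with every feature kept
    score = baseline
    while len(selected) > min_features:
        # Rank the surviving features by impurity-based importance.
        rf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        rf.fit(X[:, selected], y)
        weakest = int(np.argmin(rf.feature_importances_))
        candidate = selected[:weakest] + selected[weakest + 1:]
        cand_score = cv_accuracy(candidate)
        # Stop once removal costs more than `tol` versus the full feature set.
        if cand_score < baseline - tol:
            break
        selected, score = candidate, cand_score
    return selected, score


# Synthetic stand-in data (assumption), not the paper's MFCC samples.
X, y = make_classification(n_samples=300, n_features=20, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)
features, acc = rf_backward_elimination(X, y)
print(f"kept {len(features)} of {X.shape[1]} features, CV accuracy {acc:.3f}")
```

Comparing against the full-feature baseline, rather than the previous round, keeps the total accuracy loss bounded by `tol` instead of letting small drops accumulate round after round.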

Keywords: Chinese dialect identification; Mel Frequency Cepstrum Coefficient; feature selection; Random Forest; backward elimination

Ai Hu, Li Fei


Department of Criminal Technology, Guizhou Police College, Guiyang 550005

School of Foreign Languages, Guizhou Normal University, Guiyang 550025


Innovation Group Project of the Guizhou Provincial Department of Education

Qianjiaohe KY Zi [2021]023

2024

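Information Technology
Heilongjiang Society of Information Technology; China Center for Information Industry Development; Electronic Information Center, Ministry of Information Industry of China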


CSTPCD
Impact factor: 0.413
ISSN:1009-2552
Year, volume (issue): 2024, (10)