Dialect identification model based on feature selection
To select the smallest number of relevant feature variables from the speech samples while keeping the Guizhou dialect identification model based on Random Forest (RF) at the required accuracy, the Sort Difference Backward Elimination (SDBE) algorithm based on Random Forests is implemented in Python 3.6 and applied to Chinese dialect speech samples from three city groups in Guizhou Province. Next, the SDBE algorithm is compared with other state-of-the-art feature selection algorithms. Finally, the Random Forest model itself is improved. The results show that the SDBE algorithm selects the eight most relevant MFCC features from the 39 feature variables and significantly outperforms the compared feature selection algorithms. The classification accuracy of the improved Random Forest model reaches 96.64%. Together, the SDBE algorithm and the improved Random Forest model significantly improve the performance of the dialect recognition model.
Keywords: Chinese dialect identification; Mel Frequency Cepstrum Coefficient; feature selection; Random Forest; backward elimination
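The abstract does not spell out the SDBE procedure, so the following is only a minimal sketch of generic importance-ranked backward elimination with a Random Forest, assuming scikit-learn is available. At each step it cross-validates the forest on the surviving features, then drops the feature with the lowest impurity-based importance; at the end it keeps the smallest subset whose accuracy is within a tolerance of the best accuracy seen along the path. The function name `rf_backward_elimination`, the `tolerance` stopping rule, and all hyperparameters are hypothetical choices for illustration; the "sort difference" rule of SDBE itself is not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rf_backward_elimination(X, y, tolerance=0.01, n_estimators=100, cv=5, random_state=0):
    """Generic RF backward elimination (hypothetical sketch, not the SDBE algorithm).

    Returns (feature_indices, mean_cv_accuracy) for the smallest subset whose
    cross-validated accuracy is within `tolerance` of the best seen.
    """
    remaining = list(range(X.shape[1]))
    path = []  # (feature subset, mean CV accuracy) recorded at every step
    while remaining:
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        # cross_val_score fits clones, so `model` itself stays unfitted here
        score = cross_val_score(model, X[:, remaining], y, cv=cv).mean()
        path.append((list(remaining), score))
        if len(remaining) == 1:
            break
        # Fit on all samples to rank the surviving features by importance
        model.fit(X[:, remaining], y)
        weakest = int(np.argmin(model.feature_importances_))
        remaining.pop(weakest)  # importances are aligned with `remaining`
    best = max(score for _, score in path)
    # `path` runs from most to fewest features, so the last acceptable entry is smallest
    acceptable = [(subset, score) for subset, score in path if score >= best - tolerance]
    return acceptable[-1]


if __name__ == "__main__":
    # Synthetic stand-in data; real input would be one feature vector per speech sample
    X, y = make_classification(n_samples=200, n_features=12, n_informative=4,
                               n_redundant=0, random_state=0)
    subset, acc = rf_backward_elimination(X, y, n_estimators=50)
    print(f"kept {len(subset)} of {X.shape[1]} features {subset}, CV accuracy {acc:.3f}")
```

On real data, `X` would hold the 39 feature variables per speech sample and `y` the city-group label; the synthetic data only demonstrates the mechanics. Ranking by impurity-based importance is one common choice; permutation importance is a slower but less biased alternative.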