Zhongnan Hospital of Wuhan University Reports Findings in Voice Disorders (Voice Disorder Classification Using Wav2vec 2.0 Feature Extraction)


Abstract: New research on Laryngeal Diseases and Conditions - Voice Disorders is the subject of a report. According to news reporting originating from Wuhan, People's Republic of China, by NewsRx correspondents, research stated, "The study aims to classify normal and pathological voices by leveraging the wav2vec 2.0 model as a feature extraction method in conjunction with machine learning classifiers. Voice recordings were sourced from the publicly accessible VOICED database."

Our news editors obtained a quote from the research from the Zhongnan Hospital of Wuhan University, "The data underwent preprocessing, including normalization and data augmentation, before being input into the wav2vec 2.0 model for feature extraction. The extracted features were then used to train four machine learning models - Support Vector Machine (SVM), K-Nearest Neighbors, Decision Tree (DT), and Random Forest (RF) - which were evaluated using Stratified K-Fold cross-validation. Performance metrics such as accuracy, precision, recall, F1-score, macro average, micro average, receiver-operating characteristic (ROC) curve, and confusion matrix were utilized to assess model performance. The RF model achieved the highest accuracy (0.98 ± 0.02), alongside strong recall (0.97 ± 0.04), F1-score (0.95 ± 0.05), and consistently high area under the curve (AUC) values approaching 1.00, indicating superior classification performance. The DT model also demonstrated excellent performance, particularly in precision (0.97 ± 0.02) and F1-score (0.96 ± 0.02), with AUC values ranging from 0.86 to 1.00. Macro-averaged and micro-averaged analyses showed that the DT model provided the most balanced and consistent performance across all classes, while RF model exhibited robust performance across multiple metrics. Additionally, data augmentation significantly enhanced the performance of all models, with marked improvements in accuracy, recall, F1-score, and AUC values, especially notable in the RF and DT models. ROC curve analysis further confirms the consistency and reliability of the RF and SVM models across different folds, while confusion matrix analysis revealed that RF and SVM models had the fewest misclassifications in distinguishing 'Normal' and 'Pathological' samples. Consequently, RF and DT models emerged as the most robust performers, making them particularly well-suited for the voice classification task in this study."
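
The evaluation protocol named in the abstract (Stratified K-Fold cross-validation with results reported as mean ± standard deviation across folds) can be illustrated with a minimal sketch. This is not the authors' code. The feature vectors are synthetic stand-ins for wav2vec 2.0 embeddings, the class sizes, the dimensionality, and the choice of 5 folds and 3 neighbours are assumptions made only for illustration, and the helper names (`make_synthetic`, `stratified_kfold`, `knn_predict`, `cross_validate`) are hypothetical. A small K-Nearest Neighbors classifier is used because it is one of the four models mentioned and can be written with the Python standard library alone.

```python
import random
import statistics
from collections import defaultdict


def make_synthetic(n_normal=20, n_pathological=40, dim=8, seed=0):
    """Synthetic stand-in for extracted features: 0 = Normal, 1 = Pathological."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n, centre in ((0, n_normal, 0.0), (1, n_pathological, 1.5)):
        for _ in range(n):
            X.append([rng.gauss(centre, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold receives a fair share of it.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the nearest training vectors (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(tx, x)), ty)
        for tx, ty in zip(train_X, train_y)
    )
    votes = [ty for _, ty in dists[:n_neighbors]]
    return max(set(votes), key=votes.count)


def cross_validate(X, y, k=5):
    """Return the per-fold accuracy of the KNN classifier."""
    accuracies = []
    for train, test in stratified_kfold(y, k):
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies


X, y = make_synthetic()
fold_acc = cross_validate(X, y, k=5)
# Report in the same "mean ± std" form used for the metrics in the abstract.
print(f"accuracy: {statistics.mean(fold_acc):.2f} ± {statistics.stdev(fold_acc):.2f}")
```

Stratification matters when the classes are imbalanced: each test fold keeps the overall ratio of Normal to Pathological samples, so per-fold scores are comparable and their spread is a meaningful estimate of variability.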

Keywords:

Wuhan; People's Republic of China; Asia; Cyborgs; Emerging Technologies; Health and Medicine; Laryngeal Diseases and Conditions; Machine Learning; Neurologic Manifestations; Otorhinolaryngologic Diseases and Conditions; Respiratory Tract Diseases and Conditions; Support Vector Machines; Voice Disorders

Publication year:

2024

Robotics & Machine Learning Daily News

ISSN:

Year, volume (issue): 2024 (Oct. 7)