Exploring data augmentation for Amazigh speech recognition with convolutional neural networks

Abstract

In the field of speech recognition, enhancing accuracy is paramount for diverse linguistic communities. Our study addresses this necessity, focusing on improving Amazigh speech recognition through the implementation of three distinct data augmentation methods: Audio Augmentation, FilterBank Augmentation, and SpecAugment. Leveraging Convolutional Neural Networks (CNNs) for speech recognition, we utilize Mel Spectrograms extracted from audio files. The study specifically targets the recognition of the first ten Amazigh digits. We conducted experiments with a speaker-independent approach involving 42 participants. A total of 27 experiments were conducted, utilizing both original and augmented data. Among the different CNN models employed, the VGG19 model showed significant promise. Our results demonstrate a maximum accuracy of 95.66%. Furthermore, the largest improvement achieved through data augmentation was 4.67%. These findings signify a substantial enhancement in speech recognition accuracy, indicating the efficacy of the proposed methods.
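The abstract names SpecAugment as one of the three augmentation methods, applied to Mel spectrograms. As a minimal sketch of the general idea (not the authors' implementation; all parameter names and mask widths here are illustrative assumptions), SpecAugment-style masking zeroes out random frequency bands and time spans of a spectrogram:

```python
import numpy as np

def spec_augment(mel_spec, num_freq_masks=1, num_time_masks=1,
                 max_freq_width=8, max_time_width=10, rng=None):
    """Apply SpecAugment-style masking to a mel spectrogram of
    shape (n_mels, n_frames). Returns a masked copy; the input
    array is left unchanged."""
    rng = np.random.default_rng(rng)
    spec = mel_spec.copy()
    n_mels, n_frames = spec.shape

    # Frequency masking: zero a random horizontal band of mel bins.
    for _ in range(num_freq_masks):
        width = rng.integers(1, max_freq_width + 1)
        f0 = rng.integers(0, max(1, n_mels - width))
        spec[f0:f0 + width, :] = 0.0

    # Time masking: zero a random vertical span of frames.
    for _ in range(num_time_masks):
        width = rng.integers(1, max_time_width + 1)
        t0 = rng.integers(0, max(1, n_frames - width))
        spec[:, t0:t0 + width] = 0.0

    return spec

# Example: mask a dummy 40-mel, 100-frame spectrogram.
spec = np.ones((40, 100))
augmented = spec_augment(spec, rng=42)
```

Training on such masked copies alongside the originals exposes the CNN to partially occluded inputs, which is what allows augmentation to lift accuracy on held-out speakers.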

Hossam Boulal, Farida Bouroumane, Mohamed Hamidi, Jamal Barkani, Mustapha Abarkan

FP Taza, USMBA University

FPN, UMP

2025

International Journal of Speech Technology

ISSN:1381-2416
Year, Volume (Issue): 2025, 28(1)