Handling data scarcity through data augmentation for detecting offensive speech

Detecting offensive speech is challenging because there is no universally accepted definition delineating its boundaries. Moreover, the scarcity of labeled data often makes it difficult to train robust offensive speech detection models. In this paper, we propose an approach that handles data scarcity through data augmentation techniques tailored to offensive speech detection. By augmenting the existing labeled data with speech samples generated through noise injection, our method expands the training dataset and enables more comprehensive model training. We evaluate our approach on the Vera am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity and enhancing the reliability of offensive speech detection systems in real-world scenarios.
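The abstract does not state which noise type or signal-to-noise ratios (SNRs) were used. As a minimal sketch, assuming additive white Gaussian noise injected at a chosen SNR, noise-injection augmentation of a labeled set of waveforms could look like this. The helper names `add_white_noise` and `augment_dataset`, the SNR values, and the toy sine-wave data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple


def add_white_noise(signal: Sequence[float], snr_db: float,
                    rng: random.Random) -> List[float]:
    """Return a copy of `signal` with additive white Gaussian noise.

    Assumption: noise is white and Gaussian; its power is set so that the
    expected SNR equals `snr_db` decibels.
    """
    if not signal:
        return []
    # Average signal power: mean of the squared samples.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


def augment_dataset(dataset: Sequence[Tuple[Sequence[float], str]],
                    snrs_db: Sequence[float],
                    seed: int = 0) -> List[Tuple[Sequence[float], str]]:
    """Keep every original example and add one noisy copy per SNR level.

    Labels are copied unchanged: the (assumed) premise is that moderate
    background noise does not change whether an utterance is offensive.
    """
    rng = random.Random(seed)  # seeded for reproducible augmentation
    augmented: List[Tuple[Sequence[float], str]] = list(dataset)
    for signal, label in dataset:
        for snr in snrs_db:
            augmented.append((add_white_noise(signal, snr, rng), label))
    return augmented


if __name__ == "__main__":
    # Toy data: one second of a 440 Hz tone at 16 kHz stands in for speech.
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
    data = [(tone, "offensive"), ([0.5 * x for x in tone], "non-offensive")]
    aug = augment_dataset(data, snrs_db=[20.0, 10.0, 5.0], seed=42)
    print(len(data), "->", len(aug))  # 2 originals + 2 * 3 noisy copies
```

With three SNR levels, each labeled utterance yields three extra training examples. In practice such augmentation is usually applied only to the training split, so that noisy copies of a test utterance do not leak into training.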

Keywords: Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation

Sara Sekkate, Safa Chebbi, Abdellah Adib, Sofia Ben Jebara


Faculty of Science and Technology, LIM Lab., Hassan II University of Casablanca, 146, Mohammedia, Morocco

Higher School of Communications, COSIM Lab., University of Carthage, 2088 Tunis, Tunisia

2025

Annals of Telecommunications


ISSN:0003-4347
Year, Volume (Issue): 2025, 80(5/6)