Handling data scarcity through data augmentation for detecting offensive speech

Abstract: Detecting offensive speech poses a challenge due to the absence of a universally accepted definition delineating its boundaries. In addition, the scarcity of labeled data often poses a significant challenge for training robust offensive speech detection models. In this paper, we propose an approach to handle data scarcity through data augmentation techniques tailored for offensive speech detection tasks. By augmenting the existing labeled data with speech samples generated through noise injection, our method effectively expands the training dataset, enabling more comprehensive model training. We evaluate our approach on the Vera Am Mittag (VAM) corpus and demonstrate significant improvements in offensive speech detection performance compared with training without data augmentation. Our findings highlight the efficacy of data augmentation in mitigating data scarcity challenges and enhancing the reliability of offensive speech detection systems in real-world scenarios.
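
The noise-injection augmentation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the noise type, the signal-to-noise ratios, or the number of copies generated per sample, so everything below is an assumption made for illustration: additive white Gaussian noise, SNR levels of 20, 10, and 5 dB, and the hypothetical helper names `inject_noise` and `augment`. It is not the authors' implementation.

```python
import math
import random


def inject_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with additive white Gaussian noise.

    The noise variance is chosen so that the expected signal-to-noise
    ratio equals `snr_db` decibels. `signal` must be non-empty.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def augment(dataset, snr_levels_db=(20, 10, 5), seed=0):
    """Keep every (signal, label) pair and add one noisy copy per SNR level.

    The SNR levels are assumed values, not taken from the paper.
    """
    augmented = list(dataset)
    for i, (signal, label) in enumerate(dataset):
        for j, snr_db in enumerate(snr_levels_db):
            # A distinct seed per copy keeps the noise realizations different
            # while making the whole augmentation reproducible.
            copy_seed = seed + i * len(snr_levels_db) + j
            augmented.append((inject_noise(signal, snr_db, seed=copy_seed), label))
    return augmented


# Toy stand-in for a speech clip: one second of a 440 Hz tone at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
data = augment([(tone, "offensive")])
```

Each noisy copy keeps the label of the clip it was derived from, which is what lets augmentation enlarge a small labeled set without new annotation. In practice the noise would be added to the waveform before feature extraction, so that the extracted features reflect the perturbed audio.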

Keywords:

Offensive speech; MFCC; SWT; Feature selection; Deep learning; Data augmentation

Authors:

Sara Sekkate, Safa Chebbi, Abdellah Adib, Sofia Ben Jebara

Affiliations:

Faculty of Science and Technology, LIM Lab., Hassan II University of Casablanca, 146, Mohammedia, Morocco

Higher School of Communications, COSIM Lab., University of Carthage, 2088 Tunis, Tunisia

Publication year:

2025

DOI:

10.1007/s12243-025-01072-6

Annals of Telecommunications

ISSN: 0003-4347

Year, volume (issue): 2025, 80(5/6)

Number of references: 38