Face mask recognition from audio: The MASC database and an overview on the mask challenge
The sudden outbreak of COVID-19 has posed tough challenges for the field of biometrics due to its spread via physical contact and the regulations on wearing face masks. Given these constraints, voice biometrics can offer a suitable contactless biometric solution; such systems can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of 71.8% Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers, which mainly followed two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adopting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of 80.1%. Moreover, we present the results of fusing the approaches, leading to a UAR of 82.6%. Finally, we present a smartphone app that can be used as a proof-of-concept demonstration to detect in real time whether users are wearing a face mask; we also benchmark the run-time of the best models. (c) 2021 Elsevier Ltd. All rights reserved.
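As a small illustration of the evaluation metric quoted above, the sketch below computes Unweighted Average Recall (UAR), i.e. the mean of the per-class recalls, for the two-class mask / no-mask task. The labels are hypothetical and the use of scikit-learn is an assumption for illustration only; the challenge itself does not prescribe a particular toolkit.

```python
# Minimal sketch: UAR as the unweighted mean of per-class recall.
# Unlike accuracy, a majority-class predictor on an imbalanced test set
# scores only about 50% UAR on this two-class task.
from sklearn.metrics import recall_score

# Hypothetical ground-truth labels and model predictions.
y_true = ["mask", "mask", "no-mask", "no-mask", "no-mask"]
y_pred = ["mask", "no-mask", "no-mask", "no-mask", "mask"]

# 'macro' averaging weights every class equally, which is exactly UAR.
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.3f}")  # (0.5 + 0.667) / 2 = 0.583 for this toy example
```

Reported challenge scores such as the 71.8% baseline, the winning 80.1%, and the 82.6% fusion result correspond to this macro-averaged recall computed on the official test partition.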