A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy

Abstract: In deep-learning-based speech enhancement (SE) systems, trained models are often used to handle unseen noise types and language environments in real-life scenarios. However, since production environments differ from training conditions, mismatch problems arise that may cause a serious decrease in the performance of an SE system. In this study, a domain adaptive method combining two adaptation strategies is proposed to improve the generalization of unlabeled noisy speech. In the proposed encoder-decoder-based SE framework, a domain discriminator and a domain confusion adaptation layer are introduced to conduct adversarial training. The model has two main innovations. First, the algorithm optimizes adversarial training by introducing a relativistic discriminator that relies on relative values by applying the difference, thus avoiding possible bias and better reflecting domain differences. Second, the multi-kernel maximum mean discrepancy (MK-MMD) between domains is taken as the regularization term of the domain adversarial loss, thereby further decreasing the edge distribution distance between domains. The proposed model improves the adaptability to unseen noises by encouraging the feature encoder to generate domain-invariant features. The model was evaluated using cross-noise and cross-language-and-noise experiments, and the results show that the proposed method provides considerable improvements over the baseline without an adaptation in the perceptual evaluation of speech quality (PESQ), the short time objective intelligibility (STOI) and the frequency-weighted signal-to-noise ratio (FWSNR).
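
The MK-MMD term mentioned in the abstract can be illustrated with a generic estimator. The sketch below is not the authors' implementation: it assumes Gaussian kernels, equal kernel weights, a hypothetical set of bandwidths, and the standard biased estimate of squared MMD between two batches of feature vectors. The function names are chosen here for illustration only.

```python
import math


def gaussian_kernel(x, y, bandwidth):
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def multi_kernel(x, y, bandwidths):
    """Equal-weight average of Gaussian kernels (the weighting is an assumption)."""
    return sum(gaussian_kernel(x, y, b) for b in bandwidths) / len(bandwidths)


def mk_mmd_squared(source, target, bandwidths=(0.5, 1.0, 2.0, 4.0)):
    """Biased estimate of squared MMD between two lists of feature vectors.

    The default bandwidths are hypothetical values, not taken from the paper.
    """
    def mean_kernel(a_set, b_set):
        # Average kernel value over all pairs drawn from the two sets.
        total = sum(multi_kernel(a, b, bandwidths) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))
```

In a domain-adaptation setting of the kind the abstract describes, `source` and `target` would hold encoder features from the labeled and unlabeled domains, and a value like this would be added to the adversarial loss as a regularizer; a smaller value indicates that the two feature distributions are closer.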

Keywords:

Adaptation models; Training; Feature extraction; Acoustics; Noise measurement; Task analysis; Speech enhancement

Authors:

Jiaming Cheng, Ruiyu Liang, Zhenlin Liang, Li Zhao, Chengwei Huang, Björn Schuller

Affiliations:

School of Information Science and Engineering, Southeast University, Nanjing, P.R. China

Sugon (Nanjing) Institute of Chinese Academy of Sciences Co. Ltd., Nanjing, P.R. China

ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Augsburg, Germany

Publication year:

2021

DOI:

10.1109/TASLP.2020.3036611

IEEE/ACM Transactions on Audio, Speech, and Language Processing

ISSN: 2329-9290

Year, volume (issue): 2021, 29(1)

Citations: 4
References: 45