A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy
In deep-learning-based speech enhancement (SE) systems, trained models are often deployed to handle unseen noise types and language environments in real-life scenarios. However, because these environments differ from the training conditions, mismatch problems arise that can seriously degrade the performance of an SE system. In this study, a domain-adaptive method combining two adaptation strategies is proposed to improve generalization to unlabeled noisy speech. In the proposed encoder-decoder-based SE framework, a domain discriminator and a domain confusion adaptation layer are introduced to conduct adversarial training. The model has two main innovations. First, the algorithm optimizes adversarial training by introducing a relativistic discriminator that operates on relative values, i.e., the difference between the discriminator outputs for the two domains, thus avoiding possible bias and better reflecting domain differences. Second, the multi-kernel maximum mean discrepancy (MK-MMD) between domains is used as a regularization term of the domain adversarial loss, further reducing the marginal distribution distance between domains. The proposed model improves adaptability to unseen noise by encouraging the feature encoder to generate domain-invariant features. The model was evaluated in cross-noise and cross-language-and-noise experiments, and the results show that the proposed method provides considerable improvements over the baseline without adaptation in terms of perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and frequency-weighted signal-to-noise ratio (FWSNR).
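To make the two ingredients concrete, the following is a minimal sketch (not the authors' implementation) of how a relativistic domain-discriminator loss could be combined with an MK-MMD regularizer computed on encoder features. The network sizes, RBF kernel bandwidths, and the regularization weight `lam` are illustrative assumptions.

```python
# Illustrative sketch only: relativistic domain loss + MK-MMD regularizer.
# All layer sizes, kernel bandwidths, and `lam` are assumed values.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mk_mmd(source, target, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Multi-kernel MMD with a bank of RBF kernels (biased estimate)."""
    x = torch.cat([source, target], dim=0)
    d2 = torch.cdist(x, x).pow(2)                     # pairwise squared distances
    k = sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas) / len(sigmas)
    n = source.size(0)
    k_ss, k_tt, k_st = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()


def relativistic_domain_loss(d_src, d_tgt):
    """Relativistic (average) discriminator loss: a source feature should
    score higher than the average target score, and vice versa."""
    loss_src = F.binary_cross_entropy_with_logits(
        d_src - d_tgt.mean(), torch.ones_like(d_src))
    loss_tgt = F.binary_cross_entropy_with_logits(
        d_tgt - d_src.mean(), torch.zeros_like(d_tgt))
    return 0.5 * (loss_src + loss_tgt)


# Toy usage: encoded features from labeled (source) and unlabeled (target) noisy speech.
encoder = nn.Sequential(nn.Linear(257, 128), nn.ReLU(), nn.Linear(128, 64))
discriminator = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))

f_src = encoder(torch.randn(16, 257))   # source-domain features
f_tgt = encoder(torch.randn(16, 257))   # target-domain features

adv_loss = relativistic_domain_loss(discriminator(f_src), discriminator(f_tgt))
lam = 0.1                               # assumed regularization weight
total_domain_loss = adv_loss + lam * mk_mmd(f_src, f_tgt)
```

In this sketch, the adversarial term pushes the discriminator to separate the domains only in a relative sense, while the MK-MMD term directly penalizes the distance between the two feature distributions; training the encoder against both terms is what encourages domain-invariant features.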