L-Vector: Neural Label Embedding for Domain Adaptation

We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains. With NLE, we distill the knowledge from a powerful source-domain DNN into a dictionary of label embeddings, or l-vectors, one for each senone class. Each l-vector is a representation of the senone-specific output distributions of the source-domain DNN and is learned to minimize the average L2, Kullback-Leibler (KL), or symmetric KL distance to the output vectors with the same label, through simple averaging or standard back-propagation. During adaptation, the l-vectors serve as the soft targets to train the target-domain model with cross-entropy loss. Without the parallel-data constraint of teacher-student learning, NLE is especially suited to situations where paired target-domain data cannot be simulated from the source-domain data. We adapt a 6400-hour multi-conditional US English acoustic model to each of 9 accented English varieties (80 to 830 hours) and to kids' speech (80 hours). NLE achieves up to 14.1% relative word error rate reduction over direct re-training with one-hot labels.
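The averaging variant of the scheme can be illustrated with a short PyTorch sketch. This is only an illustrative sketch under assumptions, not the authors' implementation: source_model, target_model, the data loaders, and NUM_SENONES are hypothetical placeholders, and frame-level senone alignments are assumed to be available in both domains.

# Minimal sketch of NLE (simple-averaging variant): each l-vector is the
# mean of the source DNN's softmax outputs over frames of one senone, which
# minimizes the average L2 distance to those outputs. During adaptation the
# l-vectors replace one-hot labels as soft targets for cross-entropy training.
import torch
import torch.nn.functional as F

NUM_SENONES = 9000  # size of the senone inventory; an assumed placeholder


@torch.no_grad()
def estimate_l_vectors(source_model, source_loader, num_senones=NUM_SENONES):
    """Average source-domain posteriors per senone to build the l-vector dictionary."""
    sums = torch.zeros(num_senones, num_senones)
    counts = torch.zeros(num_senones)
    source_model.eval()
    for feats, senone_ids in source_loader:          # frames + senone alignments
        posteriors = F.softmax(source_model(feats), dim=-1)
        sums.index_add_(0, senone_ids, posteriors)   # accumulate per-senone sums
        counts.index_add_(0, senone_ids,
                          torch.ones_like(senone_ids, dtype=torch.float))
    return sums / counts.clamp(min=1.0).unsqueeze(1)  # (num_senones, num_senones)


def adapt_target_model(target_model, target_loader, l_vectors, epochs=1, lr=1e-4):
    """Train the target-domain model with l-vectors as soft targets (cross-entropy)."""
    optimizer = torch.optim.Adam(target_model.parameters(), lr=lr)
    target_model.train()
    for _ in range(epochs):
        for feats, senone_ids in target_loader:      # unpaired target-domain frames
            log_probs = F.log_softmax(target_model(feats), dim=-1)
            soft_targets = l_vectors[senone_ids]     # l-vector for each frame's senone
            loss = -(soft_targets * log_probs).sum(dim=-1).mean()  # soft cross-entropy
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return target_model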

deep neural network; label embedding; domain adaptation; teacher-student learning; speech recognition

Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee


Microsoft Corporation, Redmond, WA, USA

Georgia Institute of Technology, Atlanta, GA, USA

IEEE International Conference on Acoustics, Speech and Signal Processing

Barcelona, Spain

2020 IEEE International Conference on Acoustics, Speech and Signal Processing

pp. 7389-7393

2020