End-to-End Speech Recognition in News Field Based on Conformer
Open-source Chinese speech recognition datasets are usually developed for the general domain. This paper constructs a news-oriented Chinese speech recognition dataset named CH_NEWS_ASR and verifies its validity with RNN, Transformer, and Conformer models under the ESPNET-0.9.6 framework. Because news broadcasters speak relatively fast, the average text length in this dataset is 28 characters, twice the average text length of the Aishell_1 dataset. We also propose a sentence-level consistency module, combined with the Conformer model, that directly reduces the representation differences between the source speech and the target text. Experiments show that on the Aishell_1 dataset the character error rate (CER) is reduced by 0.4% and the sentence error rate (SER) by 2%; on the CH_NEWS_ASR dataset, the CER is reduced by 0.9% and the SER by 3%.
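The two metrics reported above can be illustrated with a minimal sketch using their standard textbook definitions: CER is the total character-level edit distance (substitutions, deletions, insertions) divided by the total number of reference characters, and SER is the fraction of utterances whose hypothesis differs from the reference at all. This is not the paper's scoring script; ESPnet scoring tools may differ in details such as text normalization, and the function names here are hypothetical. The sketch assumes transcripts are already normalized (no punctuation or spaces).

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between reference and hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,            # deletion
                         cur[j - 1] + 1,         # insertion
                         prev[j - 1] + (r != h)) # substitution or match
        prev = cur
    return prev[-1]


def cer_ser(refs: list[str], hyps: list[str]) -> tuple[float, float]:
    """Return (CER, SER) over a corpus of paired reference/hypothesis strings."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    # A sentence counts as wrong if it contains any error.
    wrong_sents = sum(1 for r, h in zip(refs, hyps) if r != h)
    return total_edits / total_chars, wrong_sents / len(refs)


refs = ["今天天气很好", "新闻联播"]
hyps = ["今天天汽很好", "新闻联播"]
print(cer_ser(refs, hyps))  # (0.1, 0.5)
```

The example also shows why SER moves more than CER: one wrong character out of ten is a 10% CER, yet it makes half of the two sentences incorrect. Longer utterances, such as the 28-character news sentences, give more chances for at least one error per sentence.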