An In-Vehicle Interaction Speech Enhancement and Recognition Method Based on Lightweight Models in Complex Environments
In-vehicle voice interaction suffers from low recognition rates in complex noise environments, and recognition models are hard to deploy on devices with limited computing resources. To address both problems, this article proposes a lightweight and robust speech recognition method for noisy environments based on a joint training framework. The speech enhancement model introduces a multi-scale channel time-frequency attention module that extracts multi-scale time-frequency features and key information along multiple dimensions. In the speech recognition model, a multi-head element-wise linear attention is proposed, which significantly reduces the computational complexity of the attention module. Experiments show that the jointly trained model exhibits good noise robustness on a self-built dataset.
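The abstract does not say how multi-head element-wise linear attention is formulated. To illustrate why linear attention reduces cost, the sketch below uses the generic kernelized linear attention of Katharopoulos et al. (2020), with feature map φ(x) = elu(x) + 1. This is a swapped-in technique and not the authors' module. All function names (`elu_plus_one`, `softmax_attention`, `linear_attention`, `multi_head`) are hypothetical. Standard softmax attention builds a T × T score matrix, so its cost grows as O(T²d) in sequence length T. The kernelized form computes φ(K)ᵀV once, which gives a cost of O(Td²).

```python
import numpy as np

def elu_plus_one(x):
    # Assumed positive feature map phi(x) = elu(x) + 1 (generic linear attention,
    # not necessarily the paper's choice). np.minimum avoids overflow in exp.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def softmax_attention(q, k, v):
    # Standard scaled dot-product attention: materialises a (T, T) matrix -> O(T^2 * d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v, eps=1e-6):
    # Kernelized attention phi(Q) (phi(K)^T V): never forms a (T, T) matrix -> O(T * d^2)
    qf, kf = elu_plus_one(q), elu_plus_one(k)
    kv = kf.T @ v                  # (d, d_v) summary of all keys and values
    z = qf @ kf.sum(axis=0)        # (T,) per-query normaliser
    return (qf @ kv) / (z[:, None] + eps)

def multi_head(attn_fn, x, wq, wk, wv, num_heads):
    # Project inputs, split the feature dimension into heads, attend per head, concatenate
    _, d_model = x.shape
    d_head = d_model // num_heads
    q, k, v = x @ wq, x @ wk, x @ wv
    outs = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        outs.append(attn_fn(q[:, s], k[:, s], v[:, s]))
    return np.concatenate(outs, axis=-1)

# Usage with random, hypothetical dimensions: 200 frames, 64 features, 4 heads
rng = np.random.default_rng(0)
T, d_model, H = 200, 64, 4
x = rng.standard_normal((T, d_model))
wq, wk, wv = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))
y_lin = multi_head(linear_attention, x, wq, wk, wv, H)
y_soft = multi_head(softmax_attention, x, wq, wk, wv, H)
print(y_lin.shape, y_soft.shape)  # (200, 64) (200, 64)
```

Both variants return outputs of the same shape. Only the linear variant avoids the quadratic score matrix, and this saving matters for long frame sequences on resource-limited in-vehicle hardware.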
Keywords: deep learning; speech enhancement; speech recognition; attention mechanism; joint training