For real-time communication scenarios, this paper proposes a real-time single-channel speech enhancement algorithm based on a recurrent neural network (RNN). By fusing full-band features with sub-band features, the model simultaneously captures the global spectral context and local spectral patterns. Because an RNN processes its input sequentially, the model also supports one-frame-in, one-frame-out inference. On the DNS-Challenge INTERSPEECH 2020 test set, the model achieves a PESQ score of 2.85 in frame-streaming mode. With a fixed system delay of 32 ms, the model takes 1.5 ms to process one 16 ms frame in real-time mode on an NVIDIA GeForce RTX 3060 GPU. After the PyTorch model is converted to ONNX format, it takes only 3.8 ms on an Intel i7 CPU at 2.7 GHz.
Key words
full-band and sub-band fusion/real-time speech enhancement/noise suppression/recurrent neural network/deep learning
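The one-frame-in, one-frame-out property and the timing figures in the abstract can be illustrated with a minimal sketch. It is not the paper's model: `ToyRecurrentCell`, `StreamingRunner`, `run_offline`, and `real_time_factor` are hypothetical names, the scalar tanh cell is only a stand-in for the RNN, and each "frame" is reduced to a single number. The sketch shows that carrying the recurrent state between calls makes frame-by-frame streaming produce the same outputs as processing the whole sequence at once, and it computes the real-time factor implied by the reported timings, assuming the 3.8 ms CPU figure is also per 16 ms frame (the abstract does not say so explicitly).

```python
import math

FRAME_MS = 16.0  # frame length stated in the abstract


def real_time_factor(processing_ms, frame_ms=FRAME_MS):
    """Processing time divided by the audio duration covered; below 1 means real time."""
    return processing_ms / frame_ms


class ToyRecurrentCell:
    """Hypothetical stand-in for the RNN: h_new = tanh(w_x * x + w_h * h)."""

    def __init__(self, w_x=0.5, w_h=0.9):
        self.w_x = w_x
        self.w_h = w_h

    def step(self, x, h):
        # One recurrent update; the output equals the new state in this toy cell.
        h_new = math.tanh(self.w_x * x + self.w_h * h)
        return h_new, h_new


def run_offline(cell, frames):
    """Process the whole sequence in one call, starting from a zero state."""
    h = 0.0
    outputs = []
    for x in frames:
        y, h = cell.step(x, h)
        outputs.append(y)
    return outputs


class StreamingRunner:
    """One-frame-in, one-frame-out wrapper that keeps the state between calls."""

    def __init__(self, cell):
        self.cell = cell
        self.h = 0.0  # recurrent state carried from frame to frame

    def push(self, x):
        y, self.h = self.cell.step(x, self.h)
        return y


if __name__ == "__main__":
    frames = [0.2, -0.1, 0.7, 0.0, -0.4]  # made-up per-frame features
    cell = ToyRecurrentCell()
    runner = StreamingRunner(cell)
    streamed = [runner.push(x) for x in frames]
    print("streaming matches offline:", streamed == run_offline(cell, frames))
    print("GPU real-time factor:", real_time_factor(1.5))
    print("CPU real-time factor (assumed per frame):", real_time_factor(3.8))
```

Under these assumptions both real-time factors are well below 1, which is consistent with the real-time claim; the fixed 32 ms system delay is a separate quantity and is not modeled here.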