Real-time Single Channel Speech Enhancement Algorithm Based on FullSubNet Model
For real-time communication scenarios, this paper proposes a real-time single-channel speech enhancement algorithm based on a recurrent neural network (RNN). By fusing full-band features with sub-band features, the model captures the global spectral context and local spectral patterns simultaneously, and the sequential nature of the RNN satisfies the one-frame-in, one-frame-out requirement during inference. On the DNS-Challenge InterSpeech 2020 test set, the model achieves a PESQ score of 2.85 in frame-streaming mode. With a fixed system delay of 32 ms, processing one frame of 16 ms of data in real-time mode takes 1.5 ms on an NVIDIA GeForce RTX 3060 GPU; after converting the PyTorch model to ONNX format, it takes only 3.8 ms on an Intel i7 CPU at 2.7 GHz.
full-band and sub-band fusion; real-time speech enhancement; noise suppression; recurrent neural network; deep learning
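The one-frame-in, one-frame-out inference mode described in the abstract relies on carrying the RNN hidden state across frames. The following is a minimal sketch, not the authors' implementation: the network, its layer sizes, and the frame dimension (257 frequency bins for a 512-point STFT) are illustrative assumptions, but the state-carrying pattern is what enables frame-streaming operation.

```python
# Minimal sketch (assumed architecture, not the paper's model) of
# one-frame-in, one-frame-out streaming inference with an LSTM:
# the recurrent state is carried across calls so each 16 ms frame
# can be enhanced as soon as it arrives.
import torch
import torch.nn as nn

class StreamingLSTMMask(nn.Module):
    """Hypothetical stand-in for the full-band/sub-band RNN: maps one
    spectral frame to a magnitude mask while keeping recurrent state."""
    def __init__(self, n_freq=257, hidden=384):
        super().__init__()
        self.rnn = nn.LSTM(n_freq, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_freq)

    def forward(self, frame, state=None):
        # frame: (batch, 1, n_freq) -- a single noisy STFT magnitude frame
        y, state = self.rnn(frame, state)
        mask = torch.sigmoid(self.out(y))
        return mask, state

model = StreamingLSTMMask().eval()
state = None
with torch.no_grad():
    for _ in range(10):                      # simulate 10 incoming frames
        frame = torch.rand(1, 1, 257)        # placeholder noisy magnitudes
        mask, state = model(frame, state)    # one frame in, one mask out
```

For CPU deployment, such a stateful model can be exported with torch.onnx.export by exposing the hidden and cell states as explicit inputs and outputs, which is presumably how the ONNX variant reported in the abstract achieves its 3.8 ms per-frame latency.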