Real-time Single-Channel Speech Enhancement Algorithm Based on FullSubNet

For real-time voice communication scenarios, this paper proposes a single-channel real-time speech enhancement scheme based on a recurrent neural network (RNN). By fusing full-band and sub-band features at the model level, the model simultaneously captures global full-band context and local spectral patterns, and because an RNN is inherently sequential, it supports one-frame-in, one-frame-out inference. Experimental results show that, on the DNS-Challenge INTERSPEECH 2020 test set, the model achieves a PESQ score of 2.85 in frame-by-frame streaming mode. With a fixed system delay of 32 ms, processing one 16 ms frame takes 1.5 ms on an NVIDIA GeForce RTX 3060 GPU; after converting the PyTorch model to ONNX format, processing one frame takes 3.8 ms on an Intel i7 CPU at 2.7 GHz.
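The reported timings can be put in context with a little arithmetic. The sketch below computes a real-time factor (processing time divided by frame duration) for each reported measurement, plus a rough per-frame latency. It is an illustration only: the 16 kHz sample rate is an assumption not stated in the abstract, and adding the processing time to the fixed system delay is one simple way to estimate end-to-end latency, not necessarily how the authors measure it. All function names are hypothetical.

```python
# Latency arithmetic for the timings quoted in the abstract.
FRAME_MS = 16.0          # audio duration covered by one frame (from the abstract)
SYSTEM_DELAY_MS = 32.0   # fixed system delay (from the abstract)
SAMPLE_RATE_HZ = 16_000  # ASSUMPTION: 16 kHz audio; not stated in the abstract

# Per-frame processing times reported in the abstract, in milliseconds.
REPORTED_MS = {
    "NVIDIA GeForce RTX 3060 GPU": 1.5,
    "Intel i7 CPU, ONNX": 3.8,
}


def real_time_factor(processing_ms, frame_ms=FRAME_MS):
    """Processing time per frame divided by frame duration.

    A value below 1.0 means the model keeps up with the incoming audio.
    """
    return processing_ms / frame_ms


def samples_per_frame(frame_ms=FRAME_MS, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of audio samples in one frame at the assumed sample rate."""
    return int(round(frame_ms * sample_rate_hz / 1000))


def estimated_latency_ms(processing_ms, system_delay_ms=SYSTEM_DELAY_MS):
    """ASSUMPTION: latency is approximated as fixed delay plus compute time."""
    return system_delay_ms + processing_ms


def summarize(reported=REPORTED_MS):
    """Map each device to (real-time factor, estimated latency in ms)."""
    return {
        device: (real_time_factor(ms), estimated_latency_ms(ms))
        for device, ms in reported.items()
    }


if __name__ == "__main__":
    for device, (rtf, latency) in summarize().items():
        print(f"{device}: RTF={rtf:.3f}, latency={latency:.1f} ms")
```

Under these assumptions both configurations run well below a real-time factor of 1.0 (roughly 0.09 on the GPU and 0.24 on the CPU), so the compute time leaves most of each 16 ms frame interval free.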

full-band and sub-band fusion; real-time speech enhancement; noise suppression; recurrent neural network; deep learning

Xu Sukui (许苏魁), Wan Jiashan (万家山), Pan Jingmin (潘敬敏), Hu Tingting (胡婷婷)

School of Computer and Software Engineering, Anhui Institute of Information Technology, Wuhu, Anhui, China

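Youth Research Fund of Anhui Institute of Information Technology (grant no. 22QNJJKJ005); Outstanding Youth Research Project of Anhui Provincial Higher Education Institutions, 2022 (grant no. 2022AH030159); Intelligent Education Open Project of the State Key Laboratory of Cognitive Intelligence, 2022 (grant no. iED2022-005)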

2024

Scientific and Technological Innovation (科学技术创新)
Heilongjiang Provincial Science Popularization Center (黑龙江省科普事业中心)

Impact factor: 0.842
ISSN: 1673-1328
Year, volume (issue): 2024, (9)