Single-Channel Real-Time Speech Enhancement Algorithm Based on FullSubNet

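Chinese abstract (translated): For real-time voice communication scenarios, this paper proposes a single-channel real-time speech enhancement scheme based on recurrent neural networks. The model fuses full-band and sub-band features so that it captures full-band information and local spectral patterns at the same time, and the inherent sequential nature of recurrent neural networks allows it to run inference frame by frame and produce output in real time. Experimental results show that, on the DNS-Challenge InterSpeech 2020 test set, the frame-by-frame model achieves a PESQ score of 2.85. With a fixed system delay of 32 ms, processing one 16 ms frame of data takes 1.5 ms on an NVidia GeForce RTX 3060 GPU; after the model is converted to ONNX format, processing one frame takes 3.8 ms on an Intel i7@2.7GHz CPU.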
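The timing figures in the abstract can be put in perspective with a short calculation. The sketch below computes a real-time factor (processing time divided by the duration of audio processed) from the quoted numbers. The real-time factor framing and the function name `real_time_factor` are illustrative assumptions, not taken from the paper, and the sketch assumes that a new frame arrives every 16 ms.

```python
# Figures quoted in the abstract, in milliseconds.
FRAME_MS = 16.0          # length of one frame of audio
SYSTEM_DELAY_MS = 32.0   # fixed system delay
GPU_MS_PER_FRAME = 1.5   # NVidia GeForce RTX 3060 GPU
CPU_MS_PER_FRAME = 3.8   # ONNX model on Intel i7@2.7GHz CPU


def real_time_factor(processing_ms: float, frame_ms: float) -> float:
    """Processing time divided by the audio duration it covers.

    Hypothetical helper for illustration; a value below 1 means the
    processing keeps up with the incoming audio.
    """
    if frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    return processing_ms / frame_ms


gpu_rtf = real_time_factor(GPU_MS_PER_FRAME, FRAME_MS)
cpu_rtf = real_time_factor(CPU_MS_PER_FRAME, FRAME_MS)

if __name__ == "__main__":
    print(f"GPU real-time factor: {gpu_rtf:.4f}")
    print(f"CPU real-time factor: {cpu_rtf:.4f}")
    print(f"System delay in frames: {SYSTEM_DELAY_MS / FRAME_MS:.0f}")
```

Under these assumptions both configurations spend well under a quarter of each frame period on computation, which is consistent with the real-time claim.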

English title: Real-time Single Channel Speech Enhancement Algorithm Based on FullSubNet Model

English abstract: For real-time communication scenarios, this paper proposes a real-time single-channel speech enhancement algorithm based on a recurrent neural network (RNN). The model simultaneously captures the global spectral context and local spectral patterns by fusing full-band features with sub-band features, and it satisfies the one-frame-in, one-frame-out requirement during inference by relying on the inherent sequential nature of RNNs. The results show that, on the DNS-Challenge InterSpeech 2020 test set, the model achieves a PESQ score of 2.85 in frame-streaming mode. With a fixed system delay of 32 ms, the model takes 1.5 ms to process one 16 ms frame in real-time mode on an NVidia GeForce RTX 3060 GPU. After the PyTorch model is converted to ONNX format, processing one frame takes 3.8 ms on an Intel i7@2.7GHz CPU.
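The one-frame-in, one-frame-out property mentioned in the abstract follows from the way a unidirectional recurrent layer carries its state from one frame to the next. The toy sketch below is not the paper's model: it uses a single scalar recurrent cell with arbitrarily chosen weights, and the names `rnn_step`, `run_offline`, and `StreamingEnhancer` are hypothetical. It only illustrates that feeding frames one at a time while keeping the state gives the same outputs as processing the whole sequence in one call.

```python
import math
from typing import List, Tuple

# Toy scalar weights, chosen arbitrarily for illustration (not from the paper).
W_IN, W_REC, BIAS = 0.5, 0.8, 0.1


def rnn_step(frame: float, state: float) -> Tuple[float, float]:
    """Process one frame and return (output, new_state)."""
    new_state = math.tanh(W_IN * frame + W_REC * state + BIAS)
    return new_state, new_state


def run_offline(frames: List[float]) -> List[float]:
    """Process a whole utterance in a single call."""
    state = 0.0
    outputs = []
    for frame in frames:
        out, state = rnn_step(frame, state)
        outputs.append(out)
    return outputs


class StreamingEnhancer:
    """One-frame-in, one-frame-out wrapper that keeps the recurrent state between calls."""

    def __init__(self) -> None:
        self.state = 0.0

    def push(self, frame: float) -> float:
        """Consume one frame and immediately return its output."""
        out, self.state = rnn_step(frame, self.state)
        return out


# Each "frame" is a single number here to keep the sketch small.
frames = [0.2, -0.4, 0.9, 0.0, -0.7]
streamer = StreamingEnhancer()
streamed = [streamer.push(f) for f in frames]
offline = run_offline(frames)

if __name__ == "__main__":
    print("streaming matches offline:", streamed == offline)
```

In a real system each frame would be a spectral feature vector and the state would be the hidden state of the recurrent layers, but the bookkeeping is the same: the caller stores the state after every frame and passes it back in with the next one.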

English keywords:

full-band and sub-band fusion; real-time speech enhancement; noise suppression; recurrent neural network; deep learning

Authors:

Xu Sukui (许苏魁), Wan Jiashan (万家山), Pan Jingmin (潘敬敏), Hu Tingting (胡婷婷)

Author affiliation:

School of Computer and Software Engineering, Anhui Institute of Information Technology, Wuhu, Anhui

Keywords:

full-band and sub-band fusion; real-time speech enhancement; noise removal; recurrent neural network; deep learning

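Funding:

Youth Research Fund of Anhui Institute of Information Technology; Outstanding Young Researcher Project of Anhui Provincial Universities (2022); Intelligent Education Open Project of the State Key Laboratory of Cognitive Intelligence (2022)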

Project numbers:

22QNJJKJ005; 2022AH030159; iED2022-005

Publication year:

2024

Scientific and Technological Innovation (科学技术创新)

Heilongjiang Science Popularization Center (黑龙江省科普事业中心)

Impact factor: 0.842

ISSN: 1673-1328

Year, volume (issue): 2024, (9)

Number of references: 10