Single-channel speech enhancement method based on time-frequency information gradient estimation

[Objective] Speech enhancement can improve the performance of speech translation systems in real-world noisy environments. This study addresses two problems of existing speech enhancement methods based on probabilistic diffusion models: the structure of the generated speech is often corrupted, and global features are difficult to model. [Methods] We propose a single-channel speech enhancement method based on time-frequency information gradient estimation. The complex spectrum of the speech is first fed into an encoder to extract deep representations. Residual fast Fourier convolution (Res-FFC) is then used to repair the generated speech and to model global speech features, and time-domain speech information is incorporated throughout the encoding and decoding process. [Results] On the public Voice Bank-DEMAND dataset, the proposed method improves the objective metrics SI-SDR and WB-PESQ by 0.5 and 0.19, respectively, over SGMSE, a complex time-frequency-domain speech enhancement network based on score-based generative models. [Conclusions] By incorporating Res-FFC and time-domain speech information, the proposed method captures global speech features better, effectively suppresses noise, and improves speech quality.
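The abstract does not give the internal structure of Res-FFC. For orientation, the sketch below follows the generic fast Fourier convolution (FFC) idea: a local branch with an ordinary small-kernel convolution, plus a global branch that applies a pointwise convolution and ReLU in the 2-D Fourier domain, so every output bin depends on the whole time-frequency plane. It is a NumPy illustration under stated assumptions, not the authors' block: the function names, the (channels, frequency, time) layout, the 3x3 local kernel, and the `x + ReLU(local + global)` residual form are hypothetical, and the channel split, cross-branch paths and normalization layers of a full FFC are omitted.

```python
import numpy as np


def conv3x3_same(x, k):
    """Local branch: plain 3x3 convolution with zero 'same' padding.

    x: (C_in, F, T) feature map; k: (C_out, C_in, 3, 3) kernel.
    Each output bin only sees its 3x3 neighbourhood of the input.
    """
    _, F, T = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k.shape[0], F, T))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,cft->oft", k[:, :, i, j], xp[:, i:i + F, j:j + T])
    return out


def spectral_transform(x, w):
    """Global branch: rFFT2 -> 1x1 conv on stacked real/imag parts -> ReLU -> inverse rFFT2.

    x: (C, F, T) real feature map; w: (2C, 2C) pointwise weight.
    Every Fourier coefficient depends on the whole input, so each output
    bin has a receptive field covering the entire F x T plane.
    """
    C, F, T = x.shape
    X = np.fft.rfft2(x, axes=(-2, -1))               # (C, F, T//2 + 1), complex
    Z = np.concatenate([X.real, X.imag], axis=0)     # (2C, F, T//2 + 1), real
    Z = np.maximum(np.einsum("oc,cft->oft", w, Z), 0.0)
    return np.fft.irfft2(Z[:C] + 1j * Z[C:], s=(F, T), axes=(-2, -1))


def res_ffc_block(x, k_local, w_global):
    """Hypothetical Res-FFC-style block (not the paper's exact design): x + ReLU(local + global)."""
    return x + np.maximum(conv3x3_same(x, k_local) + spectral_transform(x, w_global), 0.0)


rng = np.random.default_rng(0)
C, F, T = 4, 16, 20                                   # channels, frequency bins, frames
x = rng.standard_normal((C, F, T))
k_local = 0.1 * rng.standard_normal((C, C, 3, 3))
w_global = 0.1 * rng.standard_normal((2 * C, 2 * C))

y = res_ffc_block(x, k_local, w_global)
print(y.shape)  # (4, 16, 20)

# Perturb one input bin in the top-left corner and inspect the bottom-right output bin.
x2 = x.copy()
x2[:, 0, 0] += 1.0
local_change = np.abs(conv3x3_same(x2, k_local) - conv3x3_same(x, k_local))[:, -1, -1].max()
global_change = np.abs(spectral_transform(x2, w_global) - spectral_transform(x, w_global))[:, -1, -1].max()
print(local_change, global_change)
```

The last lines illustrate the design motivation: moving one corner bin leaves the far corner of the 3x3 branch output untouched, while the spectral branch output changes there, which is the kind of global context the abstract attributes to Res-FFC.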
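SI-SDR, one of the two objective metrics in the results, is the usual scale-invariant signal-to-distortion ratio in decibels: with α = ⟨ŝ, s⟩ / ‖s‖², SI-SDR = 10·log10(‖αs‖² / ‖ŝ − αs‖²), where s is the clean reference and ŝ the enhanced estimate. A minimal NumPy sketch, assuming equal-length 1-D waveforms and the common zero-mean convention; the `eps` guard is an addition for numerical safety and may differ from the evaluation script used in the paper.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two equal-length 1-D signals."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Remove DC offsets (common convention for SI-SDR).
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference towards the estimate (orthogonal projection).
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


# Worked example: the added error is zero-mean and orthogonal to the reference,
# so the projection keeps the whole reference and SI-SDR = 10*log10(4 / 1).
s = np.array([1.0, -1.0, 1.0, -1.0])          # ||s||^2 = 4
e = 0.5 * np.array([1.0, 1.0, -1.0, -1.0])    # ||e||^2 = 1, <s, e> = 0
print(round(si_sdr(s + e, s), 2))             # 6.02
print(round(si_sdr(3.0 * (s + e), s), 2))     # 6.02: rescaling the estimate does not change SI-SDR
```

Because rescaling the estimate only changes the projection coefficient α, the score is unaffected by the overall gain of the enhanced signal (up to `eps`), which is what distinguishes SI-SDR from a plain SDR or SNR.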

Keywords: speech enhancement; probabilistic diffusion model; single-channel; FFT

Authors: Gao Shengxiang, Fang Yanwen, Yu Zhengtao, Dong Ling, Mo Shangbin (高盛祥、方妍文、余正涛、董凌、莫尚斌)

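Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming, Yunnan 650500

Yunnan Key Laboratory of Media Convergence, Kunming, Yunnan 650500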

Keywords (Chinese edition): speech enhancement; probabilistic diffusion model; single-channel; fast Fourier convolution

2024

Journal of Xiamen University (Natural Science)
Xiamen University

Indexed in: CSTPCD; Peking University core journals (北大核心)
Impact factor: 0.449
ISSN: 0438-0479
Year, volume (issue): 2024, 63(6)