A Time-Frequency Domain Speech Enhancement Model Based on an Attention Mechanism

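Chinese abstract (translated): To address the phase mismatch problem in frequency-domain single-channel speech enhancement, a speech enhancement algorithm that combines the time and frequency domains is proposed, jointly optimizing the learning targets of the different domains during training. An attention mechanism is added to mimic the characteristics of human auditory perception and improve the model's ability to suppress noise. Dilated convolution is used to enlarge the receptive field, so that more input-layer information can be fused and local features in both the time and frequency domains can be extracted effectively. Time-domain and frequency-domain loss functions are combined to optimize across the two domains and improve enhancement performance. To verify the method, extensive experiments were conducted on the VoiceBank dataset with a residual temporal convolutional network (TCN) as the baseline; the proposed model enhances speech better than baselines that use only the time domain or only the frequency domain. The denoised speech achieves a Perceptual Evaluation of Speech Quality (PESQ) score of 3.06 and a scale-invariant signal-to-distortion ratio (SI-SDR) of 20.00.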
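The abstract does not give the exact form of the joint objective. As a minimal sketch of what combining a time-domain and a frequency-domain loss can look like, the NumPy example below adds a waveform mean-squared error to a mean absolute error between STFT magnitudes. The function names, the frame length, the hop size, and the mixing weight `alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def stft_magnitude(x, frame_len=512, hop=256):
    """Magnitude spectrogram from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # rfft keeps only the non-negative frequencies of each real-valued frame
    return np.abs(np.fft.rfft(frames, axis=1))


def joint_loss(estimate, clean, alpha=0.5):
    """Weighted sum of a time-domain loss and a frequency-domain loss.

    alpha is a hypothetical mixing weight between the two terms.
    """
    # Time domain: mean squared error on the raw waveforms
    time_loss = np.mean((estimate - clean) ** 2)
    # Frequency domain: mean absolute error on the STFT magnitudes
    freq_loss = np.mean(np.abs(stft_magnitude(estimate) - stft_magnitude(clean)))
    return alpha * time_loss + (1.0 - alpha) * freq_loss
```

In a training loop the same two terms would be written in an automatic-differentiation framework so that gradients from both domains reach the network; the NumPy version only shows how the terms are combined.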

English title: An Attention-Based Model for Time-Domain Frequency Speech Enhancement

English abstract: To address the phase mismatch problem of single-channel speech enhancement in the frequency domain, a joint time-domain and frequency-domain speech enhancement algorithm is proposed that jointly optimises the learning targets of the two domains during training. An attention mechanism is added to simulate the characteristics of human auditory perception and improve the model's ability to suppress noise. Dilated convolution is used to widen the receptive field, enabling the fusion of more input-layer information and the effective extraction of local features in the time and frequency domains. Time-domain and frequency-domain loss functions are combined so that learning is optimised across both domains. To validate the proposed method, extensive experiments are conducted on the VoiceBank dataset using a residual temporal convolutional network (TCN) as the baseline model, and the results show better enhancement than baselines that use only the time domain or only the frequency domain. The denoised speech achieves a Perceptual Evaluation of Speech Quality (PESQ) score of 3.06 and a scale-invariant signal-to-distortion ratio (SI-SDR) of 20.00.
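For readers unfamiliar with the second reported metric, the sketch below computes SI-SDR in decibels using the commonly used definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. It assumes the paper follows this standard definition; the function name and the `eps` stabiliser are illustrative choices.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Remove the mean so a constant offset does not affect the score
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Optimal scaling of the reference that best explains the estimate
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever the scaled reference cannot explain counts as distortion
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)
```

Because the reference is rescaled before the comparison, multiplying the estimate by a constant gain leaves the score essentially unchanged, which is what distinguishes SI-SDR from a plain signal-to-distortion ratio.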

English keywords:

speech enhancement; time domain; frequency domain; TCN; attention module

Authors:

Lin Pan (林攀), He Ruhan (何儒汉)

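Affiliations:

School of Computer Science and Artificial Intelligence, Wuhan Textile University

Hubei Provincial Engineering Research Center for Garment Informatization, Wuhan, Hubei 430200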

Keywords:

speech enhancement; time domain; frequency domain; temporal convolution; attention module

Publication year:

2024

DOI:

10.11907/rjdk.222325

Software Guide (软件导刊)

Hubei Information Society (湖北省信息学会)

Impact factor: 0.524

ISSN: 1672-7800

Year, volume (issue): 2024, 23(1)

Number of references: 1