An attention-based time-domain speech separation method in noisy environments

Deep learning-based time-domain single-channel speech separation models have achieved significant success in noise-free scenarios. In noisy environments, however, their encoders tend to mistake noise features for source speech features, which degrades the accuracy of mask estimation and leads to suboptimal separation performance. To address this problem, we propose a time-domain speech separation model based on attention mechanisms that mitigates the impact of noise on separation. First, because the channels of the time-domain encoder's output features differ in importance, we embed an efficient channel attention (ECA) module within the encoder to weight the encoded features channel by channel. Second, we adopt a graph attention network (GAT) to compute attention coefficients between adjacent frames and use them to aggregate the encoded features of neighboring frames, which implicitly reduces the influence of noise on mask estimation. Experimental results on the WHAM!, Libri2Mix-Noisy, and Libri3Mix-Noisy datasets show that the proposed GAT- and ECA-based DPRNN (GACA-DPRNN) outperforms the DPRNN baseline in terms of scale-invariant signal-to-noise ratio improvement (SI-SNRi) and signal-to-distortion ratio improvement (SDRi).
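The two attention steps named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, and the convolution kernel values, the attention vectors `a_src` and `a_dst`, the identity feature projection, and the three-frame neighborhood (previous, current, next) are all assumptions made for illustration. The sketch only shows the general shape of ECA-style channel gating followed by GAT-style aggregation over adjacent frames.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def eca(features, kernel):
    """ECA-style channel gating.

    features: list of C channels, each a list of T frame values.
    kernel:   odd-length list of 1-D convolution weights (assumed values).
    Returns (rescaled features, per-channel gates).
    """
    num_channels = len(features)
    # Global average pooling over time: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]
    # Zero-padded 1-D convolution across neighboring channels, then sigmoid.
    half = len(kernel) // 2
    gates = []
    for c in range(num_channels):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = c + k - half
            if 0 <= idx < num_channels:
                acc += w * pooled[idx]
        gates.append(sigmoid(acc))
    # Rescale every channel by its gate, which lies in (0, 1).
    scaled = [[g * v for v in ch] for g, ch in zip(gates, features)]
    return scaled, gates


def adjacent_frame_attention(frames, a_src, a_dst):
    """GAT-style aggregation where each frame attends to itself and its
    immediate neighbors (assumed graph). Identity projection is assumed.

    frames: list of T frame vectors, each of length C.
    a_src, a_dst: length-C attention vectors (assumed parameters).
    """
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        neighbors = [j for j in (t - 1, t, t + 1) if 0 <= j < num_frames]
        # Unnormalized attention scores for each neighbor of frame t.
        scores = [
            leaky_relu(dot(a_src, frames[t]) + dot(a_dst, frames[j]))
            for j in neighbors
        ]
        # Numerically stable softmax over the neighborhood.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of neighboring frame vectors.
        out.append([
            sum(a * frames[j][c] for a, j in zip(alphas, neighbors))
            for c in range(len(frames[t]))
        ])
    return out


# Toy usage: 3 channels x 4 frames of made-up encoder output.
toy = [[0.1, 0.4, 0.2, 0.3], [1.0, 0.9, 1.1, 1.0], [0.0, 0.1, 0.0, 0.1]]
gated, gates = eca(toy, kernel=[0.2, 0.6, 0.2])
frame_major = [list(col) for col in zip(*gated)]  # transpose to T x C
smoothed = adjacent_frame_attention(frame_major, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
```

In a trained model the kernel and attention vectors would be learned; here they are fixed constants so the data flow is easy to follow.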

speech separation; channel attention; graph neural network; graph attention network (GAT)

Yu Chuanqi (余传旗), Wang Tingting (王婷婷), Guo Haiyan (郭海燕), Yang Zhen (杨震)

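School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003

National and Local Joint Engineering Research Center of Communication and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003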

2024

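Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition)
Nanjing University of Posts and Telecommunications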

Indexed in: CSTPCD; Peking University Core Journals (北大核心)
Impact factor: 0.486
ISSN: 1673-5439
Year, volume (issue): 2024, 44(6)