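Speech enhancement suppresses background noise to improve speech quality and intelligibility, which in turn improves the performance of speech-related products. To address the lack of global key information when the SEGAN (Speech Enhancement Generative Adversarial Network) model processes speech signals, this paper proposes SA-SEGAN (Self-Attention Mechanism Improvement Based on Speech Enhancement Generative Adversarial Network), a generative adversarial network speech enhancement algorithm improved with a self-attention mechanism. SA-SEGAN applies self-attention to the encoder output to extract the important global information across the spatial and channel dimensions, so that the speech signal is processed more completely. It adopts the Log-Cosh loss to better handle samples with large deviations, and introduces a quantile loss to give the model the ability to explore the sample distribution. Experiments show that SA-SEGAN improves on SEGAN by 10.9% on objective metrics. Ablation experiments confirm that all three methods used in the experiments contribute positively.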
Improved SEGAN Speech Enhancement Based on Self-Attention Mechanism
Speech enhancement improves speech quality and intelligibility by suppressing background noise, thereby improving the performance of speech-related products. To address the problem that the SEGAN model lacks global key information when processing speech signals, this paper proposes SA-SEGAN, an improved generative adversarial network speech enhancement algorithm based on the self-attention mechanism. SA-SEGAN applies the self-attention mechanism to the output of the encoder to extract the important global information in the spatial and channel dimensions, so that the speech signal is processed more completely. It also uses the Log-Cosh loss to better handle samples with large deviations, and introduces a quantile loss to give the model the ability to explore the distribution of the samples. Experiments show that SA-SEGAN scores 10.9% higher than SEGAN in terms of perceptual evaluation. Ablation experiments confirm that each of the three methods used plays a positive role.
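For readers unfamiliar with the two loss terms named above, the following is a minimal sketch of their standard textbook definitions in plain Python. It is not the authors' implementation: the function names `log_cosh_loss` and `quantile_loss`, the default quantile level `q=0.5`, and the use of flat lists of samples are assumptions made for illustration, and how the terms are weighted inside the SA-SEGAN objective is not stated in this abstract.

```python
import math


def log_cosh_loss(pred, target):
    """Mean of log(cosh(pred - target)) over paired samples.

    Hypothetical helper for illustration. Uses the identity
    log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2),
    which avoids overflow in cosh for large |x|.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(pred, target):
        a = abs(p - t)
        total += a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)
    return total / len(pred)


def quantile_loss(pred, target, q=0.5):
    """Mean pinball (quantile) loss at quantile level q in (0, 1).

    Hypothetical helper for illustration. Under-prediction is weighted
    by q and over-prediction by (1 - q).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(pred, target):
        err = t - p
        total += max(q * err, (q - 1.0) * err)
    return total / len(pred)
```

The Log-Cosh term behaves approximately quadratically for small errors and approximately linearly for large ones, which is the usual reason it is chosen for samples with large deviations. The quantile term penalizes over- and under-estimation asymmetrically, so fitting it at several levels of `q` gives a picture of the error distribution rather than a single point estimate.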