Multi-channel Speech Enhancement Based on Fourier Convolution
The construction of neural beamformers is one of the main approaches to multi-channel speech enhancement: the beam weights are solved so that filtering the multi-channel signals yields the target speech. As in the estimation of the spatial covariance matrix in traditional beamforming, spectral-spatial information plays a crucial role in the beam-weight prediction of a neural beamformer. However, many existing methods fail to predict the beam weights optimally because they do not learn spectral-spatial information adequately. To address this challenge, a context feature extractor based on Fourier convolution is proposed, which provides a global receptive field along the frequency axis. Temporal context is also modeled by adding a time-frequency convolutional module that boosts the learning of context from the input spectrograms. In addition, a Convolutional Recurrent Network (CRN) structure is applied, in which the proposed context feature extractor is embedded in the encoders and decoders, and a Convolutional Block Attention Module (CBAM) is inserted in the skip connections. The proposed CRN structure sufficiently captures time-frequency context information and cross-channel spatial features from the input spectrograms. Experimental results show that the proposed approach requires only 1.14 M parameters, a substantial advantage over existing advanced baseline systems.
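The core idea of the Fourier-convolution extractor is that a pointwise multiplication in the frequency-transformed domain couples every frequency bin at once, giving a global receptive field over frequency that an ordinary small convolution kernel lacks. Below is a minimal NumPy sketch of this mechanism; the function name and the per-bin weight parameterization are illustrative assumptions, not the paper's exact layer.

```python
import numpy as np

def fourier_conv_freq(x, w_real, w_imag):
    """Hypothetical sketch of Fourier convolution along the frequency axis.

    x:        (channels, freq_bins, time) real spectrogram features.
    w_real,
    w_imag:   (channels, freq_bins // 2 + 1) per-bin weights, playing the role
              of a learned pointwise transform in the FFT domain.
    Because every FFT coefficient mixes all frequency bins, multiplying the
    coefficients gives each output position a global receptive field over
    the whole frequency axis, then the inverse FFT maps back.
    """
    X = np.fft.rfft(x, axis=1)              # real FFT over the frequency axis
    W = (w_real + 1j * w_imag)[:, :, None]  # broadcast the weights over time
    return np.fft.irfft(X * W, n=x.shape[1], axis=1)

# Sanity check: unit real weights act as the identity (up to FFT round-off).
x = np.random.randn(2, 16, 4)
y = fourier_conv_freq(x, np.ones((2, 9)), np.zeros((2, 9)))
```

In a trainable version the weights would be learned parameters (and the real layer would typically add a channel-mixing convolution and nonlinearity around the spectral transform); the sketch only isolates the global-frequency-coupling step.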