基于角度压制比谱减的环境自适应双麦语音增强

Environment adaptive dual-microphone speech enhancement based on direction mitigation ratio spectral subtraction

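Abstract (translated from Chinese): [Objective] In response to the trend toward smaller intelligent terminals and more diverse usage scenarios, this work develops a dual-microphone speech enhancement algorithm that meets strict constraints on size, computing power, and storage while adapting to the environment. [Methods] Microphone array beamforming can enhance the signal from the desired direction while suppressing noise from undesired directions, but the main lobe of a small array is wide, which limits the enhancement. Building on beam-steered enhancement of the target direction with a small dual-microphone array, the method uses the noise from the interference direction as a reference to apply spectral subtraction to the target-direction speech. It introduces a direction mitigation ratio (DMR) to track the energy estimate of the interference-direction noise in real time, so that processing adapts to different reverberation conditions and noise types and the enhancement improves. [Results] The DMR increases with reverberation time and is uncorrelated with the signal-to-noise ratio (SNR). Compared with the original noisy signal, the filter-and-sum beamforming (FSB) signal, and the signal enhanced by FSB combined with fixed opposite-direction spectral subtraction, the signal enhanced by FSB combined with DMR-adaptive opposite-direction spectral subtraction achieves the highest segmental SNR score in every case, and the highest objective speech quality (PESQ) score in most cases, across noise types, SNRs, and reverberation times. [Conclusions] The DMR reflects different reverberation conditions to some extent, and the spectral subtraction threshold derived from it has a degree of environmental adaptability.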
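The spectral subtraction step described in the Methods can be illustrated with a minimal single-frame sketch. It assumes magnitude spectra are already available for the target beam and the noise beam. The function name `spectral_subtract`, the over-subtraction factor `alpha`, and the spectral floor `floor` are hypothetical; in particular, `alpha` only stands in for the DMR-derived threshold, whose definition the abstract does not give.

```python
import math


def spectral_subtract(target_mag, noise_ref_mag, alpha=1.0, floor=0.05):
    """Power-domain spectral subtraction for a single frame.

    target_mag    : magnitude spectrum of the target-beam output
    noise_ref_mag : magnitude spectrum of the noise-beam reference (same length)
    alpha         : over-subtraction factor; a hypothetical stand-in for the
                    DMR-derived threshold, which the abstract does not define
    floor         : spectral floor, as a fraction of the input magnitude
    """
    if len(target_mag) != len(noise_ref_mag):
        raise ValueError("spectra must have the same number of bins")

    enhanced = []
    for s, n in zip(target_mag, noise_ref_mag):
        # Subtract the scaled noise power from the target power in this bin.
        residual_power = s * s - alpha * n * n
        # Keep a small fraction of the input to avoid negative power.
        floor_power = (floor * s) ** 2
        enhanced.append(math.sqrt(max(residual_power, floor_power)))
    return enhanced
```

A larger `alpha` removes more residual noise but raises the risk of speech distortion, which is the trade-off an adaptive threshold is meant to manage.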

English abstract: [Objective] The voice front-end plays an important role in collecting speech signals and ensuring their quality so that different types of speech processing can be supported. The growing use of small intelligent terminals in highly diverse application scenarios poses significant challenges to the speech enhancement performance of voice front-ends in complicated reverberant and noisy environments. Because the beam directivity of a microphone array beamforming algorithm depends strongly on the array size and the number of elements, the dual-microphone arrays widely adopted in small intelligent terminals suffer substantial performance degradation. This paper proposes an environment-adaptive dual-microphone speech enhancement algorithm based on direction mitigation ratio spectral subtraction to improve the speech enhancement performance of dual-microphone arrays in different environments.

[Methods] First, a least-squares (LS) driven filter-and-sum (FSB) dual-microphone beamformer is designed to provide preliminary speech enhancement, with its signal beam aimed at the desired direction and its noise beam aimed at the undesired directions. Then, the noise reference collected by the noise beam is used to remove, by spectral subtraction, the residual noise contained in the beamformed speech. Specifically, a direction mitigation ratio (DMR) parameter is defined to carry the environmental information; it is calculated in each frame to determine the spectral subtraction threshold. By updating the DMR in real time, the spectral subtraction between the enhanced speech and the noise reference is adaptively controlled, so that the algorithm adapts to the environment and removes residual noise more effectively.

[Results] For performance evaluation and comparison, practical experiments are carried out in an anechoic laboratory, where loudspeakers placed in different directions serve as artificial noise sources that generate environmental noise at different signal-to-noise ratios (SNRs). The data collected by the microphone array is then used to generate reverberated signals with different reverberation levels using the IMAGE reverberation model, in order to verify the impact of environmental changes on the experimental results. Segmental signal-to-noise ratio (segSNR) and perceptual evaluation of speech quality (PESQ) score are adopted as quantitative evaluation metrics. Results under different noisy and reverberant environments show that the proposed algorithm effectively removes the residual noise of FSB and that its output waveform is the closest to the clean speech. In terms of segSNR, the proposed algorithm outperforms FSB under different SNRs, noise types, noise angles, and reverberation times. Compared with FSB and the fixed spectral subtraction threshold method, the proposed method achieves average segSNR improvements of 2.97 dB and 2.75 dB, respectively. The proposed method also obtains the best PESQ scores, indicating better perceived listening quality. At a reverberation time of 0.2 s, the proposed algorithm yields an average PESQ improvement of 0.76 points over an SNR range of -5 to 10 dB, corresponding to average improvements of 0.36 points and 0.16 points over FSB and the fixed spectral subtraction method, respectively. The ability of the DMR parameter to characterize environmental patterns is also verified, which provides an adaptive adjustment mechanism for the proposed method in different environments.

[Conclusions] Experimental results and analyses show that, by combining traditional FSB beamforming with spectral subtraction, the proposed algorithm achieves promising speech enhancement performance under different noisy and reverberant backgrounds. In addition, the adverse impact of the background is addressed through the newly defined DMR parameter, which enables environmental adaptability. Compared with FSB alone, the proposed algorithm improves residual noise removal by beamforming first and then performing opposite-direction spectral subtraction with an environment-adaptive threshold determined by the DMR. Compared with FSB combined with fixed spectral subtraction, the proposed algorithm reduces the speech distortion caused by the side effects of spectral subtraction and removes residual noise more effectively in different environments. Moreover, with low computational complexity and no need for parameter tuning, the algorithm is convenient to implement in the hardware of a dual-microphone front-end, which makes it a candidate for the research and development of practical small intelligent terminal products.
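The segSNR metric used in the Results can be sketched as follows. This is an illustrative implementation of the common textbook form, not necessarily the exact variant used in the paper: the function name `segmental_snr`, the non-overlapping frames, the default frame length, and the per-frame clamping range of -10 to 35 dB are all assumptions.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean per-frame SNR in dB, with each frame clamped to [lo, hi].

    clean, enhanced : time-aligned sample sequences of equal length
    frame_len       : non-overlapping frame length in samples (assumed)
    lo, hi          : per-frame clamping limits in dB (assumed convention)
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")

    eps = 1e-12  # avoids division by zero and log of zero
    scores = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        signal_energy = 0.0
        error_energy = 0.0
        for c, e in zip(clean[start:start + frame_len],
                        enhanced[start:start + frame_len]):
            signal_energy += c * c
            error_energy += (c - e) ** 2
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        scores.append(min(max(snr_db, lo), hi))

    if not scores:
        raise ValueError("signal is shorter than one frame")
    return sum(scores) / len(scores)
```

Clamping keeps silent frames and near-perfect frames from dominating the average, which is why segSNR is usually reported with such limits.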

English keywords:

dual-microphone; microphone array; beamforming; spectral subtraction; direction mitigation ratio

Authors:

张家扬, 何伟, 童峰, 卢荣富, 冯万健

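Affiliations:

College of Ocean and Earth Sciences, Xiamen University, Xiamen, Fujian 361005

National and Local Joint Engineering Research Center for Navigation and Location Service Technology (Xiamen University), Xiamen, Fujian 361005

Xiamen Yealink Network Technology Co., Ltd., Xiamen, Fujian 361015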

Chinese keywords:

双麦; 麦克风阵列; 波束形成; 谱减; 角度压制比

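Funding:

Science and Technology Innovation Action Plan of the Shanghai Science and Technology Commission; Xiamen Marine Industry Project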

Project numbers:

21DZ1205502; 22CZB012HJ13

Publication year:

2024

DOI:

10.6043/j.issn.0438-0479.202303020

Journal of Xiamen University (Natural Science)

Xiamen University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.449

ISSN: 0438-0479

Year, volume (issue): 2024, 63(2)

Number of references: 20