Environment adaptive dual-microphone speech enhancement based on direction mitigation ratio spectral subtraction
[Objective] The voice front-end plays an important role in collecting speech signals and ensuring their quality so that different types of speech processing can be supported. The growing use of small intelligent terminals in highly diverse application scenarios poses significant challenges to the speech enhancement performance of voice front-ends in complicated reverberant and noisy environments. Because the beam directivity of microphone array beamforming algorithms depends heavily on the array size and the number of elements, the dual-microphone arrays commonly adopted in small intelligent terminals suffer substantial performance degradation. In this paper, an environment-adaptive dual-microphone speech enhancement algorithm based on direction mitigation ratio spectral subtraction is proposed to improve the speech enhancement performance of dual-microphone arrays in different environments.
[Methods] First, a least-squares (LS) driven filter-and-sum beamformer (FSB) for the dual-microphone array is designed to yield a preliminary speech enhancement, with its signal beam and noise beam aimed at the desired and undesired directions, respectively. Then, the noise reference collected by the noise beam is used to remove, by spectral subtraction, the residual noise contained in the beamformer-enhanced speech. Specifically, a direction mitigation ratio (DMR) parameter is defined to carry the environmental information; it is calculated in each frame to determine the spectral subtraction threshold. Thus, by updating the DMR in real time, the spectral subtraction between the enhanced speech and the noise reference is adaptively controlled to achieve environmental prediction and improved residual noise removal.
[Results] For performance evaluation and comparison, practical experiments are carried out in an anechoic laboratory, in which loudspeakers located in different directions are used as artificial noise sources to generate environmental noise at different signal-to-noise ratios (SNRs). The experimental data collected by the microphone array are used to generate reverberated signals with different reverberation levels using the IMAGE reverberation model, in order to verify the impact of environmental changes on the experimental results. In these experiments, the segmental signal-to-noise ratio (segSNR) and the perceptual evaluation of speech quality (PESQ) score are adopted as quantitative evaluation metrics. Experimental results under different noisy and reverberant environments reveal that the proposed algorithm effectively removes the residual noise of the FSB and that its output waveform is the closest to the clean speech. In terms of segSNR, the proposed algorithm outperforms the FSB under different SNRs, noise types, noise angles, and reverberation times. Compared with the FSB and the fixed spectral subtraction threshold method, the proposed method achieves average segSNR improvements of 2.97 dB and 2.75 dB, respectively. The proposed method also obtains the best PESQ scores, indicating better subjective listening quality. Under a reverberation time of 0.2 s, the proposed algorithm yields an average PESQ improvement of 0.76 points over an SNR range of -5 to 10 dB, corresponding to average improvements of 0.36 and 0.16 points over the FSB and the fixed spectral subtraction method, respectively. Meanwhile, the capability of the DMR parameter to characterize environmental patterns has also been verified, which provides the proposed method with an adaptive adjustment mechanism under different environments.
[Conclusions] Experimental results and analyses show that, by combining traditional FSB beamforming with spectral subtraction, the proposed algorithm achieves promising speech enhancement performance under different noisy and reverberant backgrounds. In addition, the adverse impact of these backgrounds is addressed via the newly defined DMR parameter, which enables environmental adaptability. Compared with the pure FSB algorithm, the proposed algorithm improves residual noise removal by first beamforming and then performing spectral subtraction against the noise reference with a DMR-determined, environment-adaptive threshold. Compared with the FSB combined with fixed spectral subtraction, the proposed algorithm reduces the speech distortion caused by the negative effects of spectral subtraction and achieves better residual noise removal under different environments. Moreover, with low computational complexity and no need for parameter tuning, the proposed algorithm is convenient to implement in hardware for dual-microphone front-ends, which gives it the potential to be applied in the research and development of practical small intelligent terminal products.
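The frame-wise adaptive spectral subtraction described in [Methods] can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the DMR formula, so the per-frame energy ratio between the signal beam and the noise beam used here is only a hypothetical stand-in. The linear mapping to a subtraction factor, the spectral floor, and all names and default values (`frame_ratio_db`, `subtraction_factor`, `lo_db`, `hi_db`, `alpha_max`, `alpha_min`, `floor`) are likewise assumptions made for illustration.

```python
import numpy as np


def frame_ratio_db(signal_beam_mag, noise_beam_mag, eps=1e-12):
    """Hypothetical stand-in for the DMR (assumption, not the paper's formula):
    energy ratio in dB between the signal-beam and noise-beam spectra of one frame."""
    sig_energy = np.sum(signal_beam_mag ** 2)
    noise_energy = np.sum(noise_beam_mag ** 2)
    return 10.0 * np.log10((sig_energy + eps) / (noise_energy + eps))


def subtraction_factor(ratio_db, lo_db=-5.0, hi_db=20.0,
                       alpha_max=4.0, alpha_min=1.0):
    """Map the frame ratio to a subtraction factor (assumed linear rule):
    noise-dominated frames are subtracted harder, speech-dominated frames less."""
    t = np.clip((ratio_db - lo_db) / (hi_db - lo_db), 0.0, 1.0)
    return alpha_max - t * (alpha_max - alpha_min)


def adaptive_spectral_subtraction(signal_beam_mag, noise_beam_mag, floor=0.01):
    """Subtract the noise-beam power spectrum from the signal-beam power
    spectrum of one frame, with a factor recomputed for that frame."""
    alpha = subtraction_factor(frame_ratio_db(signal_beam_mag, noise_beam_mag))
    power = signal_beam_mag ** 2 - alpha * noise_beam_mag ** 2
    # A spectral floor keeps the result non-negative and limits musical noise.
    power = np.maximum(power, floor * signal_beam_mag ** 2)
    return np.sqrt(power), alpha


# Usage on one frame of magnitude spectra (toy values, three frequency bins).
signal_beam = np.array([1.0, 2.0, 3.0])   # beamformer output aimed at the talker
noise_beam = np.array([0.5, 0.5, 0.5])    # noise reference from the noise beam
enhanced, alpha = adaptive_spectral_subtraction(signal_beam, noise_beam)
print(alpha, enhanced)
```

In a complete system, the magnitude spectra would come from a short-time Fourier transform of the two beam outputs, and the enhanced magnitude would be recombined with the signal-beam phase before the inverse transform. Recomputing the factor in every frame is what separates this scheme from a fixed-threshold subtraction.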
Keywords: dual-microphone; microphone array; beamforming; spectral subtraction; direction mitigation ratio