Reverberation-aware microphone array speech enhancement algorithm based on deep learning
[Objective] Microphone array techniques are widely used for speech enhancement because they exploit the spatial information provided by multiple microphone channels. However, reverberation characteristics vary with room size, boundary materials, and reflectors, and this variation significantly degrades the speech enhancement performance of microphone arrays. In recent years, deep-learning-optimized microphone array signal processing has been investigated to remedy the problems caused by reverberation, but it suffers from data dependence and thus cannot adapt to reverberant scenes that are absent from the training data. In this paper, a novel reverberation-aware (RA) microphone array speech enhancement algorithm is proposed: it first extracts a reverberation feature and then uses a deep-learning model to decouple the negative impact of the environment, thus enabling environment-adaptive microphone array speech enhancement under diverse reverberant scenarios. [Methods] The proposed RA microphone array speech enhancement algorithm consists of a training stage and a testing stage. Specifically, in the training stage, an approximate room impulse response (ARIR) is obtained by correlating the simulated reverberant signal with its beamforming output. Then, with the clean speech as the training target, an RA model is trained using the ARIR and the beamformed signal as inputs. The model thereby produces a room impulse response (RIR) generalized vector (RGV), which generalizes the de-reverberation model with respect to diverse RIRs as well as uncontrolled speech. In the practical testing stage, the practical ARIR is obtained in the same way, by correlating the received reverberant signal with its beamforming output. The resulting RGV is then convolved with the practical ARIR to obtain the coefficients of a post de-reverberation filter, which removes the reverberation corresponding to the
ARIR. [Results] The performance of the proposed RA speech enhancement algorithm is quantitatively evaluated through simulations and experiments, with the classic filter-and-sum beamforming (FSB) algorithm, the weighted prediction error (WPE) algorithm, and the DNN-WPE algorithm as baselines. The perceptual evaluation of speech quality (PESQ) score and the speech-to-reverberation modulation energy ratio (SRMR) serve as the speech quality metrics. The THCHS-30 dataset is used for training and testing. In the environment-match case, the training and testing data originate from the same room, whereas in the environment-mismatch case they originate from different rooms. In the simulation, artificial RIRs with different reverberation levels are constructed with the IMAGE toolbox, and the reverberant multichannel signals received by the microphone array are simulated by convolving the clean corpus with these artificial RIRs. The simulation results show that, under environment match, both deep-learning algorithms, DNN-WPE and the proposed RA algorithm, outperform the traditional FSB and WPE algorithms at all reverberation levels. Under environment mismatch, however, the performance of both DNN-WPE and the proposed RA algorithm degrades. Notably, while DNN-WPE suffers significant degradation in PESQ and SRMR, the proposed RA algorithm still outperforms the traditional FSB and WPE algorithms. In the practical experiment, speech data are recorded at reverberation times of 0.25, 0.4, and 0.6 s in a reverberation laboratory with an adjustable reverberation level. The experimental results reveal that, when moving from the environment-match case to the environment-mismatch case, the DNN-WPE algorithm shows significant performance degradation, whereas
the proposed RA algorithm remains much more stable in terms of PESQ and SRMR. This trend indicates that the proposed RA algorithm outperforms the DNN-WPE method in terms of environmental tolerance, consistent with the simulation results. [Conclusions] The simulations and practical experiments comparing the different algorithms show that the RA microphone array speech enhancement algorithm proposed in this paper achieves satisfactory performance under diverse reverberant environments. In the RA algorithm, the ARIR is used as a model input, which reduces, to some extent, the dependence of the neural network on the training data. In future research, we will consider combining other models and training methods to explore the potential of the ARIR for improving the generalization ability of the model.
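
As a rough illustration of the ARIR step described in the Methods part, the following Python sketch simulates reverberant multichannel signals by convolving a source with artificial RIRs, forms a beamformer output, and cross-correlates one microphone signal with that output. Everything here is an assumption made for illustration and not the authors' implementation: the function names `toy_rir` and `estimate_arir` are hypothetical, the RIRs are exponentially decaying noise rather than image-method responses, a plain channel average stands in for the FSB beamformer, and a white-noise source replaces speech so that the correlation has a clean interpretation.

```python
import numpy as np

def toy_rir(rng, fs, rt60, length_s=0.25, tail_gain=0.05):
    """Toy RIR (assumption): unit direct path plus a decaying noise tail."""
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Amplitude falls by 60 dB (a factor of 1000) at t = rt60.
    decay = np.exp(-np.log(1000.0) * t / rt60)
    h = tail_gain * rng.standard_normal(n) * decay
    h[0] = 1.0  # direct path at lag 0 (source assumed time-aligned at all mics)
    return h

def estimate_arir(x_ref, y_bf, n_lags):
    """Cross-correlate a microphone signal with the beamformer output.

    Returns the correlation at lags 0..n_lags-1, normalized by the
    energy of the beamformer output.
    """
    n = len(y_bf)
    energy = np.dot(y_bf, y_bf)
    corr = [np.dot(x_ref[k:], y_bf[:n - k]) for k in range(n_lags)]
    return np.array(corr) / energy

rng = np.random.default_rng(0)
fs, rt60, n_mics = 8000, 0.4, 4

# White-noise stand-in for the clean source (2 seconds).
source = rng.standard_normal(2 * fs)

# Simulated reverberant multichannel signals: source convolved with each RIR.
rirs = [toy_rir(rng, fs, rt60) for _ in range(n_mics)]
mics = np.stack([np.convolve(source, h)[:len(source)] for h in rirs])

# Channel average as a simple stand-in for the beamformer.
beamformed = mics.mean(axis=0)

# Approximate RIR feature for microphone 0.
arir = estimate_arir(mics[0], beamformed, n_lags=256)
print("lag of largest ARIR coefficient:", int(np.argmax(np.abs(arir))))
```

In this toy setup the estimate is dominated by the direct path at lag 0, followed by a decaying tail that tracks the reverberation of the reference channel. With real speech the source is not white, so the correlation would only approximate the RIR, which fits the "approximate" in ARIR.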