An Attention-Based Model for Time-Frequency Domain Speech Enhancement
To address the phase mismatch problem of single-channel speech enhancement techniques in the frequency domain, a joint time-domain and frequency-domain speech enhancement algorithm is proposed that jointly optimises the learning targets in the two domains during the training phase. An attention mechanism is added to simulate human auditory perceptual characteristics and strengthen the model's ability to suppress noise. Dilated convolution is used to widen the receptive field, enabling the fusion of more input-layer information and the effective extraction of local features in both the time and frequency domains. To improve enhancement performance, the joint time-domain and frequency-domain loss functions are optimised for learning in their respective domains. To validate the effectiveness of the proposed method, extensive experiments are conducted on the VoiceBank dataset using a residual temporal convolutional network as the baseline model; the experimental results show better enhancement than a single baseline model operating in either the time or the frequency domain alone. After denoising, the perceptual evaluation of speech quality (PESQ) score was 3.06 and the scale-invariant signal-to-distortion ratio (SI-SDR) was 20.00 dB.
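The joint optimisation of time-domain and frequency-domain targets described above can be sketched as a weighted sum of a waveform loss and a spectral-magnitude loss. The sketch below is illustrative, not the paper's exact formulation: the function names (`stft_mag`, `joint_loss`), the L1 distances, the weighting factor `alpha`, and the STFT parameters are all assumptions introduced for clarity.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude spectrogram via framed FFT with a Hann window.
    (Illustrative STFT parameters; not taken from the paper.)"""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def joint_loss(est, clean, alpha=0.5):
    """Hypothetical joint loss: alpha weights a time-domain L1 term
    against a frequency-domain magnitude L1 term."""
    l_time = np.mean(np.abs(est - clean))                       # waveform error
    l_freq = np.mean(np.abs(stft_mag(est) - stft_mag(clean)))   # spectral error
    return alpha * l_time + (1 - alpha) * l_freq
```

During training, a loss of this shape lets gradients flow from both domains into a single network, which is one common way to mitigate the phase mismatch incurred by purely frequency-domain objectives.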