Single-channel speech enhancement method based on a multi-dimensional attention mechanism and a complex Conformer

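To improve the intelligibility and quality of speech corrupted by noise, and to address the problems that deep complex networks for speech enhancement extract key acoustic features from the complex speech spectrum insufficiently and model correlated information unreasonably, a single-channel speech enhancement method based on a multi-dimensional attention mechanism and a complex Conformer (SE-MDACC) is proposed. A complex Conformer is introduced into the complex U-Net architecture to model the correlation between speech magnitude and phase; a multi-dimensional attention mechanism is used to construct richer features that strengthen the representation capability of the convolutional layers; and an attention gating mechanism is added to the residual connections to reinforce the detail of the reconstructed speech. Experimental results show that, compared with the deep complex convolution recurrent network, SE-MDACC improves the objective metrics perceptual evaluation of speech quality and short-time objective intelligibility by 15.299% and 1.462%, respectively, indicating that SE-MDACC can fully extract speech acoustic features, model the magnitude-phase correlation reasonably, and effectively improve speech quality and intelligibility.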
Monaural speech enhancement method based on multi-dimensional attention mechanism and complex Conformer
To improve the intelligibility and quality of noise-corrupted speech, we propose a monaural speech enhancement method based on a multi-dimensional attention mechanism and a complex Conformer (SE-MDACC). The method addresses two shortcomings of existing deep complex-valued networks for speech enhancement: insufficient extraction of key acoustic features from the complex spectrum and unreasonable modeling of correlated information. Within the complex U-Net architecture, a complex Conformer is introduced to model the correlation between the magnitude and phase of speech. A multi-dimensional attention mechanism is used to construct richer features that enhance the representation capability of the convolutional layers. Additionally, an attention gating mechanism is incorporated into the residual connections to strengthen the detail of the reconstructed speech, ultimately improving the performance of the speech enhancement network. Experimental results indicate that, compared with the deep complex convolution recurrent network (DCCRN), the proposed method improves the objective metrics perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) by 15.299% and 1.462%, respectively. This demonstrates that SE-MDACC can effectively extract acoustic features from speech and reasonably model magnitude and phase correlations, thereby enhancing both speech quality and intelligibility.
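
As a rough illustration of why complex-valued layers can treat magnitude and phase jointly, the sketch below multiplies one complex spectrum bin by one complex weight using only real arithmetic, which is the basic operation complex-valued network layers are built from. It is a generic sketch, not the SE-MDACC implementation: the function names and the sample values are hypothetical and do not come from the paper.

```python
import cmath


def complex_multiply_real(w_r, w_i, x_r, x_i):
    """Compute (w_r + j*w_i) * (x_r + j*x_i) with real-valued operations only.

    Both output parts depend on both input parts, so a single complex
    weight scales the magnitude and rotates the phase at the same time.
    """
    y_r = w_r * x_r - w_i * x_i
    y_i = w_r * x_i + w_i * x_r
    return y_r, y_i


def magnitude_phase(re, im):
    """Return (magnitude, phase in radians) of the complex number re + j*im."""
    return cmath.polar(complex(re, im))


# Hypothetical time-frequency bin and complex weight (illustrative values only).
x_r, x_i = 3.0, 4.0   # input bin 3 + 4j
w_r, w_i = 0.0, 0.5   # weight 0.5j: halves the magnitude, rotates by 90 degrees

y_r, y_i = complex_multiply_real(w_r, w_i, x_r, x_i)

mag_x, ph_x = magnitude_phase(x_r, x_i)
mag_y, ph_y = magnitude_phase(y_r, y_i)

print("output bin:", complex(y_r, y_i))
print("magnitude ratio:", mag_y / mag_x)
print("phase shift (rad):", ph_y - ph_x)
```

A layer built from separate real-valued weights for the real and imaginary parts would not impose this coupling, which is one reason complex-valued formulations are used when magnitude and phase are to be modeled together.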

deep complex network; acoustic features; correlated information; multi-dimensional attention mechanism; speech enhancement

Gao Shengxiang (高盛祥), Mo Shangbin (莫尚斌), Yu Zhengtao (余正涛), Dong Ling (董凌), Wang Wenjun (王文君)

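Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500

Yunnan Key Laboratory of Media Convergence, Kunming 650500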

deep complex network; acoustic features; correlated information; multi-dimensional attention mechanism; speech enhancement

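National Natural Science Foundation of China; National Natural Science Foundation of China; National Natural Science Foundation of China; Yunnan Provincial Key Research and Development Program; Yunnan Provincial Key Research and Development Program; Yunnan High-Tech Industry Development Project (2016); Yunnan Science and Technology Talent and Platform Program; Open Project of the Yunnan Key Laboratory of Media Convergence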

62376111; U23A20388; U21B2027; 202303AP140008; 202103AA080015; 202105AC160018; 220225702

2024

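Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
Chongqing University of Posts and Telecommunications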

CSTPCD; Peking University Core Journal
Impact factor: 0.66
ISSN: 1673-825X
Year, volume (issue): 2024, 36(2)