Speech segmentation method based on speech semantic guidance

Abstract: [Objective] Speech segmentation splits an audio stream or a long recording into shorter segments and is a necessary step in speech translation. Proper segmentation gives each segment complete semantics, so the speech translation model can attend to the full context of each sentence and decode better translations. [Methods] We propose a speech segmentation method based on speech semantic guidance. A HuBERT-based frame classifier estimates, for each audio frame, the probability that it is a speech frame, and the ipDAC algorithm then recursively splits the audio into segments. [Results] On the MuST-C English-Vietnamese (En-Vi) translation dataset, the proposed method improves the BLEU score by 0.6 percentage points over existing methods. [Conclusions] A comparison of how different segmentation methods affect model performance shows that the proposed method effectively reduces the performance loss of the speech translation model during decoding.
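The abstract does not spell out the ipDAC algorithm, so the following is only a minimal Python sketch of the general idea it describes: given per-frame speech probabilities from some frame classifier, recursively cut a segment at its least speech-like frame until every piece fits a length limit. The function names (`split_recursive`, `trim`, `segment`) and the parameters `max_len`, `min_len`, and `threshold` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int]  # half-open frame range [start, end)


def split_recursive(probs: Sequence[float], start: int, end: int,
                    max_len: int, min_len: int) -> List[Segment]:
    """Recursively split [start, end) at its least speech-like frame."""
    if end - start <= max_len:
        return [(start, end)]
    # Candidate cut points leave at least min_len frames on each side.
    lo, hi = start + min_len, end - min_len
    if lo >= hi:
        # No room to honour min_len: fall back to the midpoint.
        cut = (start + end) // 2
    else:
        cut = min(range(lo, hi), key=lambda i: probs[i])
    return (split_recursive(probs, start, cut, max_len, min_len)
            + split_recursive(probs, cut, end, max_len, min_len))


def trim(probs: Sequence[float], start: int, end: int,
         threshold: float) -> Segment:
    """Drop leading and trailing frames whose speech probability is low."""
    while start < end and probs[start] < threshold:
        start += 1
    while end > start and probs[end - 1] < threshold:
        end -= 1
    return start, end


def segment(probs: Sequence[float], max_len: int, min_len: int = 1,
            threshold: float = 0.5) -> List[Segment]:
    """Return non-empty segments, each at most max_len frames long.

    Assumes max_len >= 1 and min_len >= 1 (hypothetical parameters).
    """
    segments = []
    for s, e in split_recursive(probs, 0, len(probs), max_len, min_len):
        s, e = trim(probs, s, e, threshold)
        if e > s:
            segments.append((s, e))
    return segments
```

For example, with `probs = [0.9, 0.9, 0.8, 0.1, 0.9, 0.95, 0.9, 0.2, 0.9, 0.9]` and `max_len=4`, `segment` cuts at the two low-probability frames and returns `[(0, 3), (4, 7), (8, 10)]`. In practice the probabilities would come from a trained frame classifier and the length limits would be expressed in seconds rather than frames.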

Keywords:

speech translation; speech segmentation; HuBERT pre-trained model

Authors:

Gao Shengxiang (高盛祥), Yang Shanglong (杨尚龙), Yu Zhengtao (余正涛), Dong Ling (董凌), Zhou Guojiang (周国江)

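Affiliations:

Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China

Yunnan Key Laboratory of Artificial Intelligence, Kunming University of Science and Technology, Kunming 650500, Yunnan, China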

Publication year:

2024

DOI:

10.6043/j.issn.0438-0479.202312022

Journal of Xiamen University (Natural Science)

Xiamen University

Indexed in: CSTPCD; Peking University Core Journals (北大核心)

Impact factor: 0.449

ISSN: 0438-0479

Year, volume (issue): 2024, 63(6)