
Research on a Tibetan-Driven Visual Speech Synthesis Algorithm Based on Audio Matching

To address low lip-contour detection accuracy and poor visual speech synthesis quality, a Tibetan-driven visual speech synthesis algorithm based on audio matching is proposed. The algorithm extracts the short-time energy and short-time zero-crossing rate from the Tibetan speech signal that drives the visual speech, and constructs the short-time autocorrelation function of the signal. First, feature information is extracted from the speech signal to obtain the pitch track of the Tibetan speech, which serves as the audio feature. Second, a spatiotemporal analysis model of the lips is established to analyze how the lip contour changes during pronunciation, and lip-contour features are extracted by principal component analysis. Finally, an input-output hidden Markov model captures the association between the audio features and the lip-contour features, and Tibetan-driven visual speech is synthesized on the basis of audio matching. Experimental results show that the method achieves high lip-contour detection accuracy and good visual speech synthesis quality.
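The audio-feature step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 40 ms frame and 10 ms hop, the 60 to 400 Hz pitch search range, and the voicing thresholds are all assumptions chosen for illustration, and using short-time energy together with zero-crossing rate as a voiced/unvoiced gate is a common convention rather than something the abstract specifies.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def short_time_energy(frames):
    """Sum of squared samples in each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def autocorr_pitch(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) from the peak of the short-time autocorrelation.

    fmin and fmax are assumed search bounds, not values from the paper.
    """
    frame = frame - frame.mean()
    # Keep non-negative lags only: r[k] = sum_n frame[n] * frame[n + k]
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if r[0] <= 0:  # silent frame, no pitch
        return 0.0
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(r) - 1)
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return sr / lag

def pitch_track(x, sr, frame_len=640, hop=160, energy_ratio=0.1, zcr_max=0.3):
    """Frame-wise pitch track; frames judged unvoiced are set to 0 Hz.

    The energy and zero-crossing thresholds are illustrative assumptions.
    """
    frames = frame_signal(x, frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    voiced = (energy > energy_ratio * energy.max()) & (zcr < zcr_max)
    f0 = np.zeros(len(frames))
    for i in np.flatnonzero(voiced):
        f0[i] = autocorr_pitch(frames[i], sr)
    return f0
```

Calling `pitch_track(x, sr)` on a mono signal `x` sampled at `sr` Hz returns one pitch value per frame. A track of this kind is the sort of audio feature that the abstract pairs with lip-contour features in the input-output hidden Markov model.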

audio matching; short-time autocorrelation function; spatiotemporal analysis model; principal component analysis; visual speech synthesis

Han Xi (韩西), Liang Kai (梁凯), Yue Yu (岳宇)


Ganzi Prefecture Institute of Science and Technology Information (甘孜州科技信息研究所), Kangding, Sichuan 626000, China


Supported by the Sichuan Science and Technology Program (grant no. 2021YFG0138)

2024

Journal of Jilin University (Information Science Edition)
Jilin University


CSTPCD
Impact factor: 0.607
ISSN: 1671-5896
Year, volume (issue): 2024, 42(3)