Research on a Tibetan-Driven Visual Speech Synthesis Algorithm Based on Audio Matching

Han Xi (韩西)¹, Liang Kai (梁凯)¹, Yue Yu (岳宇)¹


Author information

1. Ganzi Prefecture Institute of Science and Technology Information (甘孜州科技信息研究所), Kangding, Sichuan 626000

Abstract

To address the low accuracy of lip contour detection and the poor quality of visual speech synthesis, this paper proposes a Tibetan-driven visual speech synthesis algorithm based on audio matching. First, the short-time energy and short-time zero-crossing rate are extracted from the Tibetan speech signal that drives the visual speech, and the short-time autocorrelation function of the signal is constructed. The feature information extracted in this way yields the pitch track of the Tibetan speech signal, which serves as the audio feature. Second, a spatiotemporal analysis model of the lips is established to analyze how the lip contour changes during articulation, and lip contour features are extracted by principal component analysis. Finally, an input-output hidden Markov model captures the correlation between the audio features and the lip contour features, and Tibetan-driven visual speech is synthesized on the basis of audio matching. Experimental results show that the proposed method achieves high lip contour detection accuracy and good visual speech synthesis quality.
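The audio front end described in the abstract begins with short-time energy and the short-time zero-crossing rate. The following Python sketch (standard library only) shows one common way to compute both values per frame. The frame length, hop size, and the thresholds in the hypothetical helper `is_voiced` are illustrative assumptions, not parameters reported for this algorithm. The voicing rule encodes the usual heuristic that voiced speech has high energy and few zero crossings.

```python
import math
from typing import List, Tuple


def frame_signal(x: List[float], frame_len: int, hop: int) -> List[List[float]]:
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    if frame_len < 2 or hop < 1:
        raise ValueError("frame_len must be >= 2 and hop >= 1")
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame: List[float]) -> float:
    """E = sum of squared samples within the frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame: List[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def is_voiced(energy: float, zcr: float,
              energy_thresh: float = 1.0, zcr_thresh: float = 0.3) -> bool:
    """Hypothetical voicing rule with assumed thresholds: enough energy, few crossings."""
    return energy > energy_thresh and zcr < zcr_thresh


def energy_zcr_track(x: List[float], frame_len: int = 256,
                     hop: int = 128) -> List[Tuple[float, float]]:
    """(energy, zcr) for every frame; the default sizes are illustrative, not from the paper."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(x, frame_len, hop)]


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1024)]  # synthetic 200 Hz tone
    for e, z in energy_zcr_track(tone):
        print(f"E={e:7.2f}  ZCR={z:.3f}  voiced={is_voiced(e, z)}")
```

When run as a script, it prints one line per 256-sample frame of a synthetic 200 Hz tone. Frames of real Tibetan speech would replace the tone in practice.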
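The pitch track can be obtained from the short-time autocorrelation function R(k) = sum over m of x[m]·x[m+k]. For a voiced frame, the lag of the strongest peak inside a plausible pitch-period range approximates the pitch period, and repeating this estimate frame by frame gives the F0 contour. The sketch below assumes a 60 to 500 Hz search range and a hypothetical voicing threshold on R(k)/R(0). It is a textbook autocorrelation estimator, not necessarily the exact procedure used in the paper.

```python
import math
from typing import List


def short_time_autocorr(frame: List[float], max_lag: int) -> List[float]:
    """R(k) = sum_{m=0}^{N-1-k} x[m] * x[m+k] for k = 0..max_lag (unnormalized)."""
    n = len(frame)
    return [sum(frame[m] * frame[m + k] for m in range(n - k))
            for k in range(max_lag + 1)]


def estimate_pitch(frame: List[float], fs: float, f_min: float = 60.0,
                   f_max: float = 500.0, voicing_ratio: float = 0.3) -> float:
    """Return F0 in Hz for one frame, or 0.0 if the frame is treated as unvoiced.

    The pitch period is the lag of the largest R(k) within [fs/f_max, fs/f_min].
    f_min, f_max and voicing_ratio are assumed values, not taken from the paper.
    """
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), len(frame) - 1)
    if lag_min > lag_max:
        raise ValueError("frame too short for the requested pitch range")
    r = short_time_autocorr(frame, lag_max)
    if r[0] <= 0.0:  # silent frame: no energy, no pitch
        return 0.0
    best = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    if r[best] / r[0] < voicing_ratio:  # weak periodicity -> treat as unvoiced
        return 0.0
    return fs / best


def pitch_track(x: List[float], fs: float, frame_len: int = 256,
                hop: int = 128) -> List[float]:
    """Frame-by-frame F0 contour (the pitch track) of a signal."""
    return [estimate_pitch(x[i:i + frame_len], fs)
            for i in range(0, len(x) - frame_len + 1, hop)]
```

The unnormalized autocorrelation is used deliberately. It decays slowly with lag, which favors the first period peak over its multiples and reduces octave errors. Smoothing the resulting track, for example with a median filter, is a common follow-up step.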
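On the lip side, principal component analysis reduces each lip contour to a few coefficients. This sketch assumes that each frame's contour is a flattened vector of landmark coordinates (x1, y1, ..., xK, yK). All helper names are hypothetical. Power iteration with deflation stands in for a library eigen-solver so that the example needs only the standard library. The paper's spatiotemporal lip model is not reproduced here.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def mean_vector(data: Matrix) -> List[float]:
    """Column means of row-vector observations (one flattened lip contour per row)."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def covariance(data: Matrix, mu: List[float]) -> Matrix:
    """Sample covariance (divides by n - 1) of row-vector observations."""
    n, d = len(data), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def _mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def top_components(cov: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Leading k eigenvectors/eigenvalues of a symmetric PSD matrix via power
    iteration with deflation (adequate for small matrices only)."""
    rng = random.Random(seed)
    d = len(cov)
    c = [row[:] for row in cov]
    comps, eigvals = [], []
    for _ in range(k):
        v = _normalize([rng.uniform(-1.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            w = _mat_vec(c, v)
            if not any(w):  # remaining variance is exactly zero
                break
            v = _normalize(w)
        lam = sum(a * b for a, b in zip(v, _mat_vec(c, v)))  # Rayleigh quotient
        comps.append(v)
        eigvals.append(lam)
        # Deflate: subtract lam * v v^T so the next pass finds the next component
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, eigvals


def project(x: List[float], mu: List[float], comps: Matrix) -> List[float]:
    """Lip-contour feature: coordinates of (x - mu) on each principal component."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mu, comp)) for comp in comps]
```

With real data, applying `project` to every frame yields the lip-contour feature sequence that is paired with the audio features. For larger landmark sets, an eigen-solver such as `numpy.linalg.eigh` would be the usual choice.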
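The final step links the two feature streams with an input-output hidden Markov model (IOHMM). In such a model, the input sequence drives the hidden-state transitions and each state emits an output. The sketch below covers only synthesis with an already trained model. It assumes that the audio features have been quantized into discrete symbols and that each state emits a fixed mean lip-feature vector, and all names are hypothetical. Training, typically by an EM-style algorithm, is omitted, and the model in the paper may also condition its outputs on the input.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def iohmm_expected_outputs(inputs: Sequence[int], init: List[float],
                           trans: List[Matrix], means: Matrix) -> Matrix:
    """Expected lip-feature vectors E[y_t | u_1..u_t] of a simplified discrete-state IOHMM.

    inputs: audio symbols u_t (hypothetical codebook indices of the audio features)
    init:   initial state distribution pi[i]
    trans:  trans[u][i][j] = P(state_t = j | state_{t-1} = i, input_t = u)
    means:  means[j] = mean lip-contour (e.g. PCA) vector emitted in state j
    """
    n_states = len(init)
    dim = len(means[0])
    alpha = list(init)
    outputs: Matrix = []
    for u in inputs:
        a = trans[u]
        # Input-driven forward step: alpha_t(j) = sum_i alpha_{t-1}(i) * A_u[i][j]
        alpha = [sum(alpha[i] * a[i][j] for i in range(n_states))
                 for j in range(n_states)]
        total = sum(alpha)
        if total <= 0.0:
            raise ValueError("transition rows must be probability distributions")
        alpha = [p / total for p in alpha]  # guard against rounding drift
        # Output = state-probability-weighted mean lip vector
        outputs.append([sum(alpha[j] * means[j][d] for j in range(n_states))
                        for d in range(dim)])
    return outputs
```

The vector returned for frame t is the state-probability-weighted mean. Under this simplified model, that is the minimum-mean-square-error estimate of the lip features given the audio symbols up to t.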

Key words

audio matching/short-time autocorrelation function/spatiotemporal analysis model/principal component analysis/visual speech synthesis


Funding

Sichuan Science and Technology Program (2021YFG0138)

Publication year

2024

Journal of Jilin University (Information Science Edition)

Jilin University


CSTPCD

Impact factor: 0.607

ISSN: 1671-5896

Number of references: 15
