Thai Speech Synthesis Based on Cross-language Transfer Learning and Joint Training
With the rapid development of deep learning, end-to-end speech synthesis systems based on deep neural networks have become mainstream because of their excellent performance. However, research on Thai speech synthesis has been limited in recent years, mainly because large-scale Thai datasets are scarce and the language has special spelling rules. This paper studies low-resource Thai speech synthesis based on the FastSpeech2 acoustic model and the StyleMelGAN vocoder. To address the problems of the baseline system, three improvements are proposed to raise the quality of synthesized Thai speech. (1) Under the guidance of Thai language experts and drawing on relevant knowledge of Thai linguistics, a Thai grapheme-to-phoneme (G2P) model is designed to handle the special spellings in Thai text. (2) Based on the International Phonetic Alphabet (IPA) phonemes produced by the designed Thai G2P model, languages with similar phoneme input units and rich datasets are selected for cross-language transfer learning, to compensate for insufficient Thai training data. (3) FastSpeech2 and the StyleMelGAN vocoder are trained jointly to solve the problem of acoustic feature mismatch. To verify the effectiveness of the proposed methods, this paper evaluates attention alignment maps, the objective mel-cepstral distortion (MCD) metric, and the subjective mean opinion score (MOS). Experimental results show that the designed Thai G2P model yields better alignment and therefore more accurate phoneme durations, and that the system combining the designed Thai G2P model, joint training, and transfer learning achieves the best synthesis quality: its MCD and MOS are 7.43±0.82 and 4.53, significantly better than the 9.47±0.54 and 1.14 of the baseline system.
Speech synthesis; Low resource; Thai G2P model; Transfer learning; Joint training
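For readers unfamiliar with the objective metric reported above, the following is a minimal sketch of how mel-cepstral distortion is commonly computed, in dB, between reference and synthesized speech. It is not the paper's evaluation code. The function name `mel_cepstral_distortion` is hypothetical, the two inputs are assumed to be already time-aligned mel-cepstral matrices of identical shape, and the 0th (energy) coefficient is excluded, which is a common convention rather than something the abstract states. The mean and standard deviation are taken over frames here, whereas how the paper derives its ± values is not specified.

```python
import numpy as np


def mel_cepstral_distortion(ref, syn):
    """Return (mean, std) of frame-level MCD in dB.

    ref, syn: arrays of shape (frames, coefficients), assumed to be
    time-aligned already (for example by dynamic time warping).
    """
    ref = np.asarray(ref, dtype=float)
    syn = np.asarray(syn, dtype=float)
    if ref.shape != syn.shape:
        raise ValueError("ref and syn must be aligned to the same shape")

    # Drop the 0th coefficient (energy), a common convention (assumption).
    diff = ref[:, 1:] - syn[:, 1:]

    # Standard per-frame formula: (10 / ln 10) * sqrt(2 * sum_d diff_d^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean()), float(per_frame.std())
```

Lower values indicate that the synthesized cepstra are closer to the reference, which is the sense in which 7.43 is better than 9.47.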