Semantic and prosodic feature embedding method in emotional speech synthesis
To address the tendency of current emotional speech synthesis methods to ignore the semantic information of the text, a BERT pre-trained model is introduced into the text encoder to help it capture semantic features of the text, and a method for embedding semantic and prosodic features is proposed. Because the lack of a Myanmar emotional corpus makes it difficult for a model to synthesize high-quality emotional speech, this paper also explores a training method for a Myanmar emotional speech synthesis model that fine-tunes the parameters of each network module. The experimental results show that the proposed feature embedding method and training method can still synthesize high-quality emotional speech in the absence of an emotional corpus, with average emotional opinion scores of 4.16 and 4.18, respectively.