Neural Networks, 2022, Vol. 151, 11 pages. DOI: 10.1016/j.neunet.2022.03.041

Evaluation of text-to-gesture generation model using convolutional neural network

Asakawa, Eiichi¹; Kaneko, Naoshi²; Hasegawa, Dai³; Shirakawa, Shinichi¹

Author information

  • 1. Yokohama National University, Hodogaya-ku
  • 2. Aoyama Gakuin University, Chuo-ku
  • 3. Hokkai Gakuen University, Chuo-ku

Abstract

Conversational gestures play a crucial role in realizing natural interactions with virtual agents and robots. Data-driven approaches, such as deep learning and machine learning, are promising for constructing gesture generation models that automatically produce gesture motion for speech or spoken text. This study experimentally analyzes a deep learning-based model that generates gestures from spoken text using a convolutional neural network. The proposed model takes a sequence of spoken words as input and outputs a sequence of 2D joint coordinates representing the conversational gesture motion. We prepare a dataset of gesture motions paired with spoken texts by adding text information to an existing dataset, and we train the models on a specific speaker's data. The quality of the generated gestures is compared with that of gestures from an existing speech-to-gesture generation model through a user perceptual study. The subjective evaluation shows that the proposed model performs comparably to or better than the existing speech-to-gesture generation model. In addition, we investigate the importance of data cleansing and loss function selection in the text-to-gesture generation model. We further examine model transferability between speakers, and the experimental results demonstrate that the proposed model transfers successfully. Finally, we show that the text-to-gesture generation model can produce good-quality gestures even when a transformer architecture is used.

© 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
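To make the input/output contract described in the abstract concrete (a word sequence in, a per-frame sequence of 2D joint coordinates out), here is a minimal, untrained forward-pass sketch of a 1D temporal convolutional text-to-pose mapping. It is an illustration under stated assumptions, not the authors' architecture: the random embedding table, the word-to-frame alignment via per-word durations, the layer sizes, and every function name (`make_embeddings`, `expand_to_frames`, `conv1d_same`, `text_to_pose`, `l1_loss`, `mse_loss`) are hypothetical. The L1 and mean-squared-error losses are included only as two common candidates for the kind of loss-function comparison the abstract mentions; which losses the paper actually compares is not stated here. Pure Python is used so the sketch has no dependencies.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Pose = List[Tuple[float, float]]  # one (x, y) pair per joint


def make_embeddings(vocab: Sequence[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    # Hypothetical stand-in for a pretrained word-embedding table.
    rng = random.Random(seed)
    return {w: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for w in vocab}


def expand_to_frames(words: Sequence[str], durations: Sequence[int]) -> List[str]:
    # Assumed alignment: repeat each word for the number of frames it is spoken.
    frames: List[str] = []
    for word, n in zip(words, durations):
        frames.extend([word] * n)
    return frames


def random_kernel(taps: int, out_dim: int, in_dim: int, rng: random.Random) -> List[List[Vector]]:
    # weights[tap][out_channel][in_channel]; untrained Gaussian values.
    return [[[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
            for _ in range(taps)]


def conv1d_same(seq: List[Vector], weights: List[List[Vector]], bias: Vector) -> List[Vector]:
    # Temporal 1D convolution with zero padding so output length == input length.
    taps = len(weights)
    assert taps % 2 == 1, "use an odd kernel size for symmetric 'same' padding"
    pad = taps // 2
    zero = [0.0] * len(seq[0])
    padded = [zero] * pad + seq + [zero] * pad
    out = []
    for t in range(len(seq)):
        y = list(bias)
        for k in range(taps):
            x = padded[t + k]
            for o in range(len(bias)):
                y[o] += sum(w * xi for w, xi in zip(weights[k][o], x))
        out.append(y)
    return out


def text_to_pose(words: Sequence[str], durations: Sequence[int], emb: Dict[str, Vector],
                 num_joints: int, hidden: int = 16, kernel: int = 3, seed: int = 1) -> List[Pose]:
    rng = random.Random(seed)
    x = [emb[w] for w in expand_to_frames(words, durations)]          # (T, D)
    h = conv1d_same(x, random_kernel(kernel, hidden, len(x[0]), rng), [0.0] * hidden)
    h = [[max(0.0, v) for v in row] for row in h]                       # ReLU, (T, H)
    # Kernel size 1 == per-frame linear projection to 2 * num_joints outputs.
    y = conv1d_same(h, random_kernel(1, 2 * num_joints, hidden, rng), [0.0] * (2 * num_joints))
    return [[(row[2 * j], row[2 * j + 1]) for j in range(num_joints)] for row in y]


def _flat_diffs(pred: List[Pose], target: List[Pose]) -> List[float]:
    # Element-wise coordinate differences across frames, joints, and (x, y).
    return [p - t for fp, ft in zip(pred, target)
            for jp, jt in zip(fp, ft) for p, t in zip(jp, jt)]


def l1_loss(pred: List[Pose], target: List[Pose]) -> float:
    d = _flat_diffs(pred, target)
    return sum(abs(v) for v in d) / len(d)


def mse_loss(pred: List[Pose], target: List[Pose]) -> float:
    d = _flat_diffs(pred, target)
    return sum(v * v for v in d) / len(d)


if __name__ == "__main__":
    words = ["i", "really", "think", "so"]
    emb = make_embeddings(words, dim=8)
    poses = text_to_pose(words, [2, 3, 4, 2], emb, num_joints=7)
    print(len(poses), len(poses[0]))  # 11 frames, 7 joints per frame
```

Because the weights are random, the coordinates themselves are meaningless; the sketch only shows data flow and shapes. A practical model would be built in a deep learning framework and trained on paired text and pose data. The two losses differ in how they weight errors: MSE grows quadratically with the error and so penalizes large deviations more heavily, while L1 grows linearly and is less sensitive to outliers.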

Keywords

Gesture generation; Spoken text; Convolutional neural network; Transformer architecture; Deep learning

Publication year

2022
Neural Networks

Indexed in: EI, SCI
ISSN: 0893-6080
Times cited: 1
References: 36