
Publication Information


Title
Japanese: 
English:Co-speech Gesture Generation with Variational Auto Encoder 
Author
Japanese: 賈 宸一, 篠田 浩一.  
English: Shinichi Ka, Koichi Shinoda.  
Language English 
Journal/Book name
Japanese: 
English:Lecture Notes in Computer Science on MultiMedia Modeling (MMM 2024) 
Volume, Number, Page vol. 14556       
Published date Jan. 28, 2024 
Publisher
Japanese: 
English:Springer, Cham 
Conference name
Japanese: 
English:Multimedia Modeling (MMM) 2024
Conference site
Japanese: 
English:Amsterdam 
Official URL https://mmm2024.org/index.html
 
DOI https://doi.org/10.1007/978-3-031-53311-2_12
Abstract The research field of generating natural gestures from speech input is called co-speech gesture generation. Co-speech gesture generation methods should satisfy two requirements: fidelity and diversity. Several previous studies have used deterministic methods that establish a one-to-one mapping between speech and motion to achieve fidelity to the speech, but the variety of gestures they produce is limited. Other methods generate gestures probabilistically to make them diverse, but they often lack fidelity to the speech. To overcome these limitations, we propose Speaker-aware Audio2Gesture (SA2G), an extension of the previously proposed A2G, which uses a variational autoencoder (VAE) with randomized speaker-aware features as input. By using ST-GCNs as encoders and controlling the variance used for randomization, it can generate gestures that are faithful to the speech content while also having a large variety. In our evaluation on the TED datasets, it improves the fidelity of the generated gestures over the baseline by 85.4, while increasing Multimodality by 9.0×10^(-3).
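
The abstract describes the core mechanism as a VAE whose latent sample, drawn with a controllable variance, conditions the gesture decoder. The sketch below is only an illustration of that general idea in PyTorch, not the authors' implementation: the paper uses ST-GCN encoders over pose graphs, which are replaced here by GRUs for brevity, and all dimensions, module names, and the sigma_scale parameter are assumptions introduced for this example.

import torch
import torch.nn as nn

class GestureVAE(nn.Module):
    """Minimal VAE-style co-speech gesture sketch (illustrative only).

    The paper uses ST-GCN encoders; a GRU stands in for both the speech
    encoder and the pose decoder here. Feature sizes and `sigma_scale`
    are illustrative assumptions, not values from the paper.
    """

    def __init__(self, audio_dim=128, pose_dim=27, latent_dim=32, hidden=256):
        super().__init__()
        self.speech_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.decoder = nn.GRU(audio_dim + latent_dim, hidden, batch_first=True)
        self.to_pose = nn.Linear(hidden, pose_dim)

    def forward(self, audio_feats, sigma_scale=1.0):
        # audio_feats: (batch, frames, audio_dim)
        h, _ = self.speech_enc(audio_feats)
        summary = h[:, -1]                      # last-frame summary of the speech
        mu, logvar = self.to_mu(summary), self.to_logvar(summary)

        # Reparameterization; sigma_scale controls how strongly the latent is
        # randomized (larger -> more diverse gestures, smaller -> closer to the mean).
        std = torch.exp(0.5 * logvar) * sigma_scale
        z = mu + std * torch.randn_like(std)

        # Condition every decoder frame on the sampled latent.
        z_seq = z.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        out, _ = self.decoder(torch.cat([audio_feats, z_seq], dim=-1))
        return self.to_pose(out), mu, logvar

# KL term of the usual VAE objective; a pose reconstruction loss would be added.
def kl_divergence(mu, logvar):
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

In this framing, drawing several samples with a larger sigma_scale yields varied gestures for the same speech input, while the speech conditioning keeps each sample tied to the audio content; this mirrors the fidelity/diversity trade-off the abstract discusses, though the exact control scheme in SA2G may differ.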
