**TalkSHOW Generates Realistic Motions Using Speech as Input**
TalkSHOW, a project presented at CVPR 2023, aims to automate realistic 3D human animation using speech as input. Given an audio track, it generates a synchronized combination of body motion, facial expressions, and hand gestures.

A key challenge has been the scarcity of training data that pairs holistic 3D body meshes with synchronous speech recordings. To overcome this, the team developed SHOW (Synchronous Holistic Optimization in the Wild), which reconstructs upper-body movement, including the face and hands, from ordinary videos, producing a unique dataset for training.

Because the face and body have very different motion characteristics, TalkSHOW models them separately: an encoder-decoder architecture for the face, and a VQ-VAE framework for body and hand motion. The resulting model generates realistic and diverse body motion, supports different motion styles, and generalizes to speech from unseen speakers, foreign languages, and even songs without any fine-tuning.
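To make the face/body split more concrete, here is a minimal PyTorch sketch of the two generic building blocks mentioned above: a vector-quantization step of the kind a VQ-VAE uses to turn continuous body/hand motion features into discrete codebook entries, and a small encoder-decoder that regresses per-frame facial parameters from audio features. The module layout, layer choices, and all dimensions (codebook size, feature sizes) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- not TalkSHOW's actual code.
import torch
import torch.nn as nn


class VectorQuantizer(nn.Module):
    """Maps continuous motion latents to the nearest entry of a learned codebook."""

    def __init__(self, num_codes: int = 512, code_dim: int = 128):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)

    def forward(self, z: torch.Tensor):
        # z: (batch, time, code_dim) latents from a motion encoder (assumed shape)
        w = self.codebook.weight                              # (num_codes, code_dim)
        dists = (z.pow(2).sum(-1, keepdim=True)
                 - 2 * z @ w.t()
                 + w.pow(2).sum(-1))                          # squared distances (B, T, num_codes)
        indices = dists.argmin(dim=-1)                        # discrete motion tokens
        z_q = self.codebook(indices)                          # quantized latents
        # Straight-through estimator so gradients still reach the encoder
        z_q = z + (z_q - z).detach()
        return z_q, indices


class FaceEncoderDecoder(nn.Module):
    """Toy encoder-decoder regressing per-frame facial parameters from audio features."""

    def __init__(self, audio_dim: int = 80, face_dim: int = 103, hidden: int = 256):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, face_dim)

    def forward(self, audio_feats: torch.Tensor):
        # audio_feats: (batch, time, audio_dim), e.g. mel-spectrogram frames
        h, _ = self.encoder(audio_feats)
        return self.decoder(h)                                # (batch, time, face_dim)


if __name__ == "__main__":
    z_q, tokens = VectorQuantizer()(torch.randn(2, 30, 128))
    face = FaceEncoderDecoder()(torch.randn(2, 30, 80))
    print(z_q.shape, tokens.shape, face.shape)
```

The split reflects the intuition in the paper: facial motion is tightly coupled to the audio, so a direct regression works well, while body and hand gestures are only loosely correlated with speech, so quantizing them into discrete tokens lets the model sample diverse yet plausible motions.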