The paper introduces **FaceXHuBERT**, a novel speech-driven 3D facial animation method that captures subtle cues in speech such as identity, emotion, and hesitation, and is robust to background noise and audio recorded with multiple speakers. It employs the self-supervised pretrained HuBERT speech model to incorporate both lexical and non-lexical information in the audio without requiring a large lexicon, and guides training with a binary emotion condition and speaker identity. This design addresses data scarcity, inaccurate lip-sync, limited expressivity, personalization, and generalizability. Extensive evaluations and a user study show that FaceXHuBERT produces superior animations 78% of the time compared to state-of-the-art methods, while running 4 times faster by eliminating complex sequential models such as transformers.
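To make the described pipeline concrete, here is a minimal PyTorch sketch of the general idea: a pretrained HuBERT encoder produces per-frame audio features, a binary emotion flag and a speaker one-hot are appended as conditions, and a lightweight GRU (rather than a transformer) decodes per-frame vertex displacements. The class name `FaceXHuBERTSketch`, the dimensions, and the conditioning layout are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
from transformers import HubertModel

class FaceXHuBERTSketch(nn.Module):
    """Illustrative sketch (not the authors' implementation): HuBERT audio
    encoder + GRU decoder conditioned on a binary emotion flag and a
    one-hot speaker identity."""

    def __init__(self, num_vertices=5023, num_speakers=8, hidden=256):
        super().__init__()
        # Pretrained self-supervised speech encoder; frozen here for simplicity.
        self.audio_encoder = HubertModel.from_pretrained("facebook/hubert-base-ls960")
        for p in self.audio_encoder.parameters():
            p.requires_grad_(False)
        cond_dim = 1 + num_speakers                     # binary emotion + speaker one-hot
        self.decoder = nn.GRU(768 + cond_dim, hidden, batch_first=True)
        self.vertex_head = nn.Linear(hidden, num_vertices * 3)

    def forward(self, waveform, emotion, speaker_onehot):
        # waveform: (B, num_samples) raw 16 kHz audio
        feats = self.audio_encoder(waveform).last_hidden_state   # (B, T, 768)
        B, T, _ = feats.shape
        cond = torch.cat([emotion.view(B, 1), speaker_onehot], dim=-1)  # (B, 1 + S)
        cond = cond.unsqueeze(1).expand(B, T, -1)                       # repeat per frame
        h, _ = self.decoder(torch.cat([feats, cond], dim=-1))
        return self.vertex_head(h)          # per-frame vertex displacements (B, T, V*3)

# Example usage with dummy inputs (1 second of fake audio, speaker 3, "expressive" condition).
model = FaceXHuBERTSketch()
wav = torch.randn(1, 16000)
emo = torch.tensor([1.0])
spk = torch.nn.functional.one_hot(torch.tensor([3]), 8).float()
verts = model(wav, emo, spk)
```

The GRU-based decoder illustrates the point about speed: avoiding attention over the whole sequence keeps inference cost linear in audio length, which is consistent with the reported 4x speed-up over transformer-based alternatives.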