Audio-Driven Gaussian Avatars with Generalizable Capability
Project Description
Generating animations of photorealistic 3D head avatars is important for many applications, such as digital humans, movie production, and immersive telepresence. Recent work on this task has adopted Gaussian Splatting, a state-of-the-art neural scene representation. Gaussian Splatting offers key advantages such as high-fidelity rendering and real-time performance (high FPS), making it an ideal tool for creating high-quality avatars. A key limitation of current audio-driven Gaussian avatars, however, is their reliance on person-specific training with multi-view images of a single subject, which severely limits scalability and generalization. In this project, we aim to improve the generalization ability of audio-driven Gaussian avatars, enabling them to generate high-quality animations for unseen individuals from only a single image or sparse-view images.
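For context, a standard Gaussian Splatting renderer (as introduced by Kerbl et al., 2023) computes each pixel color by depth-sorting the projected 3D Gaussians and alpha-blending their contributions, roughly C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j), where c_i and \alpha_i are the learned color and opacity of the i-th Gaussian covering that pixel. This rasterization-style formulation, rather than per-ray volume integration, is what enables the real-time frame rates noted above.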
Supervisor
XU Dan
Quota
3
Course type
UROP1100
UROP2100
UROP3100
UROP3200
UROP4100
Applicant's Roles
- Study existing Gaussian Splatting methods to explore their potential for generalizable 3D head avatar generation.
- Conduct research on audio-driven Gaussian avatars, focusing on improving realism and generalization to unseen individuals from single or sparse-view images.
- Collaborate with PhD students, with potential opportunities to contribute to a research paper.
Applicant's Learning Objectives
- Gain a deep understanding of Gaussian Splatting techniques and their applications in 3D head avatar generation.
- Learn how to integrate audio-driven methods with neural scene representations for realistic animations.
- Develop practical experience in coding, experimenting with state-of-the-art models, and conducting research.
Complexity of the project
Moderate