
GeneFace: Generalized and High-Fidelity 3D Talking Face Synthesis; ICLR 2023; Official code


yerfor/GeneFace


GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | ICLR'23

Zhenhui Ye, Ziyue Jiang, Yi Ren, Jinglin Liu, Jinzheng He, Zhou Zhao | Zhejiang University, ByteDance

arXiv | 中文文档 (Chinese documentation)

This repository is the official PyTorch implementation of our ICLR-2023 paper, in which we propose GeneFace for generalized and high-fidelity audio-driven talking face generation. The inference pipeline is as follows:
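As a rough outline of that inference flow, the sketch below chains three stages whose names follow the checkpoint folders listed in the Quick Start section: what those names suggest is a generic audio-to-landmark model (lm3d_vae_sync), a person-specific postnet (lm3d_postnet_sync), and a landmark-conditioned RAD-NeRF renderer (lm3d_radnerf, lm3d_radnerf_torso). Every function here is a hypothetical stub that only passes data shapes along; the 68-landmark layout and the 512x512 frame size are assumptions, not values taken from the repository.

```python
"""Hypothetical, stubbed outline of a GeneFace-style inference flow.

Stage names mirror checkpoint folders from the Quick Start section; the bodies
are placeholders, NOT the repository's real models or APIs.
"""
from dataclasses import dataclass
from typing import List

# One flat vector of 3D landmark coordinates per video frame.
Landmarks = List[List[float]]

# ASSUMPTION: 68 3D landmarks per frame (68 * 3 values); the real layout may differ.
NUM_LANDMARKS = 68


def audio_to_landmarks(audio_features: List[List[float]]) -> Landmarks:
    """Stub for the generic audio-to-motion model (cf. checkpoints/lrs3/lm3d_vae_sync)."""
    return [[0.0] * (NUM_LANDMARKS * 3) for _ in audio_features]


def postnet_refine(landmarks: Landmarks) -> Landmarks:
    """Stub for the person-specific postnet (cf. checkpoints/May/lm3d_postnet_sync)."""
    return [list(frame) for frame in landmarks]  # identity placeholder


@dataclass
class Frame:
    index: int
    width: int
    height: int


def render(landmarks: Landmarks, width: int = 512, height: int = 512) -> List[Frame]:
    """Stub for the landmark-conditioned renderer
    (cf. checkpoints/May/lm3d_radnerf and lm3d_radnerf_torso).
    ASSUMPTION: 512x512 output frames."""
    return [Frame(i, width, height) for i, _ in enumerate(landmarks)]


def infer(audio_features: List[List[float]]) -> List[Frame]:
    """Chain the stages: audio -> landmarks -> refined landmarks -> frames."""
    landmarks = audio_to_landmarks(audio_features)
    refined = postnet_refine(landmarks)
    return render(refined)


if __name__ == "__main__":
    dummy_audio = [[0.0] * 16 for _ in range(25)]  # 25 frames of dummy features
    frames = infer(dummy_audio)
    print(len(frames), frames[0])
```

The point of the stub is the ordering: landmarks are produced from audio first and refined per person, and only then rendered, which matches running scripts/infer_postnet.sh before scripts/infer_lm3d_radnerf.sh in Step4.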



GeneFace achieves better lip synchronization and expressiveness on out-of-domain audio. Watch this video for a clear lip-sync comparison against previous NeRF-based methods. You can also visit our project page for more details.

🔥MimicTalk Released

We have released the code of MimicTalk (https://github.com/yerfor/MimicTalk/), a state-of-the-art (SOTA) NeRF-based, person-specific talking face method that achieves better visual quality and enables talking style control.

GeneFace++ Released

We have released the code of GeneFace++ (https://github.com/yerfor/GeneFacePlusPlus/), an upgraded version of GeneFace that achieves better lip-sync, video quality, and system efficiency.

Update:

  • 2023.3.16 We released a major update (a video demo is here), including: 1) a RAD-NeRF-based renderer, which can infer in real time and be trained in 10 hours; 2) a PyTorch-based deep3d_reconstruction module, which is easier to install and 8x faster than the previous TensorFlow-based version; 3) a pitch-aware audio2motion module, which generates better lip-synced landmarks; 4) fixes for bugs that caused large memory usage; 5) we will upload the paper describing this update soon.
  • 2023.2.22 We released a one-minute demo video in which GeneFace is driven by a Chinese song generated by DiffSinger.
  • 2023.2.20 We released a stable 3D landmark post-processing strategy in inference/nerfs/lm3d_nerf_infer.py, which improves the stability and quality of the final results by a large margin.

Quick Start

We provide pre-trained models and processed datasets of GeneFace in this release to enable a quick start. In the following, we show how to run inference with the pre-trained models in 4 steps. If you want to train GeneFace on your own target-person video, please refer to the following sections (Prepare Environments, Prepare Datasets, and Train Models).

  • Step1. Create a new Python environment named geneface by following the guide in docs/prepare_env/install_guide.md.

  • Step2. Download lrs3.zip and May.zip from the release and unzip them into the checkpoints directory.

  • Step3. Process May.mp4 into a dataset by following the guide in docs/process_data/process_target_person_video.md. You should then see an output file named data/binary/videos/May/trainval_dataset.npy.
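Step2 can also be scripted. The sketch below is a minimal example using only Python's standard zipfile module; it assumes lrs3.zip and May.zip were already downloaded to the repository root and that each archive contains a top-level lrs3/ or May/ folder, so extraction yields the checkpoints layout shown next. The helper name unpack_checkpoints is hypothetical and not part of the repository.

```python
"""Minimal sketch of Step2: unpack the release archives into checkpoints/.

ASSUMPTION: lrs3.zip and May.zip sit in the repository root and each contains
a top-level lrs3/ or May/ folder. unpack_checkpoints is a hypothetical helper.
"""
import zipfile
from pathlib import Path


def unpack_checkpoints(repo_root: Path, archives=("lrs3.zip", "May.zip")) -> list:
    """Extract each archive into <repo_root>/checkpoints; return its top-level entries."""
    ckpt_dir = repo_root / "checkpoints"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for name in archives:
        archive = repo_root / name
        if not archive.is_file():
            raise FileNotFoundError(f"{archive} not found; download it from the release first")
        # Only extract archives you trust: extractall writes wherever member paths point.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(ckpt_dir)
    return sorted(p.name for p in ckpt_dir.iterdir())


if __name__ == "__main__":
    print(unpack_checkpoints(Path(".")))
```

If an archive turns out to unpack without its top-level folder, pass a different target directory per archive instead of extracting both into checkpoints/.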

After the above steps, the structure of your checkpoints and data directories should look like this:

> checkpoints
    > lrs3
        > lm3d_vae_sync
        > syncnet
    > May
        > lm3d_postnet_sync
        > lm3d_radnerf
        > lm3d_radnerf_torso
> data
    > binary
        > videos
            > May
                trainval_dataset.npy
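Before running Step4, a quick way to confirm the layout is to check for each expected path. The sketch below hard-codes the paths from the directory tree above; the checker itself (missing_paths) is a hypothetical helper, not something shipped with the repository, and it only tests that the folders and file exist, not that their contents are valid.

```python
"""Check that checkpoints/ and data/ match the layout expected before Step4.

Expected paths are copied from the README's directory tree; missing_paths is a
hypothetical helper and only checks existence, not file contents.
"""
from pathlib import Path
from typing import List

EXPECTED_DIRS = [
    "checkpoints/lrs3/lm3d_vae_sync",
    "checkpoints/lrs3/syncnet",
    "checkpoints/May/lm3d_postnet_sync",
    "checkpoints/May/lm3d_radnerf",
    "checkpoints/May/lm3d_radnerf_torso",
]
EXPECTED_FILES = ["data/binary/videos/May/trainval_dataset.npy"]


def missing_paths(repo_root: Path) -> List[str]:
    """Return every expected directory or file that is absent under repo_root."""
    missing = [p for p in EXPECTED_DIRS if not (repo_root / p).is_dir()]
    missing += [p for p in EXPECTED_FILES if not (repo_root / p).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_paths(Path("."))
    if problems:
        print("Missing:", *problems, sep="\n  ")
    else:
        print("Layout looks complete; you can run Step4.")
```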
  • Step4. Run the scripts below:
bash scripts/infer_postnet.sh
bash scripts/infer_lm3d_radnerf.sh
# bash scripts/infer_radnerf_gui.sh # you can also use GUI provided by RADNeRF

You can then find the output video at infer_out/May/pred_video/zozo.mp4.

Prepare Environments

Please follow the steps in docs/prepare_env.

Prepare Datasets

Please follow the steps in docs/process_data.

Train Models

Please follow the steps in docs/train_models.

Train GeneFace on other target person videos

Apart from the May.mp4 provided in this repo, we also provide 8 target-person videos that were used in our experiments; you can download them at this link. To train on a new video named <video_id>.mp4, place it in the data/raw/videos/ directory, then create a new folder at egs/datasets/videos/<video_id> and edit its config files following the provided example folder egs/datasets/videos/May.
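The copy-and-edit steps above can be sketched in Python with the standard shutil and re modules. This is a hypothetical helper (prepare_new_video is not part of the repository), and it rests on two assumptions: that the example configs in egs/datasets/videos/May are .yaml files, and that they refer to the video by the literal name May. Review every copied config by hand afterwards, since other fields may also need changes.

```python
"""Hypothetical helper for preparing a new target-person video.

Mirrors the README steps: copy the video to data/raw/videos/<video_id>.mp4 and
clone egs/datasets/videos/May to egs/datasets/videos/<video_id>.
ASSUMPTIONS: configs are *.yaml files that refer to the video as "May".
"""
import re
import shutil
from pathlib import Path


def prepare_new_video(repo_root: Path, video_file: Path, video_id: str) -> Path:
    """Copy the raw video and clone the May config folder; return the new config dir."""
    raw_dir = repo_root / "data" / "raw" / "videos"
    raw_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(video_file, raw_dir / f"{video_id}.mp4")

    src_cfg = repo_root / "egs" / "datasets" / "videos" / "May"
    dst_cfg = src_cfg.parent / video_id
    # copytree raises FileExistsError if dst_cfg exists, so nothing is silently overwritten.
    shutil.copytree(src_cfg, dst_cfg)

    # Replace whole-word "May" in the copied YAML files; a lambda avoids
    # re treating backslashes in video_id as escape sequences.
    pattern = re.compile(r"\bMay\b")
    for cfg in dst_cfg.rglob("*.yaml"):
        text = cfg.read_text(encoding="utf-8")
        cfg.write_text(pattern.sub(lambda _m: video_id, text), encoding="utf-8")
    return dst_cfg
```

Cloning the May folder, rather than writing configs from scratch, keeps every other hyperparameter identical to the provided example.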

You can also record your own video and train a unique GeneFace model for yourself!

Citation

@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}

Acknowledgements

Our code is based on the following repositories: