ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models (CVPR 2024 Highlight)

This repository is a reference implementation of ViVid-1-to-3. It combines a video diffusion model with a novel-view synthesis diffusion model for improved pose and appearance consistency.

[arXiv], [project page]

Requirements

pip install torch "diffusers==0.24" transformers accelerate einops kornia imageio[ffmpeg] opencv-python pydantic scikit-image lpips
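
If you want to double-check that the pinned diffusers version and a CUDA-enabled torch build were picked up, here is a minimal sanity check (not part of the repository):

# Quick environment check (a minimal sketch, not part of the repository).
from importlib.metadata import version

import torch

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("diffusers:", version("diffusers"))  # should start with 0.24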

Run single generation task

Put the reference image at $IMAGE_PATH and set input_image_path in scripts/task_example.yaml to that path. Then run

python run_generation.py --task_yaml_path=scripts/task_example.yaml
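
If you prefer to set the image path programmatically instead of editing the yaml by hand, here is a minimal sketch. It assumes input_image_path is a top-level key in scripts/task_example.yaml, as described above, and that PyYAML is available (it is pulled in by the dependencies listed earlier); IMAGE_PATH is a placeholder for your own image.

# Point the single-generation task at a reference image and launch it
# (a sketch; only input_image_path is changed, all other fields in
# scripts/task_example.yaml are left as shipped).
import subprocess
import yaml

IMAGE_PATH = "path/to/reference.png"  # placeholder

with open("scripts/task_example.yaml") as f:
    task = yaml.safe_load(f)

task["input_image_path"] = IMAGE_PATH

with open("scripts/task_example.yaml", "w") as f:
    yaml.safe_dump(task, f)

subprocess.run(
    ["python", "run_generation.py", "--task_yaml_path=scripts/task_example.yaml"],
    check=True,
)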

Run batch generation tasks

Batch generation is supported both on a local PC and on SLURM clusters.

Prepare batch generation config yaml file

We tested our method on 100 GSO objects. The object list is in scripts/gso_metadata_object_prompt_100.csv, together with our labeled text prompts in case you would like to test prompt-based generation yourself. We rendered the 100 objects beforehand; the renderings can be downloaded here and decompressed into gso-100. Then run the following line to prepare a batch generation job on a PC:

python -m scripts.job_config_yaml_generation 

Or run the following line to prepare a batch generation job on a SLURM cluster, which will move temporary files to $SLURM_TMPDIR on your cluster:

python -m scripts.job_config_yaml_generation --run_on_slurm

All the yaml files will be generated in a new folder called tasks_gso.
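
If you want to inspect the object list and the labeled prompts before launching anything, here is a minimal sketch. It only assumes the CSV has a header row; the column names are printed rather than assumed.

# Peek at the GSO object/prompt metadata used for batch generation.
import csv

with open("scripts/gso_metadata_object_prompt_100.csv", newline="") as f:
    reader = csv.DictReader(f)
    print("columns:", reader.fieldnames)
    for i, row in enumerate(reader):
        print(row)
        if i >= 2:  # show only the first few entries
            break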

If you want to run a customized batch generation, add an entry to the job_specs list at the beginning of scripts/job_config_yaml_generation.py and run it with the same command as above. A commented-out example entry is provided in the script.

Batch generation

For batch generation, run

python run_batch_generation.py --task_yamls_dir=tasks_gso --dataset_dir=gso-100 --output_dir=outputs --obj_csv_file=scripts/gso_metadata_object_prompt_100.csv

Tips for scheduling batch generation on SLURM clusters

One generation takes about 1 min 30 s on a V100 GPU. If the full set of generations is too large for a single job on your SLURM cluster, you can split the dataset across jobs using the --run_from_obj_index and --run_to_obj_index options. For example:

python run_batch_generation.py --task_yamls_dir=tasks_gso --dataset_dir=gso-100 --output_dir=outputs --obj_csv_file=scripts/gso_metadata_object_prompt_100.csv --run_from_obj_index=0 --run_to_obj_index=50
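
To split the 100 objects evenly across several SLURM jobs, you can generate one command per index range, as in the sketch below. Whether --run_to_obj_index is inclusive is not spelled out here, so double-check against run_batch_generation.py, and wrap each printed command in your own sbatch script.

# Print one run_batch_generation.py command per chunk of objects (a sketch).
NUM_OBJECTS = 100   # size of the GSO-100 set
CHUNK = 25          # objects per SLURM job

for start in range(0, NUM_OBJECTS, CHUNK):
    end = min(start + CHUNK, NUM_OBJECTS)
    print(
        "python run_batch_generation.py"
        " --task_yamls_dir=tasks_gso"
        " --dataset_dir=gso-100"
        " --output_dir=outputs"
        " --obj_csv_file=scripts/gso_metadata_object_prompt_100.csv"
        f" --run_from_obj_index={start}"
        f" --run_to_obj_index={end}"
    )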

Run evaluation

Get metrics for each object

To evaluate a batch generation, add the experiments you want to evaluate to the eval_specs list in run_evaluation.py. Make sure the exp_name key has the same value as the one used for your batch generation, and also adjust expdir and savedir in run_evaluation.py. To run the $EXP_ID-th experiment in the list, do the following:

python run_evaluation.py --exp_id $EXP_ID

After evaluation, intermediate per-object results for PSNR, SSIM, LPIPS, FOR_8, and FOR_16 will be written to savedir.
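
If you added several entries to eval_specs, you can sweep over them with a small driver script, for example (a minimal sketch; set the count to the number of entries you added):

# Run run_evaluation.py once per entry in eval_specs.
import subprocess

NUM_EXPERIMENTS = 3  # number of entries you added to eval_specs

for exp_id in range(NUM_EXPERIMENTS):
    subprocess.run(
        ["python", "run_evaluation.py", "--exp_id", str(exp_id)],
        check=True,
    )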

Get stats for this experiment

Finally, use run_calculate_stats.py to compute the PSNR, SSIM, LPIPS, FOR_8, and FOR_16 statistics for this experiment over the whole dataset. Make sure to set psnr_save_dir, lpips_save_dir, ssim_save_dir, for_8_save_dir, and for_16_save_dir in run_calculate_stats.py to the folders storing the intermediate results from the previous step.

python run_calculate_stats.py
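
If you only want a quick look at one metric before editing and running the full script, here is a rough sketch. The file layout it assumes (one small file per object containing a single number under savedir/psnr) is an assumption, not the repository's actual format; run_calculate_stats.py remains the reference.

# Rough aggregation of per-object PSNR values (hypothetical file layout).
import glob
import statistics

values = []
for path in glob.glob("savedir/psnr/*"):  # hypothetical layout, adjust to yours
    with open(path) as f:
        values.append(float(f.read().strip()))

if len(values) > 1:
    print(f"PSNR over {len(values)} objects: "
          f"mean={statistics.mean(values):.3f}, stdev={statistics.stdev(values):.3f}")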

Acknowledgement

This repo is based on the Hugging Face community implementation and converted weights of Zero-1-to-3, as well as the Hugging Face community text-to-video model Zeroscope v2. Thanks for their awesome work.

Citation

If you use this code in your research, please cite our paper:

@inproceedings{kwak2024vivid,
  title={Vivid-1-to-3: Novel view synthesis with video diffusion models},
  author={Kwak, Jeong-gi and Dong, Erqun and Jin, Yuhe and Ko, Hanseok and Mahajan, Shweta and Yi, Kwang Moo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={6775--6785},
  year={2024}
}