
[refactor] CogVideoX followups + tiled decoding support #9150

Merged

19 commits merged into main from cogvideox-followup on Aug 13, 2024

Conversation

@a-r-r-o-w (Member) commented Aug 11, 2024:

What does this PR do?

Code
import gc

import torch
from diffusers import CogVideoXPipeline, CogVideoXDDIMScheduler
from diffusers.utils import export_to_video


def reset_memory():
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_accumulated_memory_stats()
    torch.cuda.reset_peak_memory_stats()


def print_memory():
    memory = round(torch.cuda.memory_allocated() / 1024**3, 2)
    max_memory = round(torch.cuda.max_memory_allocated() / 1024**3, 2)
    max_reserved = round(torch.cuda.max_memory_reserved() / 1024**3, 2)
    print(f"{memory=} GB")
    print(f"{max_memory=} GB")
    print(f"{max_reserved=} GB")


prompt = (
    "A panda, dressed in a small, red jacket and a tiny hat, sits on a wooden stool in a serene bamboo forest. "
    "The panda's fluffy paws strum a miniature acoustic guitar, producing soft, melodic tunes. Nearby, a few other "
    "pandas gather, watching curiously and some clapping in rhythm. Sunlight filters through the tall bamboo, "
    "casting a gentle glow on the scene. The panda's face is expressive, showing concentration and joy as it plays. "
    "The background includes a small, flowing stream and vibrant green foliage, enhancing the peaceful and magical "
    "atmosphere of this unique musical performance."
)
pipe = CogVideoXPipeline.from_pretrained("/raid/aryan/CogVideoX-trial", torch_dtype=torch.float16)
pipe.scheduler = CogVideoXDDIMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")

pipe.enable_model_cpu_offload()

# Run 1: CPU offloading with normal VAE decoding
reset_memory()
video = pipe(prompt=prompt, num_frames=48, guidance_scale=6, num_inference_steps=50, generator=torch.Generator().manual_seed(42)).frames[0]
print_memory()
export_to_video(video, "output.mp4", fps=8)

# Run 2: CPU offloading with tiled VAE decoding
pipe.vae.enable_tiling()

reset_memory()
video = pipe(prompt=prompt, num_frames=48, guidance_scale=6, num_inference_steps=50, generator=torch.Generator().manual_seed(42)).frames[0]
print_memory()
export_to_video(video, "output_tiling.mp4", fps=8)

Memory usage:

Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 4.51it/s]
Loading pipeline components...:  40%|████      | 2/5 [00:00<00:00, 3.29it/s]
The config attributes {'mid_block_add_attention': True, 'sample_size': 256} were passed to AutoencoderKLCogVideoX, but are not expected and will be ignored. Please verify your config.json configuration file.
Loading pipeline components...: 100%|██████████| 5/5 [00:01<00:00, 4.41it/s]
100%|██████████| 50/50 [02:44<00:00, 3.28s/it]

# CPU offloading, normal VAE decoding
memory=0.01 GB
max_memory=12.39 GB
max_reserved=20.39 GB

100%|██████████| 50/50 [02:35<00:00, 3.11s/it]

# CPU offloading, tiled VAE decoding
memory=0.01 GB
max_memory=10.81 GB
max_reserved=10.83 GB
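For context on where the savings come from: tiled decoding splits the latents into overlapping spatial tiles, decodes them one at a time, and linearly blends the seams, so peak memory scales with the tile size rather than the full frame. A minimal sketch of the idea (tile/overlap sizes, function names, and the 8x spatial scale factor are illustrative assumptions, not the actual AutoencoderKLCogVideoX implementation):

import torch


def blend_v(a, b, overlap):
    # Fade the bottom rows of tile `a` into the top rows of tile `b`.
    overlap = min(a.shape[-2], b.shape[-2], overlap)
    w = torch.linspace(0, 1, overlap, device=b.device).view(1, 1, 1, -1, 1)
    b[:, :, :, :overlap, :] = a[:, :, :, -overlap:, :] * (1 - w) + b[:, :, :, :overlap, :] * w
    return b


def blend_h(a, b, overlap):
    # Fade the right columns of tile `a` into the left columns of tile `b`.
    overlap = min(a.shape[-1], b.shape[-1], overlap)
    w = torch.linspace(0, 1, overlap, device=b.device).view(1, 1, 1, 1, -1)
    b[..., :overlap] = a[..., -overlap:] * (1 - w) + b[..., :overlap] * w
    return b


@torch.no_grad()
def tiled_decode(vae, z, tile=64, overlap=16, scale=8):
    # z: latents of shape (B, C, F, H, W). Decoding one latent tile at a
    # time keeps peak memory proportional to the tile, not the full frame.
    stride = tile - overlap
    rows = []
    for i in range(0, z.shape[3], stride):
        row = []
        for j in range(0, z.shape[4], stride):
            row.append(vae.decode(z[:, :, :, i : i + tile, j : j + tile]).sample)
        rows.append(row)
    result_rows = []
    for i, row in enumerate(rows):
        result_row = []
        for j, t in enumerate(row):
            if i > 0:
                t = blend_v(rows[i - 1][j], t, overlap * scale)
            if j > 0:
                t = blend_h(row[j - 1], t, overlap * scale)
            result_row.append(t[:, :, :, : stride * scale, : stride * scale])
        result_rows.append(torch.cat(result_row, dim=4))
    return torch.cat(result_rows, dim=3)

The seam blending is what keeps tile boundaries from showing up as visible lines in the decoded frames.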

Results:

* Normal: output.webm
* Tiled: output_tiling.webm

Note that you will need to install accelerate from source (the main branch) for this to work and to reproduce the numbers above; an example install command follows. With the stable release of accelerate, you might see an additional 5-7 GB of memory usage.
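For reference, installing accelerate from source typically looks like this (standard upstream repository assumed):

pip install git+https://github.com/huggingface/accelerate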

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@DN6 @sayakpaul @zRzRzRzRzRzRzR

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@a-r-r-o-w changed the title from "[refactor] CogVideoX followups + tiled Decoding support" to "[refactor] CogVideoX followups + tiled decoding support" on Aug 12, 2024.
@a-r-r-o-w (Member Author) commented:

Something interesting/fishy is going on with enable_model_cpu_offload. A run takes about 1 min 30 s with CPU offloading disabled but ~3 min with it enabled (about a 2x slowdown). I would assume that the transformer, once inside the denoise loop, is not moving between CPU and CUDA at every step. Any ideas why this might be happening, @sayakpaul?
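A minimal way to compare the two paths (a sketch reusing the pipe and prompt from the script above; numbers will vary by hardware):

import time

import torch


def timed_run(pipe, prompt):
    # Wall-clock time for one full generation; synchronize so GPU work is counted.
    torch.cuda.synchronize()
    start = time.perf_counter()
    pipe(prompt=prompt, num_frames=48, guidance_scale=6, num_inference_steps=50)
    torch.cuda.synchronize()
    return time.perf_counter() - start


pipe.to("cuda")                  # all models resident on the GPU
t_gpu = timed_run(pipe, prompt)

pipe.enable_model_cpu_offload()  # each model is moved to the GPU only when used
t_offload = timed_run(pipe, prompt)

print(f"{t_gpu=:.1f}s, {t_offload=:.1f}s")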

@sayakpaul requested a review from DN6 on August 12, 2024.
@sayakpaul (Member) left a comment:

Very nice! I have left some questions. LMK if they are unclear.

Additionally, let's include a note in the docs on the memory savings from tiling.

Three review threads on docs/source/en/api/pipelines/cogvideox.md (resolved).

    def _set_gradient_checkpointing(self, module, value=False):
        if isinstance(module, (CogVideoXEncoder3D, CogVideoXDecoder3D)):
            module.gradient_checkpointing = value

-   def clear_fake_context_parallel_cache(self):
+   def _clear_fake_context_parallel_cache(self):
Reviewer comment: Better!

One review thread on src/diffusers/pipelines/cogvideo/pipeline_cogvideox.py (resolved).
@zRzRzRzRzRzRzR (Contributor) commented:

> Something interesting/fishy is going on with enable_model_cpu_offload. It takes about 1 min 30 seconds when CPU offloading is disabled but ~3 minutes with it enabled (about a 2x slowdown). I assume that the transformer, once in the denoise loop, would not be moving from CPU to CUDA and back at every step. Any ideas why this might be happening @sayakpaul?

I used this method and my run also took about 90 seconds. I couldn't reproduce the issue you're describing, so I need to check further; this shouldn't be an issue.

@a-r-r-o-w marked this pull request as ready for review on August 12, 2024.
@a-r-r-o-w (Member Author) commented:

@sayakpaul I've added a few explanations here. Could you please review again?

@a-r-r-o-w (Member Author) commented:

I think it would be good to add dynamic positional embeddings as well, to test the generalization capabilities of CogVideoX and remove the 48-frame, 480-height, 720-width limit. I have a POC almost ready for this (a sketch of the idea follows). Should I push it here and share results in a while, or do it in a separate PR? It shouldn't break anything existing, IMO. @sayakpaul
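For reference, the idea is to compute the sincos embedding table on the fly for whatever (frames, height, width) patch grid is requested, instead of relying on a fixed-size table. A minimal sketch (function names, the channel split between axes, and the example grid are assumptions, not the branch's actual code):

import torch


def get_1d_sincos(dim, length):
    # Standard 1D sine-cosine embedding: returns (length, dim); dim must be even.
    pos = torch.arange(length, dtype=torch.float32)
    omega = 1.0 / (10000 ** (torch.arange(dim // 2, dtype=torch.float32) / (dim // 2)))
    out = pos[:, None] * omega[None, :]
    return torch.cat([torch.sin(out), torch.cos(out)], dim=1)


def get_3d_sincos(dim, frames, height, width):
    # Give 1/4 of the channels to time and split the rest between H and W.
    dim_t = dim // 4
    dim_s = dim - dim_t
    emb_t = get_1d_sincos(dim_t, frames)
    emb_h = get_1d_sincos(dim_s // 2, height)
    emb_w = get_1d_sincos(dim_s // 2, width)
    emb = torch.cat(
        [
            emb_t[:, None, None, :].expand(frames, height, width, dim_t),
            emb_h[None, :, None, :].expand(frames, height, width, dim_s // 2),
            emb_w[None, None, :, :].expand(frames, height, width, dim_s // 2),
        ],
        dim=-1,
    )
    return emb.reshape(frames * height * width, dim)  # one embedding per patch


pos_embed = get_3d_sincos(1920, 13, 30, 45)  # illustrative grid and dim, not exact model values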

@sayakpaul (Member) commented:

Let's do a separate PR.

@a-r-r-o-w (Member Author) commented:

I've pushed the code to https://github.com/huggingface/diffusers/tree/cogvideox-dynamic-pos-embeds for possible future reference. After further testing with more than 49 frames and at different resolutions, I don't think the results are convincing enough to support it, so it's best not to add it at the moment.

@DN6 (Collaborator) left a comment:

LGTM 👍🏽

@a-r-r-o-w (Member Author) commented Aug 13, 2024:

@sayakpaul, could you check the note about memory optimizations here? If it looks good, I think we can merge this.

cc @zRzRzRzRzRzRzR for visibility

Edit: By the way, accelerate must be installed from source to replicate the memory numbers here. Until the next accelerate release, should we add a note saying the same?

@sayakpaul (Member) left a comment:

LGTM, thanks for the memory optims. Sleek!

@a-r-r-o-w merged commit a85b34e into main on Aug 13, 2024 (18 checks passed).
@a-r-r-o-w deleted the cogvideox-followup branch on August 13, 2024.
yiyixuxu pushed a commit that referenced this pull request Aug 24, 2024
* refactor context parallel cache; update torch compile time benchmark

* add tiling support

* make style

* remove num_frames % 8 == 0 requirement

* update default num_frames to original value

* add explanations + refactor

* update torch compile example

* update docs

* update

* clean up if-statements

* address review comments

* add test for vae tiling

* update docs

* update docs

* update docstrings

* add modeling test for cogvideox transformer

* make style