
[Community Pipeline] Add 🪆Matryoshka Diffusion Models #9157

Merged
merged 137 commits into huggingface:main from tolgacangoz:Add-Matryoshka-Diffusion-Models on Oct 14, 2024

Conversation

tolgacangoz
Contributor

@tolgacangoz tolgacangoz commented Aug 12, 2024

Thanks for the opportunity to work on this model!

The Abstract of the paper:

Diffusion models are the de-facto approach for generating high-quality images and videos but learning high-dimensional models remains a formidable task due to computational and optimization challenges. Existing methods often resort to training cascaded models in pixel space, or using a downsampled latent space of a separately trained auto-encoder. In this paper, we introduce Matryoshka Diffusion (MDM), a novel framework for high-resolution image and video synthesis. We propose a diffusion process that denoises inputs at multiple resolutions jointly and uses a NestedUNet architecture where features and parameters for small scale inputs are nested within those of the large scales. In addition, MDM enables a progressive training schedule from lower to higher resolutions which leads to significant improvements in optimization for high-resolution generation. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications. Remarkably, we can train a single pixel-space model at resolutions of up to 1024 × 1024 pixels, demonstrating strong zero shot generalization using the CC12M dataset, which contains only 12 million images. Code and pre-trained checkpoints are released at https://github.com/apple/ml-mdm.

Paper: 🪆Matryoshka Diffusion Models
Repository: https://github.com/apple/ml-mdm
Hugging Face Space: https://huggingface.co/spaces/pcuenq/mdm
License: MIT


Key takeaways from the paper:

  • VAE: none, since Matryoshka Diffusion Models operate directly in the (extended) pixel space(s).
  • Text encoder: flan-t5-xl
  • Enables:
    1. a multi-resolution loss that greatly improves the convergence speed of high-resolution input denoising (see the sketch after this list).
    2. an efficient progressive training schedule that starts by training a low-resolution diffusion model and gradually adds higher-resolution inputs and outputs according to a schedule, speeding up overall convergence.
  • MDM allows training high-resolution models without resorting to cascaded models (since each sub-model is trained separately, generation quality can be bottlenecked by exposure bias (Bengio et al., 2015) from imperfect predictions, and a separate model must be trained for each resolution), latent diffusion (where the lossy compression process not only increases the complexity of learning but also bounds generation quality), or other end-to-end models (which, without fully exploiting the innate structure of hierarchical generation, lag behind cascaded and latent models).
  • Resolution-specific noise schedules are used.
  • More computation is allocated to the low-resolution feature maps.
  • MDM has extensive parameter sharing across resolutions.
  • The authors observe that going from two resolution levels to three consistently improves convergence, while increasing the number of nesting levels brings only negligible extra cost.
  • LDM and MDM are complementary: it is possible to build MDM on top of autoencoder latents.
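To make the multi-resolution loss concrete, here is a minimal sketch of the idea: the same image is denoised jointly at every nested resolution with a v-prediction target, and the per-level losses are averaged. The model signature and all names here are my own illustrative assumptions, not the PR's actual implementation.

import torch
import torch.nn.functional as F

def multi_resolution_vloss(model, x0, t, alphas_cumprod, text_emb, resolutions=(64, 256)):
    # Build the "Matryoshka" input: the same image at each nested resolution,
    # add noise per level, and collect v-prediction targets.
    noisy, targets = [], []
    for r in resolutions:
        x0_r = F.interpolate(x0, size=(r, r), mode="area")
        eps = torch.randn_like(x0_r)
        a = alphas_cumprod[t].view(-1, 1, 1, 1)
        noisy.append(a.sqrt() * x0_r + (1 - a).sqrt() * eps)    # forward diffusion
        targets.append(a.sqrt() * eps - (1 - a).sqrt() * x0_r)  # v-prediction target
    preds = model(noisy, t, text_emb)  # assumed: one prediction per nesting level
    losses = [F.mse_loss(p, v) for p, v in zip(preds, targets)]
    return sum(losses) / len(losses)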

TODOs:
✅ The U-Net, i.e., the innermost structure (nesting_level=0), would approximately be as follows:

UNet2DConditionModel(in_channels=3, out_channels=3, block_out_channels=(256, 512, 768),
		cross_attention_dim=2048, resnet_time_scale_shift='scale_shift',
		down_block_types=('DownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D'),
		up_block_types=('CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'UpBlock2D'),
		ff_act_fn='gelu', transformer_layers_per_block=[0, 1, 5],
		use_linear_projection='no_projection', attention_bias=True,
		norm_type='layer_norm_matryoshka', ff_norm_type='group_norm_matryoshka',
		cross_attention_norm='layer_norm', attention_pre_only=True,
		encoder_hid_dim_type='text_proj', encoder_hid_dim=2048,
		flip_sin_to_cos=False, masked_cross_attention=False,
		micro_conditioning_scale=64, addition_embed_type='matryoshka')
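
For intuition only: the NestedUNet idea can be sketched as an outer UNet whose low-resolution path is an entire inner UNet, so features and parameters for small-scale inputs are nested within those of the large scales. This toy sketch is my own simplification under assumed names, not the PR's actual class:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyInnerUNet(nn.Module):
    # stand-in for the innermost (nesting_level=0) model
    def __init__(self, ch=3):
        super().__init__()
        self.body = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x, t):
        return self.body(x)

class NestedUNet(nn.Module):
    # Outer level: encodes the high-res input, hands the low-res input
    # (plus downsampled high-res features) to the inner UNet, and injects
    # the inner prediction back into its own decoder path.
    def __init__(self, inner, ch=3):
        super().__init__()
        self.inner = inner
        self.enc = nn.Conv2d(ch, ch, 3, padding=1)
        self.dec = nn.Conv2d(ch, ch, 3, padding=1)
    def forward(self, x_hi, x_lo, t):
        h = self.enc(x_hi)
        lo_in = x_lo + F.interpolate(h, size=x_lo.shape[-2:], mode="area")
        lo_out = self.inner(lo_in, t)
        h = h + F.interpolate(lo_out, size=h.shape[-2:], mode="nearest")
        return self.dec(h), lo_out  # one output per resolution

net = NestedUNet(TinyInnerUNet())
out_hi, out_lo = net(torch.randn(1, 3, 256, 256), torch.randn(1, 3, 64, 64), t=None)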

✅ Scheduler:

  • It calculates timesteps and uses prev_timestep in a slightly different way: the UNet receives timestep t−1 while the scheduler step uses t, and the last timestep is not used.
  • A nesting_level=1 model uses 2 noise latents (3×64×64 and 3×256×256), and a nesting_level=2 model uses 3 (3×64×64, 3×256×256, 3×1024×1024). Each noise latent has its own calculations in the scheduler, so a nesting_level=2 model produces 3 images at 3 different resolutions.
  • Some optimizations might be possible; e.g., the scheduler currently makes its calculations sequentially for each noise latent, and since the latents have different shapes, broadcasting cannot be used directly. IMHO, one could pad them to equal shapes, compute with broadcasting, and mask at the end, at the expense of more memory usage (see the sketch below the scheduler snippet).
scheduler = MatryoshkaDDIMScheduler(prediction_type="v_prediction",
		beta_schedule="squaredcos_cap_v2", timestep_spacing="matryoshka_style",)
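
A hedged sketch of the optimization point above: a per-latent DDIM-style update done in a Python loop versus a padded, broadcasted alternative. ddim_v_step is a hypothetical stand-in for the real scheduler math, not the PR's implementation.

import torch
import torch.nn.functional as F

def ddim_v_step(x_t, v_pred, alpha_t, alpha_prev):
    # hypothetical deterministic DDIM update under v-prediction
    x0 = alpha_t.sqrt() * x_t - (1 - alpha_t).sqrt() * v_pred
    eps = (1 - alpha_t).sqrt() * x_t + alpha_t.sqrt() * v_pred
    return alpha_prev.sqrt() * x0 + (1 - alpha_prev).sqrt() * eps

alpha_t, alpha_prev = torch.tensor(0.9), torch.tensor(0.95)
latents = [torch.randn(1, 3, 64, 64), torch.randn(1, 3, 256, 256)]
preds = [torch.randn_like(x) for x in latents]

# Current approach: step each differently-shaped latent sequentially.
stepped = [ddim_v_step(x, v, alpha_t, alpha_prev) for x, v in zip(latents, preds)]

# Suggested alternative: zero-pad to a common shape, stack, do one broadcasted
# update, then crop the padding away (trades extra memory for fewer kernel calls).
H = max(x.shape[-1] for x in latents)
pad = lambda x: F.pad(x, (0, H - x.shape[-1], 0, H - x.shape[-2]))
batch_x = torch.cat([pad(x) for x in latents])
batch_v = torch.cat([pad(v) for v in preds])
batch_out = ddim_v_step(batch_x, batch_v, alpha_t, alpha_prev)
stepped_alt = [batch_out[i : i + 1, :, : x.shape[-2], : x.shape[-1]] for i, x in enumerate(latents)]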

✅ convert_matryoshka_model_to_diffusers.py
✅ Show example results:
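
(For context, loading the community pipeline looks roughly like the sketch below; the checkpoint repo id, the custom_pipeline name, and the nesting_level argument reflect this PR's HF integration, but treat the exact identifiers as assumptions.)

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "tolgacangoz/matryoshka-diffusion-models",  # assumed converted-checkpoint repo id
    custom_pipeline="matryoshka",               # assumed community pipeline name
    nesting_level=0,                            # 0 → 64×64, 1 → 256×256, 2 → 1024×1024
    torch_dtype=torch.float16,
).to("cuda")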

from diffusers.utils import make_image_grid

prompt0 = "a blue jay stops on the top of a helmet of Japanese samurai, background with sakura tree"
prompt = f"breathtaking {prompt0}. award-winning, professional, highly detailed"
image = pipe(prompt=prompt, num_inference_steps=50).images  # 50/150/250 steps for nesting_level 0/1/2
make_image_grid(image, rows=1, cols=len(image))
  • 64×64, nesting_level=0: 1.719 GiB. With 50 DDIM inference steps:
    [image: bird_64_64 (64×64)]
  • 256×256, nesting_level=1: 1.776 GiB. With 150 DDIM inference steps:
    [images: bird_256_64 (64×64), bird_256_256 (256×256)]
  • 1024×1024, nesting_level=2: 1.792 GiB. As one can see, the memory cost of adding another nesting level is truly negligible in this context! With 250 DDIM inference steps:
    [images: bird_1024_64 (64×64), bird_1024_256 (256×256), bird_1024_1024 (1024×1024)]

✅ Finish HF integration & upload converted checkpoints to HF.
✅ README.md
⏳ Make it as simple as possible, but not simpler. Note: I may make small additions/modifications in the future, e.g., to comments, etc.
⏳ examples/**/train_matryoshka.py

[Open In Colab badge]

I would like to congratulate you on this great work and to thank you for open-sourcing the codebase under the MIT license, @MultiPath, @Shuangfei, @dreasysnail, Josh Susskind, @ndjaitly, @luke-carlson!

I anticipate that this kind of representation learning will become popular, that acceleration improvements from contemporary diffusion modeling will be adapted to this model, and that training will be democratized in the future, without the need for large resources.

@sayakpaul @pcuenca @a-r-r-o-w

@tolgacangoz tolgacangoz changed the title Add Matryoshka Diffusion Models Add 🪆Matryoshka Diffusion Models Aug 12, 2024
@sayakpaul
Member

@tolgacangoz would you have cycles to work on this soon? Another contributor has expressed interest in working on it. Maybe you two could collaborate?

@tolgacangoz
Contributor Author

I am working on the inference code at the moment. Will the training code in examples/**/train_matryoshka.py be implemented as well (since this model is very efficient to train)? If so, he can take that up.

@sayakpaul
Member

For now, we don't have to focus on training.

@tolgacangoz tolgacangoz changed the title Add 🪆Matryoshka Diffusion Models [Community Pipeline] Add 🪆Matryoshka Diffusion Models Sep 7, 2024
@tolgacangoz tolgacangoz marked this pull request as ready for review October 13, 2024 10:43
@tolgacangoz tolgacangoz marked this pull request as draft October 13, 2024 16:26
@tolgacangoz tolgacangoz marked this pull request as ready for review October 13, 2024 17:03
@tolgacangoz tolgacangoz marked this pull request as draft October 13, 2024 19:02
@luke-carlson

Thank you for working on this @tolgacangoz!

@tolgacangoz tolgacangoz marked this pull request as ready for review October 14, 2024 08:37
Collaborator

@yiyixuxu yiyixuxu left a comment


thanks!

@yiyixuxu yiyixuxu merged commit 56c2115 into huggingface:main Oct 14, 2024
8 checks passed
@tolgacangoz
Contributor Author

Thanks for merging!

@tolgacangoz tolgacangoz deleted the Add-Matryoshka-Diffusion-Models branch October 15, 2024 06:59
@tolgacangoz tolgacangoz restored the Add-Matryoshka-Diffusion-Models branch October 15, 2024 07:53
@tolgacangoz tolgacangoz deleted the Add-Matryoshka-Diffusion-Models branch October 15, 2024 07:54
@luke-carlson

Hey @tolgacangoz, are there any changes we need to make here to incorporate Jiatao's latest changes? (apple/ml-mdm#21)

@tolgacangoz
Contributor Author

tolgacangoz commented Oct 15, 2024

Probably. I will look into it tomorrow.

Edit: The usage of schedule_shifted_power seems to have changed. I will make the necessary changes.
