[Community Pipeline] Add 🪆Matryoshka Diffusion Models #9157
@tolgacangoz would you have cycles to work on this soon? Another contributor has expressed interest in working on it. Maybe you two could collaborate?
I am into the inference code atm. Will the training code in …
For now, we don't have to focus on training.
…t_down_block` for FF layers in attention
[Community Pipeline] Add 🪆Matryoshka Diffusion Models
…r for Matryoshka models
…goz/diffusers into Add-Matryoshka-Diffusion-Models
Thank you for working on this @tolgacangoz!
thanks!
Thanks for merging!
Hey @tolgacangoz, are there any changes we need to make here to incorporate Jiatao's latest changes in apple/ml-mdm#21?
Probably. I will look into it tomorrow. Edit: The usage of …
Thanks for the opportunity to work on this model!
The Abstract of the paper:
Paper: 🪆Matryoshka Diffusion Models
Repository: https://github.com/apple/ml-mdm
Hugging Face Space: https://huggingface.co/spaces/pcuenq/mdm
License: MIT license
Key takeaways from the paper:
`None`; since Matryoshka Diffusion Models work on the (extended) pixel space(s).
`flan-t5-xl`
TODOs:
✅ The U-Net; in other words, the innermost structure, `nesting_level=0`.
✅ Scheduler: … `timesteps` and utilizes `prev_timestep` in a slightly different way. It gives the `t-1` timestep to the `unet`, gives `t` to the `scheduler`, and doesn't use the last `timestep`. A `nesting_level=1` model uses 2 noise matrices: `3×64×64` and `3×256×256`. A `nesting_level=2` model uses 3 noise matrices: `3×64×64`, `3×256×256`, and `3×1024×1024`. Each noise matrix has its own calculations in the scheduler. One obtains 3 images at 3 different resolutions from a `nesting_level=2` model.
✅ `convert_matryoshka_model_to_diffusers.py`
✅ Show example results:
  `64×64, nesting_level=0`: 1.719 GiB. Example with 50 DDIM inference steps.
  `256×256, nesting_level=1`: 1.776 GiB. Example with 150 DDIM inference steps.
  `1024×1024, nesting_level=2`: 1.792 GiB. Example with 250 DDIM inference steps. As one can see, the cost of adding another layer is negligible in this context!
✅ Finish HF integration & upload converted checkpoints to HF.
✅ `README.md`
⏳ Make it as simple as possible, but not simpler. Note: I may make small additions/modifications in the future, e.g., to comments.
❓ `examples/**/train_matryoshka.py`
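To make the scheduler notes in the TODO list concrete, here is a minimal sketch of one possible reading of them. It is not the pipeline's actual code: the function names `nested_noise_shapes` and `timestep_pairs` are hypothetical, the 4× growth factor is inferred only from the 64 → 256 → 1024 resolutions listed above, and "gives `t-1` to the `unet`" is taken literally as the integer `t - 1`.

```python
# Hypothetical helpers for illustration only; these are NOT part of the
# diffusers API or of the Matryoshka community pipeline.

def nested_noise_shapes(nesting_level, channels=3, base=64, factor=4):
    """Return one noise shape per nested resolution.

    Assumption: each nesting level multiplies the side length by `factor`,
    matching the 64 -> 256 -> 1024 progression quoted in the description.
    """
    return [
        (channels, base * factor**k, base * factor**k)
        for k in range(nesting_level + 1)
    ]


def timestep_pairs(timesteps):
    """Return (unet_timestep, scheduler_timestep) pairs.

    Assumption: the U-Net is called with t - 1, the scheduler step with t,
    and the last entry of `timesteps` is never used.
    """
    return [(t - 1, t) for t in timesteps[:-1]]


if __name__ == "__main__":
    for level in range(3):
        print(level, nested_noise_shapes(level))
    print(timestep_pairs([1000, 750, 500, 250]))
```

Under these assumptions, a `nesting_level=2` model draws three noise tensors, one per resolution, and a schedule of N timesteps results in N-1 denoising iterations.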
I would like to congratulate you on this great work and thank you for open-sourcing the codebase under the MIT license @MultiPath, @Shuangfei, @dreasysnail, Josh Susskind, @ndjaitly, @luke-carlson!
I believe/anticipate that this kind of representation learning will become popular, that acceleration improvements from contemporary diffusion modeling will be adapted to this model, and that training will be democratized without the need for large resources in the future.
@sayakpaul @pcuenca @a-r-r-o-w