
conversion can fail at ControlNet on machines with 16GB of system memory #369

Closed

ssube opened this issue May 1, 2023 · 3 comments

@ssube (Owner) commented May 1, 2023

Converting models on a system with 16GB of system memory, without the onnx-fp16 or torch-fp16 optimizations, can lead to an out-of-memory error:

[screenshot: out-of-memory error during conversion]

This appears to be related to the ControlNet conversion, which makes some sense, because the ControlNet is effectively a copy of the UNet and the UNet is already the largest model. However, @HoopyFreud reports that this was not happening with the previous conversion method, so it may also be related to #337.

I have not documented the minimum specs yet, but would like to support 4c/16GB machines with 4GB of VRAM.
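
For context, the onnx-fp16 optimization mentioned above post-processes an already-exported fp32 graph into fp16, roughly halving both the file size and the memory needed to reload it. A minimal sketch using onnxconverter-common; the paths are illustrative, not the converter's actual layout:

```python
import onnx
from onnxconverter_common import float16

# Load an already-exported fp32 graph and rewrite its tensors to fp16.
# This roughly halves the file size and the memory needed to reload it
# during the optimization pass.
model = onnx.load("unet/model.onnx")  # illustrative path
model_fp16 = float16.convert_float_to_float16(model)
onnx.save(model_fp16, "unet/model.fp16.onnx")
```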

@ssube added the status/planned, scope/convert, model/diffusion, and pipeline/controlnet labels on May 1, 2023
@ssube added this to the v0.10 milestone on May 1, 2023
@ssube (Owner, Author) commented May 15, 2023

I've confirmed this on a 16GB laptop, where conversion almost always fails with an Aborted or out-of-memory error.

Looking at the memory profiler, the usage during conversion can easily hit 15GB:

[screenshot: memory profiler showing conversion peaking near 15GB]

That doesn't leave much headroom for the rest of the system, and conversion will typically fail on a machine with 16GB. According to the profiler, some 6GB of that is held by Torch and/or CUDA.
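
For anyone reproducing this, a simple way to capture the peak is to sample the converter process's RSS from a background thread. This is a generic psutil sketch, not part of onnx-web:

```python
import os
import threading
import time

import psutil

def watch_rss(interval: float = 1.0) -> None:
    # Print this process's resident set size once per interval; on a 16GB
    # machine the peaks reported above sit near 15GB.
    proc = psutil.Process(os.getpid())
    peak = 0
    while True:
        rss = proc.memory_info().rss
        peak = max(peak, rss)
        print(f"rss={rss / 2**30:.2f}GiB peak={peak / 2**30:.2f}GiB")
        time.sleep(interval)

threading.Thread(target=watch_rss, daemon=True).start()
# ...run the conversion on the main thread...
```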

When converting diffusion models, the full pipeline is loaded, each model is exported to ONNX, and then the ONNX models are reloaded to be optimized (single external tensor file, fp16, etc.). Recent changes for ControlNet added some additional load_model calls before the UNet has been fully unloaded, which can load a second copy of the CNet/UNet and take another 3-4GB of memory.
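
The fix should look roughly like the pattern below: drop every reference to the UNet and force a collection before the ControlNet is loaded, so only one copy of the largest model is resident at a time. load_unet, load_cnet, and export_onnx are placeholders here, not the converter's actual functions:

```python
import gc

import torch

def convert_sequentially(load_unet, load_cnet, export_onnx):
    # Export the UNet first, then release it completely before the
    # ControlNet (effectively a second UNet-sized model) is loaded.
    unet = load_unet()
    export_onnx(unet, "unet")
    del unet
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached CUDA blocks to the driver

    cnet = load_cnet()
    export_onnx(cnet, "cnet")
    del cnet
    gc.collect()
```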

@ssube (Owner, Author) commented May 16, 2023

This is working better: I was able to convert the base models on a 16GB laptop, but it can still freeze during conversion, and converting multiple models in a row seems to make that more likely. Based on the current info, it appears to get stuck during the cnet conversion: there is a valid unet model, and the cnet folder exists but is empty.
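
A quick way to spot the stuck case is to check for the output files rather than the folders, since the freeze leaves the cnet directory present but empty. This is a hypothetical check; the model.onnx file names follow the usual ONNX pipeline layout and may not match the converter's exactly:

```python
from pathlib import Path

def conversion_complete(model_dir: str) -> bool:
    # A frozen conversion leaves a valid unet/model.onnx behind while the
    # cnet directory exists but contains no model file.
    unet = Path(model_dir, "unet", "model.onnx")
    cnet = Path(model_dir, "cnet", "model.onnx")
    return unet.is_file() and cnet.is_file()
```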

@ssube added the status/progress label and removed the status/planned label on Jun 9, 2023
@ssube (Owner, Author) commented Dec 25, 2023

The new optimum-based converter should fully unload the unet before converting the cnet, with no option to share them. That should reduce memory use to the minimum possible through that code path.
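
For anyone following along, this is roughly the optimum entry point the new converter builds on. The model ID and output path are illustrative, and the ControlNet handling itself is onnx-web-specific and not shown:

```python
from optimum.exporters.onnx import main_export

# Sketch: export a diffusion pipeline to ONNX via optimum's exporter.
# onnx-web's new converter builds on this path and, per the comment above,
# fully unloads the unet before the cnet is converted.
main_export(
    "runwayml/stable-diffusion-v1-5",  # illustrative model ID
    output="out/sd15-onnx",
    task="stable-diffusion",
)
```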

@ssube closed this as completed on Dec 25, 2023
@ssube added the status/fixed label and removed the status/progress label on Dec 25, 2023