[Issue]: Odd OOMs with SD1.5 checkpoints when run from a chroot. #3444
Replies: 4 comments 5 replies
-
running in chroot is really not something that out-of-the-box software like this is tested with, nor is it supported. this would be better suited as a discussion thread than an issue - transferring there.
to be honest, i'd much rather spend time on trying to fix those incompatibilities than to run in chroot.
-
The first problem is that Trixie comes with Python 3.12 only, so some packages that are used by SDNEXT complain. I guess this could be worked around by hand-rolling an older version and using the
The bigger problem is that I am using the machine for Steam gaming, and having both Steam's Proton/Wine environment and ROCm on the machine makes for various package conflicts. To be clear: I am not planning on using Steam games and SDNEXT at the same time, but the presence of the libraries needed for gaming and the ROCm libraries at the same time caused all manner of conflicts last time I tried (about 6 months ago). I am willing to give that another go this weekend, and document exactly what breaks and how. Fortunately, I can just snapshot my current working setup and roll back if needed.
I have also tried using a VM, with the XTX PCI device passed through transparently (virt-manager+qemu+kvm). This is very awkward since the host machine then basically loses its only output device and I have to use a serial console. Furthermore, this also caused weird GPU crashes last time I tried it (not 100% sure, but the same setup has never crashed in that manner otherwise).
Finally, I guess I could dual boot with two Linux installs (one testing, one stable). Last time I tried it, however, Debian thought there was only one install, and every time GRUB got updated, the other install got "shadowed" in that it disappeared from the GRUB menu. There is probably a workaround for that, too, but I have not dug into it, so I am only mentioning it for completeness' sake.
-
Weekend came early. My notes became long-ish, but I think the details matter, so I've also provided a...
TL;DR
I have included my steps below, for context.
Base state
ROCm
Going by these instructions: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/native-install/ubuntu.html With the following changes:
This is the first point that changed from ~6 months ago: At this point, I'd usually run into an unsolvable dep issue, or Steam would completely break (either no game would start, or not even the Steam UI/client). Both work now, so we'll call that progress!
SDNEXT/Python
At any rate, I think I have narrowed it down to the interesting bit, and I definitely can get rid of the chroot setup, which is a very welcome simplification.
[0] Including deps, this will install:
-
nice progress and write-up.
well, hires tries 2nd pass with now upscaled image at 1024x1440 which is a lot. anyhow, why oom? i can only guess - in general, memory space needed for any particular resolution is similar in sd15 and sdxl. yes, size of sdxl itself is larger, but i'm talking about workspace not model size - and sdxl performs far more granular allocation steps while sd15 asks gpu for massive chunk - i'm guessing that's what causes the issues in your env.
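To put a rough number on "a lot", here is a back-of-the-envelope sketch of how a single workspace buffer can grow between the base pass and the hires pass. It is not a measurement from SD.Next. It assumes a naive attention implementation that materializes the full self-attention score matrix in fp16, 8 attention heads, self-attention running at full latent resolution, and a VAE downscale factor of 8. The function name `attention_scores_bytes` is made up for illustration.

```python
def attention_scores_bytes(width_px, height_px, heads=8, bytes_per_value=2, vae_factor=8):
    """Bytes for one full self-attention score matrix (hypothetical helper).

    Assumptions: naive attention (no slicing, no flash/SDP kernel),
    fp16 values, attention at full latent resolution.
    """
    # One token per latent pixel.
    tokens = (width_px // vae_factor) * (height_px // vae_factor)
    # The score matrix is tokens x tokens per head.
    return heads * tokens * tokens * bytes_per_value


GIB = 1024 ** 3
for w, h in [(512, 720), (1024, 1440)]:
    size = attention_scores_bytes(w, h)
    print(f"{w}x{h}: {size / GIB:.2f} GiB")
```

Under these assumptions the base pass needs about 0.49 GiB for that one buffer and the 2x hires pass about 7.91 GiB per image, since doubling both sides multiplies the token count by 4 and the matrix by 16. A memory-efficient attention backend or attention slicing would avoid a single allocation of that size.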
-
Issue Description
Due to software incompatibilities, I have to run automatic/sdnext in a Debian stable chroot (the host system has to run Debian testing).
SDXL and Pony models work fine, but using SD1.5, I get an OOM during the refine (upscale) step (logs below).
Version Platform Description
Debian stable chroot (Bookworm), with a Debian testing kernel (6.10.9-amd64). ROCm inside the chroot is v6.1.
Version info:
Mounted filesystems:
It being a memory problem at first made me think that maybe I had forgotten to mount some shm filesystem, but as can be seen above, it's there. I wasn't sure about HugeTLB being needed, but either way, it's there as well.
As far as I can tell, the generation settings (SD1.5, 512x720, "potato", Upscale 2x using R-ESRGAN) should work just fine, and they do with SDXL. It also works with SD2.1, but all SD1.5-based checkpoints (and SD1.5 itself) I tried failed in this way. I also tried smaller image and batch sizes, but while the allegedly needed amounts of memory were smaller, my 24GB XTX still wasn't enough, mysteriously.
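For anyone wanting to repeat the mount check from inside a chroot, a minimal sketch could look like the following. It parses text in the `/proc/mounts` format and reports which expected mounts are absent. The helper name `missing_mounts`, the `REQUIRED` list (including `/dev/hugepages` as the hugetlbfs mount point), and the `SAMPLE` text are assumptions for illustration, not output from this system.

```python
def missing_mounts(mounts_text, required):
    """Return the (mount point, fs type) pairs from `required` not found in mounts_text."""
    present = set()
    for line in mounts_text.splitlines():
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 3:
            present.add((fields[1], fields[2]))
    return [pair for pair in required if pair not in present]


# Assumed mount points; adjust to the actual chroot layout.
REQUIRED = [("/dev/shm", "tmpfs"), ("/dev/hugepages", "hugetlbfs")]

# Hypothetical sample, not taken from the system described here.
SAMPLE = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev 0 0
"""

print(missing_mounts(SAMPLE, REQUIRED))
```

On a real system the first argument would be the contents of `/proc/mounts`, read from inside the chroot, for example `missing_mounts(open("/proc/mounts").read(), REQUIRED)`.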
I get that this likely is not a bug in SDNEXT/Automatic, but I am out of ideas as to what might be missing. TIA for any insights you might have.
Relevant log output
Backend
Diffusers
UI
Standard
Branch
Master
Model
StableDiffusion 1.5
Acknowledgements