AI Worker Bridge Crashing After Some Time - Likely Due to GPU Usage #297
Comments
It is likely that your configuration needs to be adjusted. Features such as ControlNets and LoRAs will very likely not work with your card. Further, max threads should be 1, max power shouldn't be much higher than 16, and the VRAM-to-keep-free option should be left at 80%. Support can better be provided in the local workers channel on the official Discord (https://discord.gg/hzgR8cc67P). If you are unable to use Discord, I advise you to check logs/trace.log for errors and/or logs/bridge.log for other information.
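For reference, a small sketch of how you might sanity-check a worker config against the values suggested above. The key names used here (max_threads, max_power, vram_to_leave_free) are assumptions and may differ between worker versions; check your own bridgeData.yaml for the actual names.

```python
# Rough sanity check of a worker config against the suggested values.
# Key names below are assumptions -- verify them against your bridgeData.yaml.
import yaml  # pip install pyyaml

RECOMMENDED = {
    "max_threads": 1,            # one job at a time on low-VRAM cards
    "max_power": 16,             # treat as an upper bound, not a target
    "vram_to_leave_free": "80%", # leave headroom so allocations don't fail
}

with open("bridgeData.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f) or {}

for key, suggested in RECOMMENDED.items():
    print(f"{key}: current={config.get(key)!r}, suggested={suggested!r}")
```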
The max power is at the default, 8; max threads is 1; and the VRAM-to-keep-free option is 80%, but that option seems to be irrelevant to how the program functions. I don't have LoRAs on, but when I checked the logs it was all Python trying to allocate VRAM when there was none available, which led to a crash. Here is the most common error message:
I run Stable Diffusion just fine with the --medvram flag. Is that not something that can be implemented?
No, that will make it too slow to be used on the AI Horde. Try to disable post-processing. If it's still crashing, your card just doesn't have enough VRAM to run SD fast enough for the horde.
Too slow to be used on the horde? Medvram drastically reduces the requirements to run SD stably, at a very minor performance cost. I did a benchmark of generating 6 images with the same seed, step count, and model; the results are below:

6 batch count, 1 batch size, medvram: 74.40s

It's only about 14% slower when run on single batches. Computers with a 'slow' image speed are already listed as 'slow' workers, but queue times for image generation are very long, much longer than the time it takes to generate the images. This indicates that the horde does not have enough processing resources available to meet requests; having a way to lower the barrier of entry, at a minor individual cost and only when necessary, will result in a net overall gain. In addition, medvram reduces the operating requirements so much that you can use other optimizations, like increasing the batch size or resolution. Below is the same benchmark, just with a batch size of 6:

1 batch count, 6 batch size, medvram: 58.15s

I've always dreamed of making something like this, but I don't have the technical expertise to realize it. I hope you can at least consider this idea, as I believe it to be a good one.
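As an aside, this kind of timing comparison can be reproduced outside the horde worker. The sketch below uses the diffusers library, which is a different stack from both A1111 and the worker; the model name, prompt, step count, and seed are placeholders, and enable_model_cpu_offload() is only roughly analogous to A1111's --medvram.

```python
# Rough timing comparison sketch (diffusers, not the horde worker itself).
import time
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Roughly analogous to --medvram: keep only the submodule currently running
# on the GPU, offloading the rest to system RAM (requires `accelerate`).
pipe.enable_model_cpu_offload()

prompt = "a photo of an astronaut riding a horse"

# 6 images as 6 batches of 1; compare against a single call with
# num_images_per_prompt=6 for the "batch size 6" case.
start = time.perf_counter()
for _ in range(6):
    generator = torch.Generator("cuda").manual_seed(1234)  # same seed each run
    pipe(prompt, num_inference_steps=20, generator=generator).images
print(f"6x batch size 1: {time.perf_counter() - start:.2f}s")
```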
It sounds like you probably need to adjust some of the configuration options already present in the horde worker. The settings you're talking about are specific to a program that is not used by the horde worker; something similar (but not identical) is in fact already present. If you come to the Discord, more real-time troubleshooting can be provided.
There is slow_workers, but there's also the generic stale timer, which is around 120 seconds for one image at 512x512. If your worker can stay below this threshold, you should be fine. Otherwise, as tazlin said, you can join us on Discord for easier troubleshooting.
OK, I'll do that.
When running 'horde-bridge.cmd' on my computer (Using device: CUDA 0: NVIDIA GeForce GTX 1660 SUPER), the process loads up properly and runs normally except for these two occasional errors:

"Model name requested SDXL_beta::stability.ai#6901 in bridgeData is unknown to us. Please check your configuration. Aborting!"

and

"This job took longer than average to process. Please consider lowering your max_power."
It continually runs at minimal CPU load, around 3%, while the GPU is almost always at 99% load. My GPU is the one stated above, a GTX 1660 Super. It's not the best performance-wise, but it is much better than what most computers run and meets the minimum VRAM requirements listed. However, when I use Stable Diffusion (and now your program) on the basic settings, it will occasionally crash. The workaround I've found for AUTOMATIC1111's web UI for Stable Diffusion is launching with the --medvram flag. This flag splits the work of generating the image across three areas: your computer's RAM, the processor, and the GPU. It allows me and many others to generate AI images consistently (and quickly) without having to worry about random crashes due to some GPU issue. I was not able to find this feature in this project, however. Unfortunately, this also means I am unable to leave my PC running as a worker for any significant period of time, and this likely acts as a barrier for many others as well.
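To illustrate the general idea behind that kind of offloading, here is a minimal, self-contained sketch: keep a submodule's weights in system RAM and move them to the GPU only while that submodule is running. This is a conceptual illustration only, not A1111's or the horde worker's actual implementation, and the helper name offload_between_calls is hypothetical.

```python
# Conceptual sketch of --medvram-style offloading: weights stay in system RAM
# except while their module is actually running on the GPU.
import torch
import torch.nn as nn

def offload_between_calls(module: nn.Module, device: str = "cuda") -> nn.Module:
    """Wrap module.forward so its weights live on the CPU except during a call."""
    original_forward = module.forward

    def forward(*args, **kwargs):
        module.to(device)              # load weights into VRAM
        try:
            return original_forward(*args, **kwargs)
        finally:
            module.to("cpu")           # release VRAM for the next stage
            torch.cuda.empty_cache()

    module.forward = forward
    return module

if torch.cuda.is_available():
    layer = offload_between_calls(nn.Linear(4096, 4096))
    x = torch.randn(1, 4096, device="cuda")
    y = layer(x)                       # weights are on the GPU only during this call
    print(y.shape, next(layer.parameters()).device)  # parameters are back on cpu
```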
The random crashing could be due to any number of issues, but I have a strong hunch it has to do with my GPU running at 100% capacity the whole time your program is active. Implementing something like medvram would drastically lower the barrier of entry to the 'horde' and as such multiply its strength. I appreciate your approach of crowdsourcing AI processing to make it accessible to everyone, and I hope to one day be a part of bringing this to life.