Is it possible to implement Accelerate from Hugging Face? #700

Closed · Omegadarling opened this issue Jan 5, 2023 · 12 comments

@Omegadarling

WHAT
'Accelerate' is for training with multiple GPUs.

WHERE
https://huggingface.co/docs/transformers/accelerate

HOW
I'm just an artist who has a healthy appreciation for coding, but yeah, check the URL.

WHEN
Well, I have a 10 x 3090 GPU machine and it might be nice to train models faster.

@78Alpha

78Alpha commented Jan 5, 2023

Have you tried setting ACCELERATE=true for the webui?
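
(For reference, the edit being described goes in webui-user.bat. A rough sketch of what that file typically looks like with the flag added; exact contents vary by install:)

```bat
@echo off
rem webui-user.bat -- sketch of a typical Automatic1111 launcher config on Windows

set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=

rem Opt in to launching through Hugging Face Accelerate (read by webui.bat)
set ACCELERATE="True"

call webui.bat
```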

@d8ahazard
Owner

This was implemented quite a while ago...

@Omegadarling
Author

@d8ahazard That's great and I kind of see it "working" after I added set ACCELERATE="True" to my webui-user.bat, but do you know of anywhere that someone explains how to do this with Automatic1111?

@78Alpha

78Alpha commented Jan 12, 2023

There is very little about Accelerate itself that can be searched up online. Here is the pull request where it was added; that is the best source of info for the time being.

AUTOMATIC1111/stable-diffusion-webui#4527
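
(For anyone skimming that thread: as far as I can tell, the gist of the change is that webui.bat checks the ACCELERATE variable and, if it is "True" and the venv has accelerate.exe, launches through it instead of calling Python directly. A paraphrased sketch of that dispatch, not the exact diff:)

```bat
rem Fragment paraphrasing the dispatch added by that PR
if [%ACCELERATE%] == ["True"] goto :accelerate
goto :launch

:accelerate
rem Use the venv's accelerate.exe if it exists, otherwise fall back to plain Python
set ACCELERATE="%VENV_DIR%\Scripts\accelerate.exe"
if EXIST %ACCELERATE% goto :accelerate_launch

:launch
%PYTHON% launch.py %*
exit /b

:accelerate_launch
echo Accelerating
%ACCELERATE% launch --num_cpu_threads_per_process=6 launch.py
exit /b
```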

@Omegadarling
Author

I've gotten it further by running pip install -r requirements.txt for Automatic1111, but when I ran with ACCELERATE="True" again I got some very worrying error messages. Here's my log from Windows PowerShell:
SD_accelerateTrue_log_2023-01-11-1823.txt

It does appear that Accelerate is starting to split work across the 8 GPUs that are currently installed (I pulled two for a workstation, but I'll put them back in once I get Accelerate working).

But there are some very suspicious attempts to connect to [www.007guard.com]:29500, and when I looked up the commit hash that gets printed after that, it also references this 007guard website, which no longer exists. This only comes up when I have Accelerate enabled; I don't see that error message when I run normally on one GPU.
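
(A note on that address: 29500 is the default main-process / rendezvous port used by torch.distributed, which Accelerate launches through, so the launcher is most likely just trying to reach the local machine on port 29500 and something on the box is resolving that address to www.007guard.com. If you ever need to pin the address explicitly, these are standard `accelerate launch` options; shown only as a sketch, not webui-specific advice:)

```bat
rem Sketch only: standard "accelerate launch" flags that pin the rendezvous
rem address/port workers use to find the main process (defaults: 127.0.0.1:29500).
accelerate launch --num_processes 8 --main_process_ip 127.0.0.1 --main_process_port 29500 launch.py
```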

I use this 10 GPU beast of a machine to do my day job (lot of 3D rendering) but it's so frustrating to have all that untapped power that could be used to make some really deep Stable Diffusion models and embeddings!

@78Alpha

78Alpha commented Jan 13, 2023

For the 007guard, take a look at this for some info https://superuser.com/questions/706729/007guard-what-is-it-is-it-dangerous-and-can-it-be-removed

@Omegadarling
Author

Omegadarling commented Jan 13, 2023

For the 007guard, take a look at this for some info https://superuser.com/questions/706729/007guard-what-is-it-is-it-dangerous-and-can-it-be-removed

Wow. I do have Spybot running, so that's probably spot on! I'm uncommenting the localhost line, but there is a line above it that says "# localhost name resolution is handled within DNS itself." so I'm hoping this doesn't create a conflict somewhere...

Also, THANK YOU!
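
(For anyone hitting the same thing: a rough sketch of the relevant part of C:\Windows\System32\drivers\etc\hosts after a Spybot "immunization". The exact blocked-domain entries vary per machine, and the "handled within DNS itself" line is only a comment, so restoring the localhost mapping next to it doesn't conflict with anything:)

```text
# localhost name resolution is handled within DNS itself.
127.0.0.1       localhost

# Entries like these are what Spybot's immunization adds; they map known
# ad/malware domains to the loopback address:
127.0.0.1       www.007guard.com
127.0.0.1       007guard.com
```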

@78Alpha

78Alpha commented Jan 13, 2023

And adding on about it not finding GPUs: I noticed 'Torch is not able to use GPU; add --skip-torch-cuda-test' in the log. From my experience, that's an issue with Accelerate. Running it from a script or an anaconda env gives it trouble; running the script directly will work, or at least provide an error to work with. Accelerate tends to spit out messages that amount to "an error has occurred because an error has occurred."

You can try running accelerate on something from a test venv or your main interpreter and see if it can find the GPUs.
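
(A minimal way to run that check, using only standard PyTorch / Accelerate commands; run these from inside whichever venv or interpreter you want to test:)

```bat
rem Does PyTorch itself see the GPUs?
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"

rem Accelerate's own environment report, which lists the GPUs it detects:
accelerate env

rem Build a config interactively, then run Accelerate's built-in sanity test with it:
accelerate config
accelerate test
```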

@Omegadarling
Author

Omegadarling commented Jan 13, 2023

@78Alpha Could it be something as simple as how old my Nvidia driver is? I'm using 472.47, which came out on 2021.11.10. I had to use an older driver to use RNDR, but RNDR now works with newer drivers. The only reason I haven't updated is that it takes over an hour for the drivers to install. Something about PCIe enumeration just gets exponentially slower with each additional GPU.

@78Alpha

78Alpha commented Jan 13, 2023

I couldn't say for sure. I usually drop ACCELERATE altogether when it starts complaining about not being able to find a GPU. Some anaconda environments or venvs work, others won't; it seemed hit or miss. I've used both new and old drivers myself, on an RTX card and even a Tesla P40.

All from a Windows environment, of course, plus one Colab.

@Omegadarling
Author

I couldn't say for sure. I usually drop ACCELERATE altogether when it starts complaining about not being able to find a GPU.

Are you using a different library for multi-GPU or just living with ONLY one GPU?

@78Alpha

78Alpha commented Jan 14, 2023

I couldn't say for sure. I usually drop ACCELERATE altogether when it starts complaining about not being able to find a GPU.

Are you using a different library for multi-GPU or just living with ONLY one GPU?

I live with using just the one GPU.
