How to run on a single linux server with multiple GPUs #20
@1999kevin Can you tell me how to generate images with a pretrained model on a single GPU, without the NCCL communication backend? Thank you.
Just delete the mpiexec part of the sampling command.
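A minimal sketch of what that looks like (the script name and flags here are illustrative assumptions, not the exact command from this repo):

```bash
# Multi-GPU sampling via MPI, one process per GPU (illustrative):
#   mpiexec -n 8 python scripts/image_sample.py --model_path /path/to/model.pt --batch_size 4
# Single-GPU sampling: drop the mpiexec prefix and run the script directly.
python scripts/image_sample.py --model_path /path/to/model.pt --batch_size 4
```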
@1999kevin But I don't find mpiexec in image_sample.py. Thanks.
Could I have a look at the code after your changes? I would appreciate it if you could send it over. Thanks.
I'm still working on the training phase and am not so sure about the inference phase. I guess you can follow Line 48 and Line 51 in scripts/launch.sh to sample the images. If you want to use one thread, just run the command directly without the mpiexec prefix.
I add
Does it work?
I will test it once I finish the current training.
Btw, I found that training with only batch size 4 and image size 64 costs about 18 GB of memory per GPU. Is there something wrong with that?
I also encountered similar problems in my test. I trained the model with batch size 2 and image size 256, which costs me 35 GB of memory.
I have not tested such a case yet.
This change can definitely enable multi-GPU training. However, it may cause the error 'Expected q.stride(-1) == 1 to be true, but got false', as in issue #3. Changing the flash attention to the default attention implementation resolves the error.
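As a sketch of that workaround, assuming the training entry point exposes a switch for the attention implementation (both the script name and the --attention_type flag below are assumptions; check the repo's argument parser for the real option):

```bash
# Hypothetical: fall back to the default (non-flash) attention implementation
# to avoid flash attention's q.stride(-1) == 1 requirement. Flag name assumed.
mpiexec -n 2 python scripts/train.py --attention_type default --batch_size 4 --image_size 64
```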
Nice job! I wonder how I can run the code on a single Linux server with multiple GPUs. I can run the code on the server with one GPU by not using mpiexec. But what if I want to use multiple GPUs, as with nn.DataParallel?
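For reference, the replies above amount to launching one process per GPU with MPI rather than wrapping the model in nn.DataParallel; a minimal sketch on a single server (process count, script name, and flags are assumptions):

```bash
# Launch 4 processes on one machine; each MPI rank typically binds to its own
# GPU inside the script's distributed setup. Script name and flags are assumed.
mpiexec -n 4 python scripts/image_train.py --batch_size 4 --image_size 64
```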