
How to run on a single linux server with multiple GPUs #20

Open
1999kevin opened this issue Apr 20, 2023 · 12 comments


@1999kevin

Nice job! I wonder how I can run the code on a single Linux server with multiple GPUs. I can run the code on the server with one GPU by not using mpiexec, but what if I want to use multiple GPUs, as with nn.DataParallel?

@stonecropa

@1999kevin Can you tell me how to use a single GPU to generate images from a pretrained model without the NCCL communication backend? Thank you.

@1999kevin

> @1999kevin Can you tell me how to use a single GPU to generate images from a pretrained model without the NCCL communication backend? Thank you.

Just delete the mpiexec part of the sampling command.

@stonecropa

@1999kevin But I don't find mpiexec in image_sample.py. Thanks.

@stonecropa

Can I have a look at the code after your changes? I would appreciate it if you could send it over, thanks.

@1999kevin

> Can I have a look at the code after your changes? I would appreciate it if you could send it over, thanks.

I'm still working on the training phase and am not so sure about the inference phase. I guess you can follow lines 48 and 51 in scripts/launch.sh to sample the images. If you want to use a single process, just use the command python image_sample.py ... directly.

@tyshiwo1

tyshiwo1 commented Apr 22, 2023

I added CUDA_VISIBLE_DEVICES=6,7 in front of the inference command to form CUDA_VISIBLE_DEVICES=6,7 mpiexec -n 2 python ./scripts/image_sample.py ..., and changed the code at ./cm/dist_util.py#L27 to:

    # Map each MPI rank to one of the GPUs listed in CUDA_VISIBLE_DEVICES,
    # so that every process ends up with exactly one visible device.
    if 'CUDA_VISIBLE_DEVICES' not in os.environ:
        # No restriction set: pin this rank to a physical GPU by index.
        os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"
    else:
        # A restriction such as 6,7 was given: pick this rank's entry from it.
        gpu_inds_list = os.environ["CUDA_VISIBLE_DEVICES"].split(',')
        idx = MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE
        os.environ["CUDA_VISIBLE_DEVICES"] = gpu_inds_list[idx]

Does it work?
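
For anyone who wants to sanity-check this rank-to-GPU mapping outside the repo, here is a minimal standalone sketch (not part of consistency_models; the script name is made up, mpi4py and torch are assumed to be installed, and GPUS_PER_NODE mirrors the constant in cm/dist_util.py):

    # check_gpu_mapping.py -- hypothetical helper, not part of the repo.
    # Run with: CUDA_VISIBLE_DEVICES=6,7 mpiexec -n 2 python check_gpu_mapping.py
    import os

    from mpi4py import MPI

    GPUS_PER_NODE = 8  # assumption: physical GPUs per machine

    rank = MPI.COMM_WORLD.Get_rank()
    if "CUDA_VISIBLE_DEVICES" not in os.environ:
        # No restriction given: pin this rank to a physical GPU by index.
        os.environ["CUDA_VISIBLE_DEVICES"] = f"{rank % GPUS_PER_NODE}"
    else:
        # A restriction such as 6,7 was given: pick this rank's entry from it
        # (modulo its length, so extra ranks wrap around instead of crashing).
        gpu_inds = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
        os.environ["CUDA_VISIBLE_DEVICES"] = gpu_inds[rank % len(gpu_inds)]

    # Import torch only after CUDA_VISIBLE_DEVICES is final, so each process's
    # CUDA context sees exactly one device (always cuda:0 from torch's view).
    import torch

    print(
        f"rank {rank}: CUDA_VISIBLE_DEVICES={os.environ['CUDA_VISIBLE_DEVICES']}, "
        f"torch sees {torch.cuda.device_count()} device(s)"
    )

Each rank should print a different physical GPU index while reporting exactly one visible device.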

@1999kevin

> Does it work?

I will test it once I finish the current training.

@tyshiwo1

tyshiwo1 commented Apr 23, 2023

> > Does it work?
>
> I will test it once I finish the current training.

Btw, I found that training with only batch size 4 and image size 64 costs about 18 GB of memory per GPU. Is there something wrong with that?

@1999kevin

> Btw, I found that training with only batch size 4 and image size 64 costs about 18 GB of memory per GPU. Is there something wrong with that?

I also encountered similar problems in my test. I trained the model with batch size 2 and image size 256, which cost me 35 GB of memory.
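
If you want to see where that memory actually goes, one quick check is to print the CUDA allocator's peak after a few iterations. A minimal sketch (not from the repo; call it wherever is convenient on each rank):

    import torch

    def report_peak_memory(tag: str = "") -> None:
        # Peak tensor memory the caching allocator has handed out on this
        # process's GPU; nvidia-smi usually reports a larger, reserved number
        # because the allocator also caches freed blocks.
        peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
        print(f"{tag} peak allocated: {peak_gib:.2f} GiB")

If this peak is far below what nvidia-smi shows, the difference is mostly allocator cache and CUDA context overhead rather than the model and activations themselves.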

@stonecropa

> Btw, I found that training with only batch size 4 and image size 64 costs about 18 GB of memory per GPU. Is there something wrong with that?
>
> I also encountered similar problems in my test. I trained the model with batch size 2 and image size 256, which cost me 35 GB of memory.

Will the pretrained model also use such a large amount of GPU memory?

@1999kevin

> Will the pretrained model also use such a large amount of GPU memory?

I have not tested that case yet.

@1999kevin

> I added CUDA_VISIBLE_DEVICES=6,7 in front of the inference command to form CUDA_VISIBLE_DEVICES=6,7 mpiexec -n 2 python ./scripts/image_sample.py ..., and changed the code at ./cm/dist_util.py#L27

This change can definitely enable multi-GPU training. However, it may cause the error 'Expected q.stride(-1) == 1 to be true, but got false', as in issue #3. Changing the flash attention to the default attention resolves the error.
