Welcome to unofficial TPU enabled PyTorch implementation #61
Comments
Did you have some issues in mind?
Actually no.
Hi @shizhediao, have you run any of the code in the TPU version so far? If so, could you please share some logs from your experiments?
It's worth noting there's another TPU implementation where they claim to have trained models successfully: https://github.com/giannisdaras/smyrf/tree/master/examples/tpu_biggan (supporting code for "SMYRF: Efficient attention using asymmetric clustering", Daras et al 2020). Tensorfork has been considering training it, despite PyTorch requiring paying for far more VMs than a Tensorflow implementation would, in order to establish a baseline, given our difficulties getting the compare_gan BigGAN to reach high quality.
@gwern Thanks for sharing. However, I found that simply wrapping the original PyTorch BigGAN into a TPU-enabled version seems to be super slow, since there are some ops that require context switching between the CPU and the TPU (e.g. interpolate2d in …)
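One way to see which ops fall back to the CPU is torch_xla's metrics report; counters named `aten::*` correspond to ops that were not lowered to XLA and ran on the host instead. A minimal sketch, assuming a standard torch_xla setup on a TPU VM and using `F.interpolate` as a stand-in for the op in question:

```python
import torch
import torch.nn.functional as F
import torch_xla.core.xla_model as xm
import torch_xla.debug.metrics as met

device = xm.xla_device()
x = torch.randn(1, 3, 64, 64, device=device)

# Run the suspected op once so it shows up in the counters.
y = F.interpolate(x, scale_factor=2, mode='nearest')
xm.mark_step()  # flush the lazily built XLA graph

# Counters whose names start with "aten::" are ops that fell back to the CPU,
# i.e. the CPU<->TPU context switches mentioned above.
print(met.metrics_report())
```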
Hi everyone,
I implemented three TPU-enabled PyTorch training repos for BigGAN-PyTorch, all based on this repo:
BigGAN-PyTorch-TPU-Single: training BigGAN with a single TPU core.
BigGAN-PyTorch-TPU-Parallel: parallel (multi-threaded) version for training BigGAN with TPU.
BigGAN-PyTorch-TPU-Distribute: distributed (multi-process) version for training BigGAN with TPU.
I have checked the training process, which seems to be normal.
There may still be some issues (sorry, I'm a novice at TPU training); a rough sketch of how the single-TPU and multi-process setups differ is included below.
Pull requests to fix any of the issues would be appreciated, and discussion is welcome.
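For reference, a minimal sketch of how the single-TPU and distributed (multi-process) entry points differ, using the torch_xla APIs of that period; the model, optimizer, and training step below are placeholders, not the actual BigGAN code from the repos:

```python
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def train_step(device):
    # Placeholder for one G/D update; the repos build the real BigGAN models here.
    model = nn.Linear(128, 128).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss = model(torch.randn(64, 128, device=device)).pow(2).mean()
    loss.backward()
    # On TPU, optimizer_step replaces opt.step(); barrier=True also flushes the
    # XLA graph, and in the multi-process case gradients are all-reduced first.
    xm.optimizer_step(opt, barrier=True)

def run_single():
    # BigGAN-PyTorch-TPU-Single: one process driving one TPU core.
    train_step(xm.xla_device())

def _mp_fn(index):
    # BigGAN-PyTorch-TPU-Distribute: this function is spawned once per core.
    train_step(xm.xla_device())

if __name__ == '__main__':
    xmp.spawn(_mp_fn, args=(), nprocs=8)  # e.g. the 8 cores of a v3-8
```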