Quality-of-Life for Google Colab #3

Draft · wants to merge 13 commits into base: main

Conversation

woctezuma

Hello,

I know you don't accept pull requests. However, this could be of interest to others who want to run the code on Google Colab.

In a nutshell, I have ported my changes from NVlabs/stylegan2-ada#6, plus a bit more:

  • save output images as JPG,
  • automatically resume from the latest .pkl file with the command-line argument --resume=latest,
  • automatically set the resume value of kimg,
  • automatically set the resume value of the augmentation strength,
  • allow manually setting the resume value of the augmentation strength,
  • add a config auto_norp that replicates the auto config without EMA ramp-up,
  • allow overriding the mapping network depth with the command-line argument --cfg_map,
  • allow enforcing the CIFAR-specific architecture tuning with the command-line argument --cifar_tune.

I have tested training with my changes in one of my repositories, in order to catch any bugs I could have introduced. It seems fine.
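
For readers curious how --resume=latest and the automatic resume value of kimg fit together, below is a minimal sketch of the idea: scan the output directory for the most recent network-snapshot-*.pkl and read the kimg counter back from its file name. The helper name and the directory layout are placeholders for illustration, not the exact code of this branch.

```python
import glob
import os
import re

def resolve_latest_snapshot(outdir):
    # Hypothetical helper: find the most recently written network-snapshot-*.pkl
    # under outdir, and recover the kimg counter encoded in its file name.
    snapshots = glob.glob(os.path.join(outdir, '**', 'network-snapshot-*.pkl'), recursive=True)
    if not snapshots:
        return None, 0
    latest = max(snapshots, key=os.path.getmtime)
    # Snapshots are named like network-snapshot-000200.pkl, where 000200 is the kimg count.
    match = re.search(r'network-snapshot-(\d+)\.pkl$', latest)
    resume_kimg = int(match.group(1)) if match else 0
    return latest, resume_kimg

# Example usage with a placeholder output directory:
resume_pkl, resume_kimg = resolve_latest_snapshot('./results')
print(resume_pkl, resume_kimg)
```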


estan commented Jul 24, 2021

Thanks for this @woctezuma! This will probably come in handy for me (even if I'm not running on colab).

Some questions:

  • In your example training in https://github.com/woctezuma/steam-stylegan2-ada-pytorch/blob/main/training.ipynb, you don't use any EMA ramp-up even for the initial training. Do you do this because in that example, you are doing transfer learning from ffhq256?
  • In my case, I would like to do from scratch training on my own dataset, so I would use --cfg auto for the initial training, but then --cfg auto_norp when I resume?
  • Or is the auto_norp config you added not related to resume at all? (And in the example you simply did not want to use EMA ramp-up at all?) Sorry, I don't quite know what EMA ramp-up does, and I want to use the right parameters for my initial training and for my resumptions.


woctezuma commented Jul 24, 2021

I added auto_norp to let people disable the Exponential Moving Average (EMA) ramp-up.
The ramp-up controls the update of G_ema as shown below:

```python
ema_kimg   = 10,    # Half-life of the exponential moving average (EMA) of generator weights.
ema_rampup = None,  # EMA ramp-up coefficient.
```

```python
# Update G_ema.
with torch.autograd.profiler.record_function('Gema'):
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)
    ema_beta = 0.5 ** (batch_size / max(ema_nimg, 1e-8))
    for p_ema, p in zip(G_ema.parameters(), G.parameters()):
        p_ema.copy_(p.lerp(p_ema, ema_beta))
```
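
To make the ramp-up concrete, here is a small self-contained sketch with illustrative numbers (not taken from an actual run). Early in training, the ramp-up caps the EMA horizon at a fraction of the images seen so far, so G_ema tracks G more closely than it would with the plain 10 kimg half-life.

```python
# Illustrative numbers only.
ema_kimg = 10
batch_size = 32
cur_nimg = 50_000  # early in training: 50 kimg seen so far

for ema_rampup in (None, 0.05):
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        # Cap the EMA horizon at 5% of the images seen so far.
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)
    ema_beta = 0.5 ** (batch_size / max(ema_nimg, 1e-8))
    print(f'ramp-up={ema_rampup}: ema_nimg={ema_nimg}, ema_beta={ema_beta:.4f}')

# Without ramp-up, ema_nimg stays at 10000 and ema_beta is about 0.9978.
# With ramp-up=0.05, ema_nimg is capped at 2500 and ema_beta drops to about 0.9912,
# i.e. G_ema follows G more closely while the ramp-up is active.
```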

There are 2 observations to be made in the code released by Nvidia:

  1. EMA ramp-up is disabled when "resuming", which actually means transfer learning in the original code:

```python
if resume != 'noresume':
    args.ada_kimg = 100 # make ADA react faster at the beginning
    args.ema_rampup = None # disable EMA rampup
```

  2. EMA ramp-up is disabled for every config except auto and cifar:

```python
cfg_specs = {
    'auto':      dict(ref_gpus=-1, kimg=25000,  mb=-1, mbstd=-1, fmaps=-1,  lrate=-1,     gamma=-1,   ema=-1,  ramp=0.05, map=2), # Populated dynamically based on resolution and GPU count.
    'stylegan2': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=10,   ema=10,  ramp=None, map=8), # Uses mixed-precision, unlike the original StyleGAN2.
    'paper256':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=0.5, lrate=0.0025, gamma=1,    ema=20,  ramp=None, map=8),
    'paper512':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=1,   lrate=0.0025, gamma=0.5,  ema=20,  ramp=None, map=8),
    'paper1024': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=2,    ema=10,  ramp=None, map=8),
    'cifar':     dict(ref_gpus=2,  kimg=100000, mb=64, mbstd=32, fmaps=1,   lrate=0.0025, gamma=0.01, ema=500, ramp=0.05, map=2),
}
```

```python
args.ema_kimg = spec.ema
args.ema_rampup = spec.ramp
```
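
For reference, here is one way the auto_norp config added by this pull request could be expressed, assuming it is simply the auto spec with the ramp-up set to None (a sketch of the idea rather than the exact code of the branch):

```python
# Hypothetical sketch: auto_norp as a copy of the auto spec with EMA ramp-up disabled.
cfg_specs['auto_norp'] = dict(cfg_specs['auto'], ramp=None)
```

Everything else (batch size, learning rate, gamma, EMA half-life) would then still be populated dynamically, exactly as for auto.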


Do you do this because in that example, you are doing transfer learning from ffhq256?

I don't think it matters here, due to point n°1 above.


In my case, I would like to do from scratch training on my own dataset, so I would use --cfg auto for the initial training, but then --cfg auto_norp when I resume?

That is up to you: you could use either auto or auto_norp for the initial training; cf. point n°2 above.


Or is the auto_norp config you added not related to resume at all? (And in the example you simply did not want to use EMA ramp-up at all?)

The auto_norp config is intended to be a minor variation of the auto config. It can be useful if you train your model from scratch: for instance, except for CIFAR, Nvidia disabled EMA ramp-up in their experiments; cf. point n°2 above.


On a side note, beware that the automatically set parameters are not that great. For instance:

  • you would not end up with the parameters used in the paper's experiments if you relied on auto,
  • I would suggest manually adjusting gamma, e.g. try increasing it if you have a small dataset; see the heuristic sketched after the quote below.

cf. the README which I quote below:

--gamma=10 overrides R1 gamma.
We recommend trying a couple of different values [for R1 gamma] for each new dataset.

You will need some computational resources to explore the parameter space.
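
To give a concrete starting point for that exploration, this is roughly the heuristic the auto config uses in train.py to derive gamma from the resolution and the batch size; the constant is my reading of the code, so double-check it against your version:

```python
# Sketch of the heuristic used by the auto config to pick gamma (the R1 regularization weight).
def auto_gamma(resolution, batch_size):
    return 0.0002 * (resolution ** 2) / batch_size

# For instance, at 256x256 with a batch size of 32, the heuristic gives about 0.41,
# whereas the paper256 config uses gamma=1; small datasets often benefit from an
# even larger value, hence the advice to tune it manually.
print(auto_gamma(256, 32))
```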


estan commented Jul 24, 2021

Thanks a lot for the explanation and tips @woctezuma.

@thusinh1969

It has already cost me 10,000 USD, and it still has NOT CONVERGED!!!

@woctezuma

Don't spend so much money! There is no guarantee that it will work; it depends on the data, the parameters, etc.


ekerstein commented Sep 20, 2021

This branch is very helpful, thank you 🙏

@woctezuma How would you tune this for transfer learning? I want to "resume" from an existing network file but use a new dataset. Do I then need to adjust the augmentation strength or anything else? Any recommendations?
