Quality-of-Life for Google Colab #3

Draft · wants to merge 13 commits into base: main

Conversation

woctezuma

Hello,

I know you don't accept pull requests. However, this could be of interest to others who want to run the code on Google Colab.

In a nutshell, I have ported my changes from NVlabs/stylegan2-ada#6, plus a bit more:

  • save output images as JPG,
  • automatically resume from the latest .pkl file with the command-line argument --resume=latest,
  • automatically set the resume value of kimg,
  • automatically set the resume value of the augmentation strength,
  • allow manually setting the resume value of the augmentation strength,
  • add a config auto_norp that replicates the auto config without EMA ramp-up,
  • allow overriding the mapping network depth with the command-line argument --cfg_map,
  • allow enforcing the CIFAR-specific architecture tuning with the command-line argument --cifar_tune.

I have tested training with my changes in one of my repositories, in order to catch any bugs I could have introduced. It seems fine.
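
For readers curious how --resume=latest and the automatic resume value of kimg fit together, below is a minimal sketch of the idea: scan the output directory for the most recent network-snapshot-*.pkl and read the kimg counter back from its file name. The helper name and the directory layout are placeholders for illustration, not the exact code of this branch.

```python
import glob
import os
import re

def resolve_latest_snapshot(outdir):
    # Hypothetical helper: find the most recently written network-snapshot-*.pkl
    # under outdir, and recover the kimg counter encoded in its file name.
    snapshots = glob.glob(os.path.join(outdir, '**', 'network-snapshot-*.pkl'), recursive=True)
    if not snapshots:
        return None, 0
    latest = max(snapshots, key=os.path.getmtime)
    # Snapshots are named like network-snapshot-000200.pkl, where 000200 is the kimg count.
    match = re.search(r'network-snapshot-(\d+)\.pkl$', latest)
    resume_kimg = int(match.group(1)) if match else 0
    return latest, resume_kimg

# Example usage with a placeholder output directory:
resume_pkl, resume_kimg = resolve_latest_snapshot('./results')
print(resume_pkl, resume_kimg)
```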


estan commented Jul 24, 2021

Thanks for this @woctezuma! This will probably come in handy for me (even if I'm not running on colab).

Some questions:

  • In your example training in https://github.com/woctezuma/steam-stylegan2-ada-pytorch/blob/main/training.ipynb, you don't use any EMA ramp-up even for the initial training. Do you do this because in that example, you are doing transfer learning from ffhq256?
  • In my case, I would like to do from scratch training on my own dataset, so I would use --cfg auto for the initial training, but then --cfg auto_norp when I resume?
  • Or is the auto_norp config you added not related to resume at all? (And in the example you simply did not want to use EMA ramp-up at all?) Sorry, I don't quite know what EMA ramp-up does, and I want to use the right parameters for my initial training and for my resumptions.


woctezuma commented Jul 24, 2021

I added auto_norp to let people disable the Exponential Moving Average (EMA) ramp-up.
The ramp-up controls the update of G_ema as shown below:

```python
ema_kimg   = 10,    # Half-life of the exponential moving average (EMA) of generator weights.
ema_rampup = None,  # EMA ramp-up coefficient.
```

```python
# Update G_ema.
with torch.autograd.profiler.record_function('Gema'):
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)
    ema_beta = 0.5 ** (batch_size / max(ema_nimg, 1e-8))
    for p_ema, p in zip(G_ema.parameters(), G.parameters()):
        p_ema.copy_(p.lerp(p_ema, ema_beta))
```
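
To make the ramp-up concrete, here is a small self-contained sketch with illustrative numbers (not taken from an actual run). Early in training, the ramp-up caps the EMA horizon at a fraction of the images seen so far, so G_ema tracks G more closely than it would with the plain 10 kimg half-life.

```python
# Illustrative numbers only.
ema_kimg = 10
batch_size = 32
cur_nimg = 50_000  # early in training: 50 kimg seen so far

for ema_rampup in (None, 0.05):
    ema_nimg = ema_kimg * 1000
    if ema_rampup is not None:
        # Cap the EMA horizon at 5% of the images seen so far.
        ema_nimg = min(ema_nimg, cur_nimg * ema_rampup)
    ema_beta = 0.5 ** (batch_size / max(ema_nimg, 1e-8))
    print(f'ramp-up={ema_rampup}: ema_nimg={ema_nimg}, ema_beta={ema_beta:.4f}')

# Without ramp-up, ema_nimg stays at 10000 and ema_beta is about 0.9978.
# With ramp-up=0.05, ema_nimg is capped at 2500 and ema_beta drops to about 0.9912,
# i.e. G_ema follows G more closely while the ramp-up is active.
```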

There are 2 observations to be made in the code released by Nvidia:

  1. EMA ramp-up is disabled when "resuming", which actually means transfer learning in the original code:

```python
if resume != 'noresume':
    args.ada_kimg = 100 # make ADA react faster at the beginning
    args.ema_rampup = None # disable EMA rampup
```

  2. EMA ramp-up is disabled for every config except auto and cifar:

```python
cfg_specs = {
    'auto':      dict(ref_gpus=-1, kimg=25000,  mb=-1, mbstd=-1, fmaps=-1,  lrate=-1,     gamma=-1,   ema=-1,  ramp=0.05, map=2), # Populated dynamically based on resolution and GPU count.
    'stylegan2': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=10,   ema=10,  ramp=None, map=8), # Uses mixed-precision, unlike the original StyleGAN2.
    'paper256':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=0.5, lrate=0.0025, gamma=1,    ema=20,  ramp=None, map=8),
    'paper512':  dict(ref_gpus=8,  kimg=25000,  mb=64, mbstd=8,  fmaps=1,   lrate=0.0025, gamma=0.5,  ema=20,  ramp=None, map=8),
    'paper1024': dict(ref_gpus=8,  kimg=25000,  mb=32, mbstd=4,  fmaps=1,   lrate=0.002,  gamma=2,    ema=10,  ramp=None, map=8),
    'cifar':     dict(ref_gpus=2,  kimg=100000, mb=64, mbstd=32, fmaps=1,   lrate=0.0025, gamma=0.01, ema=500, ramp=0.05, map=2),
}
```

```python
args.ema_kimg = spec.ema
args.ema_rampup = spec.ramp
```
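
For reference, here is one way the auto_norp config added by this pull request could be expressed, assuming it is simply the auto spec with the ramp-up set to None (a sketch of the idea rather than the exact code of the branch):

```python
# Hypothetical sketch: auto_norp as a copy of the auto spec with EMA ramp-up disabled.
cfg_specs['auto_norp'] = dict(cfg_specs['auto'], ramp=None)
```

Everything else (batch size, learning rate, gamma, EMA half-life) would then still be populated dynamically, exactly as for auto.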


Do you do this because in that example, you are doing transfer learning from ffhq256?

I don't think it matters here, due to point n°1 above.


In my case, I would like to do from scratch training on my own dataset, so I would use --cfg auto for the initial training, but then --cfg auto_norp when I resume?

That is up to you: you could use either auto or auto_norp for the initial training; cf. point n°2 above.


Or is the auto_norp config you added not related to resume at all? (And in the example you simply did not want to use EMA ramp-up at all?)

The auto_norp config is intended to be a minor variation of the auto config. It can be useful if you train your model from scratch: for instance, except for CIFAR, Nvidia disabled EMA ramp-up in their experiments; cf. point n°2 above.


On a side note, beware that the automatically set parameters are not that great. For instance:

  • you would not end up with the parameters used in the paper's experiments if you relied on auto,
  • I would suggest manually adjusting gamma, e.g. try increasing it if you have a small dataset; see the heuristic sketched after the quote below.

cf. the README which I quote below:

--gamma=10 overrides R1 gamma.
We recommend trying a couple of different values [for R1 gamma] for each new dataset.

You will need some computational resources to explore the parameter space.
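
To give a concrete starting point for that exploration, this is roughly the heuristic the auto config uses in train.py to derive gamma from the resolution and the batch size; the constant is my reading of the code, so double-check it against your version:

```python
# Sketch of the heuristic used by the auto config to pick gamma (the R1 regularization weight).
def auto_gamma(resolution, batch_size):
    return 0.0002 * (resolution ** 2) / batch_size

# For instance, at 256x256 with a batch size of 32, the heuristic gives about 0.41,
# whereas the paper256 config uses gamma=1; small datasets often benefit from an
# even larger value, hence the advice to tune it manually.
print(auto_gamma(256, 32))
```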


estan commented Jul 24, 2021

Thanks a lot for the explanation and tips @woctezuma.

@thusinh1969

It has already cost me 10,000 USD, and it still has NOT CONVERGED!!!

@woctezuma

Don't spend so much money! There is no guarantee that it will work; it depends on the data, the parameters, etc.


ekerstein commented Sep 20, 2021

This branch is very helpful, thank you 🙏

@woctezuma How would you tune this for transfer learning? I want to "resume" from an existing network file but use a new dataset. Do I then need to adjust the augmentation strength or anything else? Any recommendations?
