Optimizer Options
This page describes all optimizers, schedulers and their options currently implemented in neosr.
Adam, adam
See pytorch documentation for all options.
[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
AdamW, adamw
See pytorch documentation for all options.
[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
NAdam, nadam
See pytorch documentation for all options.
[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true
Adan, adan
[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true
AdamW Win, adamw_win
[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # "win"
AdamW Schedule-Free, adamw_sf
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
Note
The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.
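For example, a minimal sketch of adamw_sf with warmup enabled through the optimizer itself (the 2000-iteration value is only an illustrative choice, not a recommended default):
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
schedule_free = true
warmup_steps = 2000 # warmup handled inside the optimizer, replaces warmup_iter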
Adan Schedule-Free, adan_sf
[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
Note
The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.
FriendlySAM can be enabled by using the following options:
[train]
sam = "fsam"
sam_init = 1000
Important
When training from scratch and with low batch sizes (less than 8), SAM could cause NaN. In that case, use sam_init to start SAM only after N iterations.
When using AMP (automatic mixed precision), be careful. Due to limitations of pytorch's GradScaler, SAM does not scale gradients to appropriate precision ranges, which could lead to NaN.
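As a sketch of how these options fit together (the values below are only illustrative), the SAM options live under the same [train] table as the optimizer:
[train]
sam = "fsam"
sam_init = 1000 # delay SAM until iteration 1000 to avoid early NaN

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]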
MultiStepLR, multisteplr
[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5
This scheduler drops the learning rate by gamma at each milestone (iters).
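For example, with lr = 5e-4, gamma = 0.5 and milestones = [ 60000, 120000 ], the learning rate becomes 2.5e-4 at iteration 60000 and 1.25e-4 at iteration 120000.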
CosineAnnealing, cosineannealing
[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5
This scheduler decays the learning rate along a cosine curve, reaching eta_min at T_max (iters).
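For example, with lr = 5e-4, T_max = 160000 and eta_min = 4e-5, the learning rate follows half a cosine cycle from 5e-4 down to 4e-5 over the first 160000 iterations (assuming standard pytorch CosineAnnealingLR behavior).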