
Optimizer Options


This page describes all optimizers, schedulers and their options currently implemented in neosr.

Optimizers


Adam, adam

See the PyTorch documentation for all options.

[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0

AdamW, adamw

See the PyTorch documentation for all options.

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0

NAdam, nadam

See the PyTorch documentation for all options.

[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true

Adan, adan

[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true

AdamW_Win, adamw_win

[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # "win"

AdamW_SF, adamw_sf

[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true

Note

The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, set it through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.
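
For illustration, a schedule-free run that warms up inside the optimizer could look like the sketch below. The warmup_steps value of 1000 is only an example, not a recommended default:

[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
warmup_steps = 1000 # example value; warmup is handled by the optimizer, so do not also set warmup_iter
schedule_free = true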


Adan_SF, adan_sf

[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true

Note

The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, set it through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.


Sharpness-Aware Minimization

fsam

FriendlySAM can be enabled with the following options:

[train]
sam = "fsam"
sam_init = 1000

Important

When training from scratch with low batch sizes (less than 8), SAM can cause NaN losses. In that case, use sam_init to start SAM only after N iterations. Be careful when using AMP (automatic mixed precision): due to limitations of PyTorch's GradScaler, SAM does not scale gradients to the appropriate precision range, which can also lead to NaN.
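
As a sketch of how these options fit together, sam and sam_init are set under [train], next to the optimizer block. The values below are illustrative, not recommended defaults:

[train]
sam = "fsam"
sam_init = 1000 # delay SAM until iteration 1000 to avoid NaN early in training

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0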


Schedulers

MultiStepLR, multisteplr

[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5

This scheduler multiplies the learning rate by gamma at each milestone (in iterations). For example, with lr = 5e-4 and gamma = 0.5, the learning rate drops to 2.5e-4 at iteration 60000 and to 1.25e-4 at iteration 120000.


CosineAnnealing, cosineannealing

[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5

This scheduler decays the learning rate along a cosine curve, reaching eta_min at T_max (in iterations).
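
For reference, a scheduler block sits alongside the optimizer block in the same configuration file. The sketch below simply combines snippets from this page and is not a recommended preset:

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0

[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5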