Optimizer Options
This page describes all optimizers, schedulers and their options currently implemented in neosr.
Adam, adam
See pytorch documentation for all options.
[train.optim_g]
type = "adam"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
AdamW, adamw
See pytorch documentation for all options.
[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]
weight_decay = 0
NAdam, nadam
See pytorch documentation for all options.
[train.optim_g]
type = "nadam"
lr = 5e-4
betas = [ 0.98, 0.99 ]
weight_decay = 0.01
decoupled_weight_decay = true
Adan, adan
[train.optim_g]
type = "adan"
lr = 5e-4
betas = [ 0.98, 0.92, 0.99 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0.0
no_prox = true
foreach = true
AdamW Win, adamw_win
[train.optim_g]
type = "adamw_win"
lr = 5e-4
betas = [ 0.98, 0.999 ]
reckless_steps = [ 2.0, 8.0 ]
eps = 1e-8
weight_decay = 0.02
amsgrad = false
max_grad_norm = 0.0
acceleration_mode = "win2" # "win"
AdamW Schedule-Free, adamw_sf
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
eps = 1e-8
weight_decay = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
Note
The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.
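For example, a minimal sketch of adamw_sf with warmup enabled through the optimizer itself (the 2000-iteration value is only an illustrative choice, not a recommended default):
[train.optim_g]
type = "adamw_sf"
lr = 8e-4
betas = [ 0.9, 0.99 ]
schedule_free = true
warmup_steps = 2000 # warmup handled inside the optimizer, replaces warmup_iter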
Adan Schedule-Free, adan_sf
[train.optim_g]
type = "adan_sf"
lr = 8e-4
betas = [ 0.98, 0.92, 0.987 ]
eps = 1e-8
weight_decay = 0.02
max_grad_norm = 0
warmup_steps = 0
r = 0
weight_lr_power = 2.0
schedule_free = true
Note
The parameter schedule_free MUST be in the configuration file for this optimizer to work as intended. Enabling ema is recommended. If you wish to use warmup, do so through the option warmup_steps (iters) instead of warmup_iter, since the latter is implemented outside the optimizer.
FriendlySAM can be enabled by using the following options:
[train]
sam = "fsam"
sam_init = 1000
Important
When training from scratch and with low batch sizes (less than 8), SAM could cause NaN. In that case, use sam_init to start SAM only after N iterations.
When using AMP (automatic mixed precision), be careful. Due to limitations of pytorch's GradScaler, SAM does not scale gradients to appropriate precision ranges, which could lead to NaN.
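As a sketch of how these options fit together (the values below are only illustrative), the SAM options live under the same [train] table as the optimizer:
[train]
sam = "fsam"
sam_init = 1000 # delay SAM until iteration 1000 to avoid early NaN

[train.optim_g]
type = "adamw"
lr = 5e-4
betas = [ 0.9, 0.99 ]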
MultiStepLR, multisteplr
[train.scheduler]
type = "multisteplr"
milestones = [ 60000, 120000 ]
gamma = 0.5
This scheduler drops the learning rate by gamma at each milestone (iters).
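For example, with lr = 5e-4, gamma = 0.5 and milestones = [ 60000, 120000 ], the learning rate becomes 2.5e-4 at iteration 60000 and 1.25e-4 at iteration 120000.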
CosineAnnealing, cosineannealing
[train.scheduler]
type = "cosineannealing"
T_max = 160000
eta_min = 4e-5
This scheduler decays the learning rate along a cosine curve, reaching eta_min at T_max (iters).
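For example, with lr = 5e-4, T_max = 160000 and eta_min = 4e-5, the learning rate follows half a cosine cycle from 5e-4 down to 4e-5 over the first 160000 iterations (assuming standard pytorch CosineAnnealingLR behavior).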