neosr-project edited this page Oct 24, 2024 · 32 revisions

This page describes all losses and their options currently implemented in neosr.


wavelet_guided, wavelet_init

The wavelet_guided loss enables the use of WGSR. As explained in the paper, its purpose is to stabilize GAN training and reduce artifacts. The option wavelet_init specifies the number of iterations to wait before enabling wavelet_guided.

[train]
wavelet_guided = true
wavelet_init = 80000

Note

This loss works better for finetuning than for training from scratch. It is recommended that you train the model for at least ~40k iterations before enabling it.


pixel_opt

The pixel_opt option defines the pixel loss.

[train.pixel_opt]
type = "L1Loss"
loss_weight = 1.0
reduction = "mean"

The above option sets the pixel loss to the L1 criterion with a weight of 1.0. Possible values for type are: L1Loss, MSELoss (also known as L2), HuberLoss, and chc (Clipped Huber with Cosine Similarity loss, which can improve color consistency and decrease noise; reduction is done using the Huber loss).
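To make the chc idea concrete, here is a minimal NumPy sketch combining a clipped Huber term with a cosine-similarity term. The clipping bounds, delta, and weighting below are assumptions for illustration, not the exact neosr implementation:

```python
import numpy as np

def huber(diff, delta=1.0):
    # smooth-L1 style Huber: quadratic near zero, linear in the tails
    d = np.abs(diff)
    return np.where(d < delta, 0.5 * d**2 / delta, d - 0.5 * delta)

def chc_loss(pred, target, clip_min=0.0, clip_max=1.0, cosim_weight=1.0, eps=1e-8):
    """Illustrative clipped-Huber + cosine-similarity pixel loss (sketch)."""
    h = huber(np.clip(pred, clip_min, clip_max)
              - np.clip(target, clip_min, clip_max)).mean()
    # cosine similarity between the flattened images encourages
    # color consistency independently of absolute intensity
    num = (pred * target).sum()
    den = np.linalg.norm(pred) * np.linalg.norm(target) + eps
    return h + cosim_weight * (1.0 - num / den)
```

Identical images give a loss near zero, while any intensity or hue mismatch raises both terms.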


mssim_opt, mssim_loss

The mssim_opt option defines the Multi-Scale SSIM loss. The implementation in neosr has been adapted from "A better pytorch-based implementation for the mean structural similarity. Differentiable simpler SSIM and MS-SSIM.". The options below are the defaults when calling the mssim function by itself:

[train.mssim_opt]
type = "mssim_loss"
loss_weight = 1.0
window_size = 11
sigma = 1.5
in_channels = 3
K1 = 0.01
K2 = 0.03
L = 1

ncc_opt, ncc_loss

This option sets the NCC (Normalized Cross-Correlation) loss.

[train.ncc_opt]
type = "ncc_loss"
loss_weight = 1.0
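Conceptually, NCC normalizes both images to zero mean and unit variance before correlating them, which makes the score invariant to global brightness and contrast shifts. A hedged NumPy sketch of the idea (not the exact neosr code):

```python
import numpy as np

def ncc_loss(pred, target, eps=1e-8):
    """Conceptual NCC loss: 1 - normalized cross-correlation."""
    # standardize both images: zero mean, unit variance
    p = (pred - pred.mean()) / (pred.std() + eps)
    t = (target - target.mean()) / (target.std() + eps)
    ncc = (p * t).mean()   # 1.0 for perfectly correlated images, -1.0 for inverted
    return 1.0 - ncc
```

Identical images yield a loss near 0; an inverted image yields a loss near 2.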

fdl_opt, fdl_loss

This option sets the Frequency Distribution Loss, which is a perceptual loss.

[train.fdl_opt]
type = "fdl_loss"
model = "vgg" # "resnet", "effnet", "inception"
num_proj = 24
phase_weight = 1.0
loss_weight = 1.0
patch_size = 5
stride = 1
s1_w = 1.0
s2_w = 1.0
s3_w = 1.0
s4_w = 1.0
s5_w = 1.0

This loss uses pretrained network features. Possible networks are "vgg" (VGG19), "resnet" (ResNet-101), "effnet" (EfficientNet v1), and "inception" (Inception v3). The default value of num_proj is 24 due to its heavy impact on training performance; the official implementation uses 256. You may increase it at the end of a finetuning process to achieve better perceptual quality. The sx_w parameters are the weights for each stage (layer) when using VGG.


perceptual_opt, vgg_perceptual_loss

This option sets the perceptual loss. It uses the VGG19 network to extract features from images.

[train.perceptual_opt]
type = "vgg_perceptual_loss"
loss_weight = 1.0
criterion = "huber"
patchloss = true
ipk = true
patch_weight = 1.0
vgg_type = "vgg19"
use_input_norm = true
range_norm = false
[train.perceptual_opt.layer_weights]
conv1_2 = 0.1
conv2_2 = 0.1
conv3_4 = 1.0
conv4_4 = 1.0
conv5_4 = 1.0

Possible values for criterion are: l1, l2, huber and chc.

The options patchloss, ipk, and patch_weight configure Patch Loss; by default, these options are disabled. The option patchloss enables the Feature Patch Kernel, as described in the paper, while ipk enables the Image Patch Kernel.


dists_opt, dists_loss

This option enables DISTS (based on VGG16) as a perceptual loss. It can be used in combination with perceptual_opt.

[train.dists_opt]
type = "dists_loss"
loss_weight = 0.5

gan_opt, gan_loss

This option enables GAN training.

[train.gan_opt]
type = "gan_loss"
gan_type = "bce"
loss_weight = 0.3
real_label_val = 1.0
fake_label_val = 0.0

Possible values for gan_type are: bce, mse or huber.
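For illustration, the three gan_type values correspond to different criteria applied to the discriminator logits against the real/fake label values. A NumPy sketch (details such as the Huber delta are assumptions, not the neosr implementation):

```python
import numpy as np

def gan_loss(logits, is_real, gan_type="bce"):
    """Conceptual GAN criterion with real_label_val=1.0 / fake_label_val=0.0."""
    target = np.ones_like(logits) if is_real else np.zeros_like(logits)
    if gan_type == "bce":
        # numerically stable binary cross-entropy with logits
        return np.mean(np.maximum(logits, 0) - logits * target
                       + np.log1p(np.exp(-np.abs(logits))))
    if gan_type == "mse":
        # least-squares GAN criterion
        return np.mean((logits - target) ** 2)
    if gan_type == "huber":
        # smooth-L1 against the labels (delta = 1 assumed)
        d = np.abs(logits - target)
        return np.mean(np.where(d < 1, 0.5 * d**2, d - 0.5))
    raise ValueError(f"unknown gan_type: {gan_type}")
```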


ldl_opt, ldl_loss

This option sets the LDL loss. See the research paper for details.

[train.ldl_opt]
type = "ldl_loss"
loss_weight = 1.0
criterion = "huber"
ksize = 7

Possible values for criterion are: l1, l2, and huber.


ff_opt, ff_loss

This option sets the Focal-Frequency Loss. See the research paper for details.

[train.ff_opt]
type = "ff_loss"
loss_weight = 1.0
alpha = 1.0
patch_factor = 1
ave_spectrum = true
log_matrix = false
batch_matrix = false

Note

The Focal-Frequency loss can cause instabilities if enabled without using a pretrained model.


gw_opt, gw_loss

This option enables the Gradient-Weighted loss from the CDC research. In practice, this loss makes the network focus more on high frequencies.

[train.gw_opt]
type = "gw_loss"
loss_weight = 1.0
criterion = "chc_loss"
corner = true

Possible values for criterion are: l1, l2, huber, and chc.
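A rough sketch of the gradient-weighting idea, using simple finite differences and an assumed 1 + |∇| weighting (the CDC implementation differs in detail):

```python
import numpy as np

def gw_loss(pred, target):
    """Illustrative gradient-weighted pixel loss (assumption-based sketch).

    Pixel differences are re-weighted by the local gradient magnitude of
    the ground truth, so edges and high-frequency regions dominate.
    """
    # horizontal and vertical finite differences of the target
    gx = np.abs(np.diff(target, axis=-1, prepend=target[..., :1]))
    gy = np.abs(np.diff(target, axis=-2, prepend=target[..., :1, :]))
    weight = 1.0 + gx + gy   # flat regions keep weight 1
    return float((weight * np.abs(pred - target)).mean())
```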


kl_opt, kl_loss

This option enables the Kullback-Leibler divergence loss.

[train.kl_opt]
type = "kl_loss"
loss_weight = 1.0

Note

KL-loss should only be enabled if using a pretrained model. Enabling it from scratch may cause incorrect results or NaN.
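Conceptually, KL divergence compares the generated image's intensity distribution against the ground truth's. A simplified NumPy sketch (normalizing raw pixels into probability distributions is an assumption here; neosr's implementation may operate differently):

```python
import numpy as np

def kl_loss(pred, target, eps=1e-8):
    """Illustrative KL divergence between image intensity distributions.

    Assumes non-negative inputs; each image is normalized to sum to 1.
    """
    p = target / (target.sum() + eps)   # reference distribution
    q = pred / (pred.sum() + eps)       # generated distribution
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

The result is zero when the distributions match and grows as they diverge, which is why a reasonable starting point (a pretrained model) matters.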


match_lq_colors

This option specifies matching color and luma from your LQ images instead of the GT images. It can increase stability if your dataset has too much variation in color/luma. Only applicable if consistency_loss is enabled.

[train]
match_lq_colors = true

consistency_opt, consistency_loss

This option sets the color and luma consistency loss. It matches the brightness and colors of your generated images to GT or LQ (see the match_lq_colors option). The loss uses Oklab and CIE L* color space transforms, as well as Cosine Similarity.

[train.consistency_opt]
type = "consistency_loss"
loss_weight = 1.0
criterion = "chc" # "l1"
blur = true
cosim = true
saturation = 1.0
brightness = 1.0
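As a simplified illustration of the brightness-matching idea, the sketch below compares low-frequency luma between two images. It uses Rec.709 luma weights and a box blur as assumptions; neosr's actual loss works in Oklab/CIE L* and adds cosine similarity:

```python
import numpy as np

def luma_consistency(pred, target, k=4):
    """Illustrative luma-consistency term: RGB -> luma, box-blur, L1."""
    w = np.array([0.2126, 0.7152, 0.0722])   # Rec.709 luma weights (assumed)

    def luma_blur(img):
        y = np.tensordot(w, img, axes=([0], [0]))   # (H, W) luma plane
        h, wd = y.shape
        # k-by-k box blur via block averaging (crops to a multiple of k)
        return y[:h - h % k, :wd - wd % k].reshape(h // k, k, wd // k, k).mean((1, 3))

    return float(np.abs(luma_blur(pred) - luma_blur(target)).mean())
```

Blurring first means the loss penalizes global brightness/color drift rather than fine texture differences.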

msswd_opt, msswd_loss

This option sets Multiscale Sliced Wasserstein Distance loss. It is a color consistency loss.

[train.msswd_opt]
type = "msswd_loss"
num_scale = 3
num_proj = 24
loss_weight = 1.0
patch_size = 11
stride = 1
c = 3

The parameters num_proj and num_scale default to 24 and 3, respectively, due to their heavy impact on training performance; the official implementation uses 128 and 5. You may increase them at the end of a finetuning process to achieve better perceptual quality.
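To illustrate the role of num_proj: each random projection turns the per-pixel color samples into a 1-D distribution whose sorted values can be compared directly; more projections give a better distance estimate at higher cost. A single-scale NumPy sketch (assumption-based, not the neosr implementation):

```python
import numpy as np

def sliced_wasserstein(a, b, num_proj=24, rng=None):
    """Sketch of a single-scale sliced Wasserstein distance."""
    rng = np.random.default_rng(0) if rng is None else rng
    a = a.reshape(a.shape[0], -1)   # (C, N): pixels as C-dimensional samples
    b = b.reshape(b.shape[0], -1)
    # random unit directions in color space
    dirs = rng.normal(size=(num_proj, a.shape[0]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # 1-D Wasserstein per projection = L1 between sorted projections
    pa, pb = np.sort(dirs @ a, axis=1), np.sort(dirs @ b, axis=1)
    return float(np.abs(pa - pb).mean())
```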