This is a simple repository meant to document current software and websites for using and training models
- Enhance Everything!: A community focused on training models for various purposes, including game upscaling, anime, photos, and more! Look in the #training channel's pinned messages for a bunch more info
- neosr: This is the official neosr server. Ask here about any problems you run into and for general discussion of neosr.
- chaiNNer: This tool can be used to degrade datasets, but it also supports many other functions, including running the models you've trained
- VideoJaNai: Allows for fast video upscaling utilizing TensorRT (On Nvidia cards), DirectML, or NCNN within a clean GUI. Only supports ONNX. Use chaiNNer to convert pth models to ONNX for usage.
- OpenModelDB: This website contains a bunch of community trained models. You can use these models as they are, or use them as pretrains for your own model
- Model Releases Channels of the discord communities mentioned above
- the database's traiNNer-redux: This training software is intended to be as stable as possible & is consistently updated. It has easy to use configuration files & is easy to set up. It also has most of the latest developments
- sudo's traiNNer: This custom training software supports a large number of losses and architectures, and is very versatile. However, it is much more complex to use
Other training software (not recommended)
- neosr: NeoSR is intended to be cutting edge and provide the latest features. However, multiple users have reported slowdowns and instability in recent months. In addition, the project's contentious leadership and selective responsiveness to user reports render it a poor choice overall.
  - Due to changes made by the author, many models produced by neosr aren't fully compatible with chaiNNer or other software. Results differ when inference is run within neosr versus any other program. See this issue. Additionally, you can view incompatible architectures here
  - @Phhofm still recommends neosr for its feature set and cutting edge implementations
- traiNNer-redux: traiNNer-redux is a now unmaintained fork of BasicSR. It added losses such as color and contextual loss
- traiNNer-redux-FJ: A fork of traiNNer-redux by @FloatingJoy#0260 that has additional arch support
- BasicSR: The official training software for many architectures such as ESRGAN and SwinIR
- Real-ESRGAN: Very similar to BasicSR with a focus on the Real-ESRGAN architecture, including compact models
- KAIR: Multifunctional training software that supports many arches
- neosr-extended: NeoSR-extended is a custom fork of NeoSR that adds additional functions/features. It is no longer maintained.
- wtp_dataset_destroyer: A spiritual successor to Kim's Dataset Destroyer, essentially replacing it. This has more features and is easier to work with, and is updated regularly.
- Kim's Helpful Scripts: This is a collection of scripts I've made to assist with using training software. This includes:
  - A script to efficiently tile your datasets to speed up training
  - A script to filter out useless/empty images
  - And more!
- czkawka: A tool to find and filter out similar images based on a similarity index, or to find duplicated or corrupt files.
- ImgAlign: This is a great tool to automatically align your datasets. It supports AI-based image alignment, which is a great feature for difficult image pairs
- img-ab: An image comparison tool that is lightweight and efficient with a lot of options. This can be helpful when determining progress on your model
- Simple Image Compare: A basic tool for comparing images
- Image Pearer: This tool will create image pairs from a given source. It looks at two folders of images (one HR, one LR), and automatically matches them up
- IQA-PyTorch: A quality metrics tool, which can be used to compare the quality of outputs of different models or checkpoints
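
Paired datasets only work if every HR image has a matching LR image. As a minimal sketch of that bookkeeping, the hypothetical helper `pair_by_stem` below matches files purely by filename stem. This is an assumption made for illustration and is not how Image Pearer or any other tool listed above works internally. The folder and file names are placeholders.

```python
from pathlib import Path


def pair_by_stem(hr_files, lr_files):
    """Match HR and LR paths that share the same filename stem.

    Returns (pairs, unmatched_hr). The extension is ignored, so
    "0003.png" and "0003.jpg" count as a pair.
    """
    lr_by_stem = {Path(f).stem: f for f in lr_files}
    pairs, unmatched = [], []
    for hr in sorted(hr_files):
        lr = lr_by_stem.get(Path(hr).stem)
        if lr is None:
            unmatched.append(hr)  # no LR counterpart found
        else:
            pairs.append((hr, lr))
    return pairs, unmatched


# Placeholder file lists; in practice these would come from two folders.
hr = ["hr/0001.png", "hr/0002.png", "hr/0003.png"]
lr = ["lr/0001.png", "lr/0003.jpg"]
pairs, unmatched = pair_by_stem(hr, lr)
print(pairs)
print(unmatched)
```

Listing the unmatched HR files before training makes it easier to spot gaps in a dataset early.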
- Kim2091's traiNNer-redux Guide (Thanks @Sirosky for the baseline!)
- Sirosky's Upscale-Hub
- Sirosky's NeoSR Guide
- Youtube Vid: Preparing, training and releasing a model with neosr
- Youtube Vid: Training an image upscaling model
- Youtube Vid: Training a model with Real-ESRGAN Compact
- Youtube Vid: Dataset Preparation for Training Upscaling Models
- How to Train/Finetune Real-ESRGAN (This one is a bit dated, but still contains useful info, and a multiscale script)
Descriptions of each loss available in most training software
Here is a brief summary of some of the loss functions that are used for super resolution and image restoration tasks. Thanks to korvin for the info!
- L1Loss: This is the mean absolute error (MAE) between the predicted and target images. It measures the average pixel-wise difference, and is simple to implement and fast to compute. However, it may produce blurry results and does not account for perceptual quality or high-frequency details. It can be used for any type of image. For example, it is suitable for low-level tasks such as denoising or inpainting, but also super resolution. It can be combined with other losses such as perceptual loss or GAN loss to improve the results.
- LRGBLoss: This is a variant of L1Loss that computes the MAE separately for each color channel (red, green, blue) and then averages them. It is similar to L1Loss in terms of advantages and disadvantages, but it may be more sensitive to color differences. It can be used for any type of images, but it may not be optimal for grayscale images or images with different color spaces.
- PerceptualLoss: This is a loss function that uses a pre-trained network, such as VGG, to extract high-level features from the predicted and target images and then computes the MAE (or other measures) between them. It aims to capture the perceptual similarity and semantic content of the images, rather than the pixel-wise difference. It can produce more natural and realistic results, especially for high-level tasks such as super resolution or style transfer. However, it is computationally expensive, requires regularization and hyper-parameter tuning, and involves a large network trained on an unrelated task. It can be used for any type of images, but it may not be optimal for low-level tasks or images with different domains.
- ContextualLoss: This is a loss function that measures the similarity between two images based on the distribution of local patches. It uses a cosine similarity metric to compare the patches and then aggregates them using a generalized mean function. It can capture both global and local structures, as well as texture and style information. It can produce more diverse and detailed results, especially for texture synthesis or style transfer. However, it is computationally expensive, requires patch size selection and normalization, and may not be robust to geometric transformations or occlusions. It can be used for any type of images, but it may not be optimal for images with large variations or complex semantics.
- ColorLoss: There are many types of color loss. An explicit example would be this: a loss function that measures the color difference between two images using the CIEDE2000 formula, which is based on the human perception of color and accounts for factors such as luminance, hue, chroma, and contrast. It can produce more accurate and consistent color reproduction, especially for color enhancement or correction. However, it is computationally expensive, requires color space conversion and calibration, and may not capture other aspects of image quality such as sharpness or noise. It can be used for any type of images, but it may not be optimal for grayscale images or images with different color spaces.
- AverageLoss: This is a loss function that computes the average of multiple loss functions, such as L1Loss, PerceptualLoss, ColorLoss, etc. It can combine the advantages of different losses and balance their trade-offs. It can produce more comprehensive and satisfactory results, especially for multi-objective tasks such as super resolution with color enhancement. It is very lightweight, but with some implementations can require fine-tuning. It can be used for any type of images, but it may not be optimal for single-objective tasks or tasks with conflicting objectives.
- GANLoss: This is a loss function that uses a discriminator network, as in a generative adversarial network (GAN), to tell predicted images apart from target images. The model is trained to fool the discriminator into thinking that the predicted image is real and indistinguishable from the target image. It can produce sharper and more realistic results, especially for high-level tasks such as super resolution or style transfer. However, it is computationally expensive, requires careful design and training of the discriminator network, and may suffer from instability or mode collapse issues. It can be used for any type of images, but it may not be optimal for low-level tasks or tasks with limited data.
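
To make the pixel-wise losses above more concrete, here is a minimal sketch in plain Python, with no training framework involved. The names `l1_loss`, `per_channel_l1` and `weighted_sum` are made up for this example and are not taken from any of the training software listed here. Real implementations work on tensors and may weight or normalize things differently. The perceptual value and both weights are placeholders, not recommendations.

```python
def l1_loss(pred, target):
    """Mean absolute error over two equally sized flat lists of pixel values."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def per_channel_l1(pred_rgb, target_rgb):
    """MAE computed separately per channel, then averaged across channels.

    Each image is a dict mapping a channel name to a flat list of values.
    """
    channel_losses = {c: l1_loss(pred_rgb[c], target_rgb[c]) for c in pred_rgb}
    mean = sum(channel_losses.values()) / len(channel_losses)
    return mean, channel_losses


def weighted_sum(losses, weights):
    """Combine several loss values into a single training objective."""
    return sum(weights[name] * value for name, value in losses.items())


# Two-pixel toy image: only the blue channel differs from the target.
pred = {"r": [0.5, 0.5], "g": [0.2, 0.4], "b": [1.0, 0.0]}
target = {"r": [0.5, 0.5], "g": [0.2, 0.4], "b": [0.0, 0.0]}

mean_loss, by_channel = per_channel_l1(pred, target)
print(by_channel)  # per-channel breakdown shows where the error comes from
print(mean_loss)

# Placeholder perceptual value and weights, purely for illustration.
total = weighted_sum(
    {"l1": mean_loss, "perceptual": 0.25},
    {"l1": 1.0, "perceptual": 0.5},
)
print(total)
```

The per-channel breakdown is the useful part: the averaged value alone does not show that all of the error sits in one channel.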
Loss information: target values to aim for with your losses:
- Most Losses: Aim for a value of 0. Lower is better.
- GAN: Ideal value varies with implementation.
- SSIM: Aim for a value of 1. Higher is better.
Metrics:
- PSNR: No specific target value. Higher is better.
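
PSNR has no fixed target because it is a logarithmic transform of the mean squared error: as the error shrinks towards zero, PSNR keeps growing. A minimal sketch using the standard definition, assuming pixel values in the range 0 to `max_value` (the function name `psnr` and the flat-list input are choices made for this example, not the API of any tool listed here):

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical images: no upper bound
    return 10 * math.log10(max_value ** 2 / mse)


# Every pixel off by 0.1 on a 0..1 scale gives roughly 20 dB.
print(psnr([0.0, 0.0], [0.1, 0.1]))

# Every pixel off by 0.01 gives roughly 40 dB: smaller error, higher PSNR.
print(psnr([0.0, 0.0], [0.01, 0.01]))
```

Note that `max_value` has to match the data range (for example 255 for 8-bit images), otherwise values from different tools are not comparable.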
Example:
- A loss value of 4.1821e-04 (0.00041821 in decimal) is better than 4.1821e-01 (0.41821) for the main losses. A value closer to 0 is ideal in this scenario.
- A loss value of 2.5325e+03 (2532.5 in decimal) is considered bad, as it's very high. You should tweak your config accordingly.
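
If the scientific notation in training logs is hard to read, the values from the example above can be converted and compared in a few lines of Python. This is just a reading aid and is not tied to any particular training software:

```python
# Loss values copied from the example above, as they appear in a log.
small = float("4.1821e-04")
medium = float("4.1821e-01")
large = float("2.5325e+03")

print(f"{small:.8f}")   # 0.00041821
print(f"{medium:.5f}")  # 0.41821
print(f"{large:.1f}")   # 2532.5

# For the main losses, the value closest to 0 is the best one.
ranked = sorted([large, small, medium])
print(ranked[0] == small)  # True
```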
- PapersWithCode: A place to find the latest research papers released on single image super resolution
- PapersWithCode Datasets: A collection of datasets that can be used for single image super resolution
- Awesome-Super-Resolution: A GitHub repo where the latest SISR networks are posted, which also includes SISR datasets
- Awesome-diffusion-model-for-image-processing: A repo dedicated to diffusion-based networks, including a super resolution category