Hyperparameters

Training Data and Batches

CN24 can process both parallel and sequential minibatches. Training a minibatch of a given size in parallel requires a corresponding minimum amount of memory. If your training calls for larger minibatches, you can use sequential minibatches to accumulate the gradients of several parallel minibatches and update the weights only after all of them have been processed. Mathematically, parallel and sequential minibatches produce exactly the same results.

This mostly affects GPU training. For a given GPU and network architecture, find the largest parallel batch size that does not exhaust the VRAM, then adjust the sequential batch size to reach the effective batch size your training needs.

The effective batch size is the product of the sequential and the parallel batch size. Use the following configuration entries to specify the batch sizes:

sbatchsize=10
pbatchsize=2

This results in an effective minibatch size of 10 × 2 = 20 samples.
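
To illustrate how sequential minibatches accumulate gradients, here is a minimal Python sketch. It is not CN24's actual implementation; the grad_fn, data and weights objects are stand-ins, and it assumes grad_fn returns the mean gradient over the parallel minibatch it is given.

import numpy as np

def train_step(weights, data, grad_fn, lr, sbatchsize=10, pbatchsize=2):
    # Accumulate the gradients of sbatchsize parallel minibatches,
    # then perform a single weight update for the whole effective batch.
    accumulated = np.zeros_like(weights)
    for i in range(sbatchsize):
        batch = data[i * pbatchsize:(i + 1) * pbatchsize]  # one parallel minibatch
        accumulated += grad_fn(weights, batch)             # sum of per-batch mean gradients
    # One update for the effective minibatch of sbatchsize * pbatchsize samples.
    return weights - lr * accumulated / sbatchsize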

In CN24, an epoch is a fixed number of minibatches. Set this value using the iterations entry:

iterations=100

Learning Rate

CN24 varies the learning rate over the course of training according to the following formula:

lr(n) = lr0 * (1 + g * n) ^ -q

The variables and hyperparameters in this formula are:

  • n, the index of the current iteration.
  • lr0, the initial learning rate. Use lr= in the network configuration.
  • g, a coefficient that determines how quickly the learning rate decays. Use gamma= in the network configuration.
  • q, an exponent that determines the shape of the decay curve. Use exponent= in the network configuration.
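
As an illustration of the schedule, the following short Python sketch evaluates lr(n) for a few iterations. The parameter values are made-up examples, not CN24 defaults.

def learning_rate(n, lr0=0.001, gamma=0.0001, exponent=0.75):
    # lr(n) = lr0 * (1 + gamma * n) ** (-exponent)
    return lr0 * (1.0 + gamma * n) ** (-exponent)

# learning_rate(0) equals lr0; the rate decays smoothly as n grows.
for n in (0, 1000, 10000, 100000):
    print(n, learning_rate(n))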

CN24 also uses momentum learning: a fraction of the previous weight update is added to each new update to speed up convergence. Use the momentum config entry to change this behaviour:

momentum=0.9
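
A minimal sketch of such a momentum update, assuming the common formulation in which a fraction (here 0.9) of the previous step is carried over into the current one; this is illustrative and not taken from the CN24 sources.

def momentum_update(weights, gradient, velocity, lr=0.001, momentum=0.9):
    # New step = momentum * previous step - learning-rate-scaled gradient.
    velocity = momentum * velocity - lr * gradient
    return weights + velocity, velocity

# Example with scalar values; in practice these would be weight arrays.
w, v = 1.0, 0.0
w, v = momentum_update(w, gradient=0.5, velocity=v)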

Regularization

CN24 supports both L1 and L2 regularization. The following example shows a configuration that enables both; the values are the respective regularization strengths:

l1=0.001
l2=0.0005
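
To show what the two coefficients mean, here is a hedged Python sketch of how L1 and L2 penalties are typically added to a loss; l1 and l2 correspond to the entries above, and the exact form used inside CN24 may differ.

import numpy as np

def regularized_loss(loss, weights, l1=0.001, l2=0.0005):
    # L1 penalty: l1 * sum(|w|), encourages sparse weights.
    # L2 penalty: l2 * sum(w^2), encourages small weights (weight decay).
    return loss + l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)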