This page introduces the layer-related configuration of cxxnet.
All layer configurations go inside a netconfig block:
netconfig = start
layer[from->to] = layer_type:name
netconfig = end
- from is the name of the source node; 0 means the input data
- to is the name of the target node.
- layer_type is one of the layer types described below
- name is optional, but if you need to fine-tune the network on another task, a name is required, since it indicates which layer's weights are copied (see the example below).
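For example, the following lines (placed inside the netconfig block) declare a fully connected layer named fc1 that reads from the input node; the node numbers and the nhidden value are illustrative:
layer[0->1] = fullc:fc1
nhidden = 100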
Fully connected layers and convolution layers require random weight initialization. We provide two initialization methods: gaussian and xavier. Gaussian initialization is configured as:
random_type = gaussian
init_sigma = 0.01
We also provide the Xavier initialization method [1], enabled by the configuration:
random_type = xavier
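For reference, the normalized initialization proposed in [1] draws each weight uniformly from the range below, where n_in and n_out are the fan-in and fan-out of the layer; cxxnet's exact variant may differ slightly:
W \sim U\left[ -\frac{\sqrt{6}}{\sqrt{n_{\mathrm{in}} + n_{\mathrm{out}}}},\ \frac{\sqrt{6}}{\sqrt{n_{\mathrm{in}} + n_{\mathrm{out}}}} \right]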
Global settings can be overridden in a layer's local configuration, e.g.
# global setting
random_type = gaussian
netconfig = start
wmat:lr = 0.01
wmat:wd = 0.0005
bias:wd = 0.000
bias:lr = 0.02
layer[0->1] = fullc:fc1
# local setting start
nhidden = 50
random_type = xavier
# local setting end
layer[1->2] = relu
layer[2->3] = fullc
# local setting start
nhidden = 6
init_sigma = 0.005
wmat:lr = 0.1
# local setting end
netconfig = end
With this configuration, the fc1 layer is initialized with the Xavier method, while the unnamed fully connected layer is initialized with Gaussian random numbers (mu = 0, sigma = 0.005). The unnamed fully connected layer also uses a weight learning rate (wmat:lr = 0.1) that overrides the global setting.
The layers fall into the following categories:
- Connection Layer
- Activation Layer
- Loss Layer
- Computation Layers
- Pooling Layers
- Other Layers
= Connection Layer
- Flatten Layer is used to flatten the output of a convolution layer. After flattening, the convolution output can be used in the feed-forward (fully connected) part of the network. Namely, the shape of the output node becomes (batch, 1, 1, num_feature) instead of (batch, channel, width, height). Here is an example:
layer[15->16] = flatten
- Split Layer is used for one-to-many connections. It duplicates the input node in the forward pass and accumulates the gradients from the output nodes in the backward pass.
layer[15->16,17] = split
- Concat Layer is used to concatenate the last dimension (namely, num_feature) of the outputs of two nodes. It is usually used along with fully connected layers.
layer[18,19->20] = concat
- Channel Concat Layer is used to concatenate the second dimension (namely, channel) of the outputs of two nodes. It is usually used along with convolution layers. The resulting shapes are sketched below.
layer[18,19->20] = ch_concat
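Assuming the (batch, channel, width, height) shape convention used above, the output shapes combine roughly as follows (a sketch, not exact output of the library):
# concat:    (batch, 1, 1, n1) + (batch, 1, 1, n2)  ->  (batch, 1, 1, n1 + n2)
# ch_concat: (batch, c1, w, h) + (batch, c2, w, h)  ->  (batch, c1 + c2, w, h)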
= Activation Layer
We provide the common activation layers: Rectified Linear (RELU), Sigmoid, Tanh and Parametric RELU (pRELU).
= RELU
- The output of Rectified Linear is max(0, x). It is the most commonly used activation function in modern deep learning.
layer[15->16] = relu
= Tanh
- Tanh uses tanh as the activation function. It transforms the input into the range [-1, 1].
layer[15->16] = tanh
= Sigmoid
- Sigmoid uses the sigmoid function as the activation. It transforms the input into the range [0, 1].
layer[15->16] = sigmoid
= pRELU
- pRELU is basically an implementation of [2]. In addition, we provide a parameter to add noise to the negative slope to reduce overfitting.
layer[15->16] = prelu
random=0.5
- random [optional] denotes the standard deviation of the Gaussian noise added to the negative slope of pRELU during training. At test time, the noise is discarded. The underlying activation is given below.
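For reference, the activation defined in [2] is the following, where a_i is a learned negative slope; the noise described above perturbs a_i only during training:
f(y_i) = \begin{cases} y_i, & y_i > 0 \\ a_i \, y_i, & y_i \le 0 \end{cases}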
= Loss Layer
Loss layers are self-looped layers: the from and to fields refer to the same node. They define the loss function used for training; a minimal declaration is sketched after the parameter list below.
- Common Parameters:
- grad_scale [optional]: scales the gradient generated by the loss layer
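A minimal sketch of a loss layer declaration, assuming the softmax loss is registered under the type name softmax (as in the cxxnet example configurations); the node number and grad_scale value are illustrative:
layer[30->30] = softmax
grad_scale = 1.0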
= Softmax Loss Layer
- Softmax Loss Layer is the implementation of the multi-class softmax loss function.
= Euclidean Loss Layer
- Euclidean Loss Layer is the implementation of the elementwise L2 loss function.
= Elementwise Logistic Loss Layer
- Elementwise Logistic Loss Layer is the implementation of the elementwise logistic loss function. It is suitable for multi-label classification problems.
= Computation Layers
- Fully Connected Layer (fullc) is the basic element in a feed-forward neural network.
layer[18->19] = fullc
nhidden = 1024
- nhidden denotes the number of hidden units in the layer.
= Convolution Layer
If cxxnet is built with CuDNN, convolution defaults to the CuDNN R2 implementation; if CuDNN R2 is not available, convolution runs on our own kernel. The configuration looks like:
layer[0->1] = conv
kernel_size = 11
stride = 4
nchannel = 96
pad = 1
- kernel_size is the convolution kernel size
- stride is the stride of the convolution operation
- nchannel is the number of output channels
- pad is the amount of zero padding added to the input borders (see the output-size relation below)
- temp_col_max [optional] is the maximum size of the temporary column buffer used to expand the input in the convolution operation. The default value is 64, which means the maximum size of temp_col is 64 MB. Adjusting this variable may speed up training, especially when the input size of the convolution network is small. Note that it only takes effect when CuDNN is not used.
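Assuming the usual convolution arithmetic with symmetric padding (cxxnet may round slightly differently), the output spatial size is:
\text{out} = \left\lfloor \frac{\text{in} + 2\,\text{pad} - \text{kernel\_size}}{\text{stride}} \right\rfloor + 1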
= Pooling Layers
Currently we provide three pooling methods: Sum Pooling, Max Pooling and Average Pooling. All pooling layers share the same parameters: stride and kernel_size.
- Sum Pooling sums up the values in the pooling region, e.g.
layer[4->5] = sum_pooling
kernel_size = 3
stride = 2
- Max Pooling takes the maximum value in the pooling region, e.g.
layer[4->5] = max_pooling
kernel_size = 3
stride = 2
- Average Pooling averages the values in the pooling region, e.g.
layer[4->5] = avg_pooling
kernel_size = 3
stride = 2
= Other Layers
- Note that the Dropout Layer is a self-looped layer: you need to set to equal to from, e.g.
layer[3->3] = dropout:dp
threshold = 0.5
- threshold is the probability to drop an output.
= LRN Layer
The LRN layer normalizes the responses of nearby kernels. Details can be found in Alex Krizhevsky's paper [3].
layer[3->4] = lrn
local_size = 5
alpha = 0.001
beta = 0.75
knorm = 1
- local_size denotes the number of nearby kernels to be evaluated
- alpha, beta and knorm are the normalization parameters (see the formula below)
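For reference, the normalization defined in [3] is the following, where a^i_{x,y} and b^i_{x,y} are the input and output of kernel i at position (x, y), N is the total number of kernels, n corresponds to local_size and k to knorm:
b^{i}_{x,y} = a^{i}_{x,y} \Big/ \left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a^{j}_{x,y} \right)^{2} \right)^{\beta}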
= Batch Normalization Layer
The BN layer is an implementation of [4]. The difference is that at test time we use the mini-batch statistics rather than the global statistics of the training data as in the original paper. It is an experimental layer and may not be stable. To use the layer, set:
layer[3->4] = batch_norm
There are no configuration parameters for this layer; the underlying transform is given below for reference.
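For reference, the transform defined in [4] normalizes each activation using mini-batch statistics and then applies a learned scale gamma and shift beta:
\hat{x} = \frac{x - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \qquad y = \gamma \hat{x} + \beta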
= Reference
[1] Xavier Glorot and Yoshua Bengio. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." AISTATS, 2010.
[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." arXiv preprint arXiv:1502.01852, 2015.
[3] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS, 2012.
[4] Sergey Ioffe and Christian Szegedy. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift." arXiv preprint arXiv:1502.03167, 2015.