Based on @terrychenism's caffe-windows-cudnn with the following major changes.
For Linux, have a look at @Senecaur's version here.
Note: This implementation is for my project in Kaggle's National Data Science Bowl, so some choices in the code may be specific to that problem rather than general, e.g., the stochastic prediction described below.
I have put one of my models for Kaggle's National Data Science Bowl in /examples/kaggle-bowl.
This is modified from Princeton's GoogLeNet patch.
To use this layer, first build /bin/convert_imageset_compact.exe and use it to convert your images to the compact leveldb format (its usage is the same as that of /bin/convert_imageset.exe).
Since the images can be of varying sizes, computing the mean image for this layer can be a problem. I use the following method, and it works OK:
- Decide the final image size (crop_size, as described below) that you want to input to the net, say 32x32.
- Use /bin/convert_imageset.exe to pack the images into the normal leveldb format with the resizing option on, e.g.,
    ./bin/convert_imageset.exe \
      --resize_height=32 \
      --resize_width=32 \
      --gray \
      path-to-image-folder \
      path-to-image-list \
      path-to-leveldb-32x32
- Use /bin/compute_image_mean.exe to compute the mean image, e.g.,
    ./bin/compute_image_mean.exe path-to-leveldb-32x32 path-to-image-mean-32x32
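The resizing step matters because a mean image is a pixel-wise average, which is only defined when all images share the same size. As a conceptual illustration only (not the actual compute_image_mean.exe code), here is a minimal Python sketch; it assumes the images are already decoded into equally sized 2-D lists of grayscale values, and the function name mean_image is hypothetical:

```python
def mean_image(images):
    """Pixel-wise mean of equally sized grayscale images (hypothetical helper).

    Each image is a list of rows, each row a list of pixel values.
    """
    if not images:
        raise ValueError("need at least one image")
    h, w = len(images[0]), len(images[0][0])
    acc = [[0.0] * w for _ in range(h)]
    for img in images:
        # The average is only defined if every image has the same size,
        # which is why the images are resized (e.g. to 32x32) beforehand.
        if len(img) != h or any(len(row) != w for row in img):
            raise ValueError("all images must share the same size")
        for y in range(h):
            for x in range(w):
                acc[y][x] += img[y][x]
    n = len(images)
    return [[v / n for v in row] for row in acc]
```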
In this code, the iscolor flag is turned off in the calls to cvDecodeImage (in this line and this line), so this layer converts every image to grayscale. If you want color images, set iscolor to 1.
Realtime data augmentation is implemented within the COMPACT_DATA layer. It offers:
- Geometric transforms: random flipping, cropping, resizing, rotation, shearing, and perspective warping
- Smooth filtering
- JPEG compression
- Contrast & brightness adjustment
- New transformations, which can be added via the OpenCV utils
To use it, specify the augmentation parameters in transform_param, e.g.:
## Training set
layers {
name: "Image"
type: COMPACT_DATA
top: "data"
top: "label"
data_param {
source: "path-to-training-compact-leveldb"
batch_size: 100
}
transform_param {
mean_file: "path-to-image-mean"
mirror: true
crop_size: 32
multiscale: true
debug_display: false
smooth_filtering: false
jpeg_compression: false
contrast_adjustment: false
min_scaling_factor: 0.8
max_scaling_factor: 1.2
angle_interval: 45
max_shearing_ratio: 0.1
max_perspective_ratio: 0.1
warp_fillval: 255
}
include: { phase: TRAIN }
}
## Validation set
layers {
name: "Image"
type: COMPACT_DATA
top: "data"
top: "label"
data_param {
source: "path-to-validation-compact-leveldb"
batch_size: 100
}
transform_param {
mean_file: "path-to-image-mean"
mirror: true
crop_size: 32
multiscale: true
debug_display: false
smooth_filtering: false
jpeg_compression: false
contrast_adjustment: false
min_scaling_factor: 0.8
max_scaling_factor: 1.2
angle_interval: 45
max_shearing_ratio: 0.1
max_perspective_ratio: 0.1
warp_fillval: 255
}
include: { phase: TEST }
}
There is an example using realtime data augmentation for Kaggle's National Data Science Bowl in /examples/kaggle-bowl.
The transform_param section accepts the following parameters:
- mirror: horizontal flipping, vertical flipping, or both simultaneously
- crop_size: the final size of the image input to the net (after the geometric transformations, the image is "resized" to this size). This param has a somewhat different meaning than in Caffe, but in both cases it refers to the final size of the net input. In this code, cropping is carried out to simulate resizing; see the explanation of min_scaling_factor and max_scaling_factor below.
- multiscale: enables realtime data augmentation (param kept from Princeton's GoogLeNet patch)
- debug_display: displays the distorted image and some info for debugging purposes
- smooth_filtering: applies smooth filtering with varying filters (the choice of filters is currently hard coded, but you can expose it)
- jpeg_compression: applies JPEG compression with varying QFs (quality factors; the choice of QFs is currently hard coded, but you can expose it)
- contrast_adjustment: applies contrast & brightness adjustment (the choice of the params, i.e., alpha and beta, is currently hard coded, but you can expose it)
- min_scaling_factor and max_scaling_factor: perform random resizing with a scaling factor uniformly sampled from [min_scaling_factor, max_scaling_factor]. Since the image is resized to a fixed size (i.e., crop_size) in the end, scaling it directly would make no difference. Therefore, cropping and padding are used to simulate the scaling effect:
  - For a scaling factor > 1, the original image is randomly cropped to simulate scaling up
  - For a scaling factor < 1, the original image is randomly padded to simulate scaling down
- angle_interval: performs random rotation with the angle uniformly sampled from [0, 360] with step angle_interval
- max_shearing_ratio: performs random shearing with the ratio uniformly sampled from [-max_shearing_ratio, max_shearing_ratio]
- max_perspective_ratio: performs random perspective warping with the ratio uniformly sampled from [-max_perspective_ratio, max_perspective_ratio]
- warp_fillval: value used to fill the border pixels
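The sampling described by these parameters can be sketched as follows. This is a hedged Python illustration, not the code in /src/caffe/data_transformer.cpp: the names TransformParams and sample_transform_params are hypothetical, the scaling factors are assumed to be drawn independently for the h and w directions (as in the concrete example that follows), and mirroring is assumed to pick uniformly among no flip, horizontal, vertical, and both.

```python
import random
from dataclasses import dataclass


@dataclass
class TransformParams:
    flip: str           # "none", "horizontal", "vertical" or "both" (assumed encoding)
    scale_h: float      # scaling factor for the height direction
    scale_w: float      # scaling factor for the width direction
    angle: int          # rotation angle in degrees
    shear: float        # shearing ratio
    perspective: float  # perspective warping ratio


def sample_transform_params(rng, mirror=True, min_scaling_factor=0.8,
                            max_scaling_factor=1.2, angle_interval=45,
                            max_shearing_ratio=0.1, max_perspective_ratio=0.1):
    """Draw one random set of distortion params (hypothetical helper)."""
    flip = rng.choice(["none", "horizontal", "vertical", "both"]) if mirror else "none"
    return TransformParams(
        flip=flip,
        # Uniform in [min_scaling_factor, max_scaling_factor], per direction.
        scale_h=rng.uniform(min_scaling_factor, max_scaling_factor),
        scale_w=rng.uniform(min_scaling_factor, max_scaling_factor),
        # Uniform over 0, angle_interval, 2*angle_interval, ..., up to 360.
        angle=rng.randrange(0, 361, angle_interval),
        shear=rng.uniform(-max_shearing_ratio, max_shearing_ratio),
        perspective=rng.uniform(-max_perspective_ratio, max_perspective_ratio),
    )
```

With the deterministic TEST settings used later in this README (scaling factors of 1, angle_interval: 360, zero ratios), every draw reduces to the identity transform, apart from a possible 360-degree rotation, which is equivalent to none.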
Here is a concrete example of the geometric transformation. With the above prototxt config, let's say the net encounters an image with original size 48x60, and the scaling factors for the h(eight) and w(idth) directions are randomly sampled as 0.8 and 1.2, which corresponds to an ROI of size 60x50 (h: 48/0.8=60, w: 60/1.2=50). In this case, for the h direction, we randomly pad an additional 12 pixels in total, split between the two sides (these pixels are set to warp_fillval); for the w direction, we randomly crop out the extra 10 pixels, again split between the two sides. On the resulting 60x50 ROI, we perform random rotation, shearing, and perspective warping in combination using the function warpPerspectiveOneGo in /src/caffe/util/opencv_util.cpp. The output is a transformed image of size 32x32, which is the image we feed to the net.
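The pad/crop arithmetic of this example can be sketched per axis in Python. The function name plan_axis is hypothetical, and splitting the extra pixels uniformly at random between the two sides is an assumption about what "randomly pad/crop" means; /src/caffe/data_transformer.cpp holds the actual logic.

```python
import random


def plan_axis(size, scale, rng):
    """Return (roi_size, before, after) for one axis (hypothetical helper).

    Positive before/after values are pixels padded with warp_fillval;
    negative values are pixels cropped away from that side.
    """
    roi = int(round(size / scale))      # e.g. 48 / 0.8 -> 60, 60 / 1.2 -> 50
    diff = roi - size                   # > 0: pad (scale < 1), < 0: crop (scale > 1)
    before = rng.randint(0, abs(diff))  # random split between the two sides
    after = abs(diff) - before
    sign = 1 if diff >= 0 else -1
    return roi, sign * before, sign * after
```

For the 48x60 image above, plan_axis(48, 0.8, rng) yields a 60-pixel height with 12 padded pixels in total, and plan_axis(60, 1.2, rng) yields a 50-pixel width with 10 cropped pixels in total.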
For a better understanding of the transformation augmentation and the above params, please see /src/caffe/data_transformer.cpp (where the transformation is implemented) and /src/caffe/proto/caffe.proto.
For transformation-based augmentation in image classification, I recommend this paper: Transformation Pursuit for Image Classification. The authors have a project page for it.
In this implementation, realtime augmentation is always on in both the TRAIN and TEST phases (even the mirror operation, which is disabled in the Caffe version). This suits ensembling: you can run the trained model on the same input image a few times and average the predictions (which differ because of the random distortions) to get the final one.
If you want deterministic prediction, you can hack the code or use something like:
## Validation set
layers {
name: "Image"
type: COMPACT_DATA
top: "data"
top: "label"
data_param {
source: "path-to-validation-compact-leveldb"
batch_size: 100
}
transform_param {
mean_file: "path-to-image-mean"
mirror: false
crop_size: 32
multiscale: true
debug_display: false
smooth_filtering: false
jpeg_compression: false
contrast_adjustment: false
min_scaling_factor: 1
max_scaling_factor: 1
angle_interval: 360
max_shearing_ratio: 0
max_perspective_ratio: 0
warp_fillval: 255
}
include: { phase: TEST }
}
Note the random mirroring is still on ;)
Prediction is available through the same /bin/caffe.exe interface; usage is as follows:
# make prediction
./bin/caffe.exe predict \
--model=path-to-model-prototxt \
--weights=path-to-trained-model \
--outfile=path-to-output-prediction \
--label_number=number-of-label \
--iterations=iteration-to-run \
--score_index=which-score-to-output \
--gpu=gpu-id \
--random_seed=random-seed \
--phase=TRAIN-or-TEST
Batch Normalization is from here.
This implementation has been adopted in this PR to Caffe (with improvements such as per-mini-batch shuffling).
Two additional blobs (besides those for the learnable parameters) are used to store the moving-average mean and variance, so set the corresponding blobs_lr and weight_decay both to 0, as follows:
## BN
layers {
bottom: "conv1"
top: "conv1_bn"
name: "conv1_bn"
type: BN
blobs_lr: 1
blobs_lr: 1
blobs_lr: 0
blobs_lr: 0
weight_decay: 0
weight_decay: 0
weight_decay: 0
weight_decay: 0
bn_param {
scale_filler {
type: "constant"
value: 1
}
shift_filler {
type: "constant"
value: 0
}
var_eps: 1e-10
moving_average: true
decay: 0.95
}
}
There is an example of BN on MNIST in /examples/mnist.
The bn_param section accepts the following parameters:
- scale_filler: filler for the scale parameter (gamma in the paper)
- shift_filler: filler for the shift parameter (beta in the paper)
- var_eps: eps added to the variance to avoid division by zero (epsilon in the paper)
- moving_average: whether to use exponentially weighted moving average (EWMA) statistics (computed from samples in the TRAIN phase) for inference
- decay: decay (discount) factor for the EWMA, S_{t+1} = decay * S_{t} + (1 - decay) * Y_{t+1}
If you want mini-batch statistics for inference, set moving_average to false.
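To make the two inference modes concrete, here is a hedged single-channel Python sketch of the BN forward pass with the EWMA update above. It is not the layer's C++/CUDA code: the class name BatchNorm1D is hypothetical, the running statistics are assumed to start at mean 0 and variance 1, and the biased (divide-by-N) batch variance is an assumption.

```python
import math


class BatchNorm1D:
    """Single-channel batch normalization with EWMA statistics (hypothetical sketch)."""

    def __init__(self, scale=1.0, shift=0.0, var_eps=1e-10, decay=0.95,
                 moving_average=True):
        self.scale, self.shift = scale, shift  # gamma and beta in the paper
        self.var_eps, self.decay = var_eps, decay
        self.moving_average = moving_average
        # Assumed initial values for the running statistics.
        self.running_mean, self.running_var = 0.0, 1.0

    def forward(self, batch, train):
        if train or not self.moving_average:
            # Mini-batch statistics (biased variance, an assumption).
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
        else:
            # Inference with EWMA statistics collected during TRAIN.
            mean, var = self.running_mean, self.running_var
        if train:
            # S_{t+1} = decay * S_t + (1 - decay) * Y_{t+1}
            self.running_mean = self.decay * self.running_mean + (1 - self.decay) * mean
            self.running_var = self.decay * self.running_var + (1 - self.decay) * var
        inv_std = 1.0 / math.sqrt(var + self.var_eps)
        return [self.scale * (x - mean) * inv_std + self.shift for x in batch]
```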
PReLU is adopted from this PR to Caffe.
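For reference, PReLU computes f(x) = max(0, x) + a * min(0, x), where the negative slope a is learned. A minimal Python sketch of the forward pass and of the gradients with respect to the input and to a, assuming a single shared slope (the function names are hypothetical, and the layer itself may learn one slope per channel):

```python
def prelu_forward(xs, a):
    """f(x) = x for x > 0, a * x otherwise (single shared slope a)."""
    return [x if x > 0 else a * x for x in xs]


def prelu_backward(xs, a, grad_out):
    """Return (gradient w.r.t. inputs, gradient w.r.t. the slope a)."""
    grad_in = [g if x > 0 else a * g for x, g in zip(xs, grad_out)]
    # Only non-positive inputs depend on a, with df/da = x.
    grad_a = sum(x * g for x, g in zip(xs, grad_out) if x <= 0)
    return grad_in, grad_a
```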
AdaDelta is based on this PR to Caffe with a modification to allow learning rate policy as usual.
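A hedged Python sketch of AdaDelta with a learning rate applied on top, in the spirit of the modification described above. It follows the standard AdaDelta recurrences (running averages of squared gradients and squared updates with decay rho and a constant eps); the class name, the default rho/eps values, and the choice to accumulate the unscaled update before multiplying by lr are assumptions, not taken from this code base.

```python
import math


class AdaDelta:
    """AdaDelta with an external learning rate (hypothetical sketch)."""

    def __init__(self, n, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.sq_grad = [0.0] * n    # running average of squared gradients
        self.sq_update = [0.0] * n  # running average of squared updates

    def step(self, params, grads, lr=1.0):
        """Update params in place; lr would come from the usual lr policy."""
        for i, g in enumerate(grads):
            self.sq_grad[i] = self.rho * self.sq_grad[i] + (1 - self.rho) * g * g
            # Unit-correcting ratio of RMS(update) to RMS(gradient).
            dx = math.sqrt(self.sq_update[i] + self.eps) / math.sqrt(self.sq_grad[i] + self.eps) * g
            self.sq_update[i] = self.rho * self.sq_update[i] + (1 - self.rho) * dx * dx
            params[i] -= lr * dx
        return params
```

Setting lr=1.0 with a fixed policy recovers plain AdaDelta; any other policy simply rescales each step.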
Adopted from Princeton's GoogLeNet patch.