
CUB Memory Manager + cuDNN v4 and v5 support #3919

Closed
wants to merge 3 commits

Conversation

drnikolaev
Contributor

This PR adds two inseparable features to Caffe: a high-performance CUB memory manager and the long-awaited upgrade from cuDNN v3 to cuDNN v4 (and the upcoming v5).

@seanbell

Right now GitHub refuses to render the entire diff (too many changes), so the only thing one can see is the added 3rdparty files. A workaround could be to add all 3rdparty files in one commit, and put all other changes in a second commit.

@drnikolaev
Contributor Author

OK, thank you. Split into 2 commits.

@@ -316,6 +317,21 @@ class Layer {
param_propagate_down_[param_id] = value;
}

bool IsForwardPassed() const {
Contributor

Is it possible to not have these as a part of Layer? AFAICT you can have exactly the same effect by just putting these forward_passed/backward_passed as instance variables in CuDNNBatchNormalizationLayer, and you don't add (essentially) completely unused base functions to a core class.

IMO it's preferable to keep the core classes like Layer, Blob as small as possible.

Contributor

This is part of the annoyance of Caffe's lazy allocation. There are allocations that happen late, specifically the setup of cuRAND, so we need to be able to delay the final choice of algorithm until we know how much memory we really have. (There is an upcoming PR to move to the findEx paths in cuDNN v5 instead of the get paths, which makes this even more challenging.) We think we need this more generally. We'd love a better solution, but we start going down a "plan"-style path pretty quickly.

Contributor Author

Andrew, there are two reasons for keeping them in the base class:

  1. In Net::ForwardFromTo and Net::BackwardFromTo we call setters using pointers to base classes (i.e. pure virtuals would be even more expensive here).
  2. Most probably there will be more use cases like this in other cuDNN-based layer implementations.

Contributor

Unless I've misunderstood, it's hard to believe that there is any real performance implication due to 1.

If I'm imagining it correctly, what @thatguymike is calling a '"plan" style path' sounds right to me, though admittedly difficult to implement in current Caffe. [In fact, I have code of my own, (not as a Caffe branch) that takes that path, so it might be nice to discuss that elsewhere.]

Currently, however, this is a hack. Note that in current usage, memory usage can change (potentially drastically) after the initial forward-backward pass due to reshaping. So it's better not to push such things into the core of Caffe. If it becomes useful to share this kind of code among cuDNN layers, that can be done with another (intermediate) class, or some other kind of helper. This is also a (minor?) violation of modularity; layers are not really supposed to know anything about other layers or nets or whoever is running them.

I agree with @ajtulloch here -- I see no compelling reason to add these functions to all layers.

Contributor Author

drnikolaev commented May 4, 2016

@longjon Now please look at net.cpp:

for (int i = start; i <= end; ++i) {
  layers_[i]->ForwardPassed(true);
}

Here layers_[i] is a pointer to the base class Layer. How would we call ForwardPassed here?

Contributor

You can't, of course. Instead, you'd need to set forward_passed_ in layer code, presumably at the end of Forward_*.
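For illustration, a minimal sketch of that alternative, with the flag kept by the cuDNN layer itself rather than exposed on the Layer base class (the class name, member name, and include path below are illustrative, not this PR's actual code):

#include <vector>

#include "caffe/layers/batch_norm_layer.hpp"

namespace caffe {

// Hypothetical sketch: the cuDNN layer tracks forward completion itself, so
// Net never needs IsForwardPassed()/ForwardPassed() on the Layer base class.
template <typename Dtype>
class CuDNNBatchNormSketchLayer : public BatchNormLayer<Dtype> {
 public:
  explicit CuDNNBatchNormSketchLayer(const LayerParameter& param)
      : BatchNormLayer<Dtype>(param), forward_done_(false) {}

 protected:
  virtual void Forward_gpu(const std::vector<Blob<Dtype>*>& bottom,
                           const std::vector<Blob<Dtype>*>& top) {
    // ... cuDNN forward call goes here ...
    forward_done_ = true;  // set at the end of Forward_*, as suggested above
  }
  virtual void Backward_gpu(const std::vector<Blob<Dtype>*>& top,
                            const std::vector<bool>& propagate_down,
                            const std::vector<Blob<Dtype>*>& bottom) {
    // Any setup deferred until after the first forward pass can key off
    // forward_done_ here, with no callback from Net into the layer.
  }

  bool forward_done_;  // instance state, local to this layer type
};

}  // namespace caffe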

Contributor

(Perhaps the intention here was to keep track not simply of whether an individual layer's forward has completed, but whether all layers' forwards have completed. But note that that's not what this code does anyway.)

@cuihenggang

That's cool that we can do batch normalization using cuDNN. So when we use the CudnnBatchNorm layer, do we still need to append a Scale layer after it?

@borisgin

borisgin commented Apr 5, 2016

No, you don't have to add scale/shift layers. The cuDNN BN layer has both scale and shift inside.

@drnikolaev
Contributor Author

Removed unnecessary files from the 3rdparty directory. Just 5 of them are left; those are the only ones required.

@antran89
Contributor

antran89 commented Apr 7, 2016

Cool! How do I make changes to use this module? Do I need to add a new layer before each ReLU layer? Or will it automatically do BN when I specify a variable in the prototxt file?

@mfernezir

Regarding BN layer usage: I've recently posted in NVIDIA/DIGITS#629 about some differences between NVIDIA Caffe and the BVLC version.

Since this PR has the same BatchNormParameter message in caffe.proto as NVIDIA's current one, the following example should work in BVLC Caffe as well:

## BatchNorm
layer {
  bottom: "conv1/7x7_s2"
  name: "conv1/7x7_s2/bn"
  top: "conv1/7x7_s2/bn"
  type: "BatchNorm"
  param {
    lr_mult: 1
    decay_mult: 0
  }
  param {
    lr_mult: 1
    decay_mult: 0
  }
  batch_norm_param {
    scale_filler {
      type: "constant"
      value: 1
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  bottom: "conv1/7x7_s2/bn"
  top: "conv1/7x7_s2/bn"
  name: "conv1/relu_7x7"
  type: "ReLU"
}

There's no need to specify use_global_stats, since Caffe automatically infers the correct state, TEST or TRAIN. The same layer definition is used for both the train_val and deploy prototxt.

@drnikolaev
Contributor Author

Added "Returned shift and scale back to BN layer" commit to make BN layer implementation consistent with the paper, cuDNN and other frameworks.

@borisgin

Today, if you want to use the BN layer without scale and shift, you can initialize these two parameters with 1 and 0 and set lr and weight decay to 0 in the train_val.prototxt (see the sketch below). I can add a new parameter which does this automatically.
Boris
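
A minimal prototxt sketch of that workaround (layer and blob names are illustrative; the two param entries correspond to this PR's scale and bias blobs, in that order):

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1/bn"
  param { lr_mult: 0 decay_mult: 0 }  # scale: frozen at its initial value 1
  param { lr_mult: 0 decay_mult: 0 }  # bias: frozen at its initial value 0
  batch_norm_param {
    scale_filler { type: "constant" value: 1 }
    bias_filler { type: "constant" value: 0 }
  }
}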


void BatchNormLayer<Dtype>::compute_sum_per_channel_gpu(int N, int C, int S,
const Dtype *x, Dtype *y ) {
// assume that x.shape(1)==C
Blob<Dtype> temp_NC;
Contributor

It looks like this was meant to be temp_NC_? Do you want to s/temp_NC/temp_NC_/g in these files?


Originally it was temp_NC_, but then I decided that it is safer to allocate temp_NC inside these 2 functions to avoid a potential hidden overwrite (for example, if I use temp_NC_ outside). I will re-check whether this has a noticeable effect on time.

@@ -22,17 +23,29 @@ void BatchNormLayer<Dtype>::LayerSetUp(const vector<Blob<Dtype>*>& bottom,
if (this->blobs_.size() > 0) {
LOG(INFO) << "Skipping parameter initialization";
} else {
-    this->blobs_.resize(3);
+    this->blobs_.resize(5);
@wlike commented Apr 21, 2016

Why 5 blobs? blobs_[4] is not used in the current code, and according to the usage of the BatchNorm layer in the prototxt file given, there should be 4 blobs.


blobs_[4] is referenced in a few places, but I agree that it looks like it's not used anywhere.

@borisgin can you comment on this?

@mathmanu

If this BN method is used, will I be able to load a caffemodel (for finetuning) trained using the old BN method? I tried this in the nvidia/caffe repo, and loading the caffemodel exited, complaining that the number of blobs doesn't match. This could be a critical need, as there are several older caffemodels out there that we need to use.

@borisgin

No. The old BN has 2 blobs: global mean and variance. The new one has 5: scale, bias, global_mean, global_variance, and global_counter.

@mathmanu

This could be a problem, since several popular models become unusable (e.g. https://github.com/KaimingHe/deep-residual-networks).

Do we really have to break backward compatibility this way? In the old BN we used to have a separate scaling layer, and that was fine.

Kindly reconsider this and try to keep compatibility.

@mathmanu

mathmanu commented Sep 25, 2016

Alternatively, you could provide an upgrade path as well: for example, while loading the older model for finetuning, the scale and bias blobs could be forced to one and zero respectively.

I also saw that the shapes of the blobs (global_mean and global_variance) were different, although they were of the same size; this also creates a problem.

@mathmanu

I agree that there is merit in what you did by combining the normalization and scaling.

Yet another easy fix is to make this a different layer, say "BatchNormScale", and keep the older layer as-is for backward compatibility. This is so far the simplest (minimal code changes) solution that I could come up with.

@borisgin

Agreed, adding a new layer would be the simplest way.

@mathmanu

So do you think you can do the name change (and keep the old layer) right away? I would love to use this new layer with CUDNN, as CUDNN gives me a 2x boost in speed.

@mathmanu

Do you have any convergence issues with the CUDNN BatchNorm used in this PR? In nvidia/caffe, I had to set the engine to CAFFE for BatchNorm to get convergence.

I was struggling with the convergence issue, but finally the following worked for me. Specifying the engine as CAFFE is important; CUDNN BatchNorm doesn't converge for me.

The following is the configuration that I used in the nvidia/caffe version. I am posting it here because I think the underlying implementation is the same.

layer {
name: "bn2"
bottom: "conv2"
top: "conv2"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CAFFE
}
}

@mathmanu

If the CUDNN BatchNorm converged, that would have given me an overall 4x boost in speed, but now, with the CAFFE engine for BatchNorm, I get only a 2x boost overall!

@mathmanu

I have a couple more comments:

  1. If you change the order of the blobs to global_mean, global_variance, scale, bias, global_counter, then I don't have to specify 4 param fields for lr_mult and decay_mult, but only 2.
  2. If the definition of the scale and bias fields in BatchNormParameter is changed to
    optional float scale_filler = 5 [default = 1];
    optional float bias_filler = 6 [default = 0];
    then I don't have to specify these in the prototxt either.

These changes will help someone who is trying to use this layer for the first time, apart from saving some space in the prototxt (see the sketch below).
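
Under the proposed blob order and filler defaults, the layer definition could shrink to something like the following (a sketch of the suggestion above only, not this PR's current behavior; names are illustrative):

layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "conv2"
  top: "conv2/bn"
  param { lr_mult: 0 decay_mult: 0 }  # global mean, first under the proposed order
  param { lr_mult: 0 decay_mult: 0 }  # global variance
  # scale and bias keep their default lr_mult/decay_mult and default fillers
}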

@mathmanu

I need urgent help: any suggestions to solve the issue of non-convergence with CUDNN BatchNorm? Are the CUDNN BatchNorm implementations the same in nvcaffe-0.15, nvcaffe-0.16, and this PR? (I tried nvcaffe-0.16.) Any suggestions would help.

@borisgin

I will re-check the convergence issue with cuDNN_BN.


@borisgin

The idea to change parameter order is very good. I will do this.


@mathmanu

Thanks. How about the fillers - can you provide default values for them too?

@mathmanu

When you check the convergence issue with cuDNN_BN, kindly do so with a network that has many BN layers, for example ResNet-18.

@borisgin

Can you send me your solver.prototxt and train_test.prototxt files for the model which converges with the Caffe engine and diverges with cuDNN, please?


@mathmanu

mathmanu commented Sep 27, 2016

I couldn't attach it for some reason, so I have copied the combined (solver + train) prototxt below. If you change the BatchNorm engine to CAFFE, it will start to converge.

#Solver parameters
test_iter: 200
test_interval: 1000
test_initialization: true
display: 100
base_lr: 0.01
lr_policy: "multistep"
stepvalue: 25000
stepvalue: 50000
stepvalue: 75000
stepvalue: 100000
gamma: 0.1
max_iter: 125000
momentum: 0.9
weight_decay: 1e-4
regularization_type: "L2" #"L1"
snapshot: 1000
snapshot_prefix: "training/resnet18"
solver_mode: GPU
random_seed: 33

#Net parameters
net_param {

name: "ResNet-18(1024)"

layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}

transform_param {
crop_size: 224
mean_value: 128
mean_value: 128
mean_value: 128
mirror: true
}
data_param {
source: "/user/me/files/data/datasets/object-detect/other/ilsvrc/2012/ilsvrc12_train_lmdb"
batch_size: 64
backend: LMDB
}
}

layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}

transform_param {
crop_size: 224
mean_value: 128
mean_value: 128
mean_value: 128
mirror: false
}
data_param {
source: "/user/me/files/data/datasets/object-detect/other/ilsvrc/2012/ilsvrc12_val_lmdb"
batch_size: 64
backend: LMDB
}
}
layer {
name: "conv1"
bottom: "data"
top: "conv1"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
param { lr_mult: 2 decay_mult: 0 }
convolution_param {
num_output: 64
kernel_size: 7
pad: 3
stride: 2
weight_filler { type: "msra" std: 0.010 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "bn_conv1"
bottom: "conv1"
top: "conv1"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "conv1_relu"
bottom: "conv1"
top: "conv1"
type: "ReLU"
}
layer {
name: "pool1"
bottom: "conv1"
top: "pool1"
type: "Pooling"
pooling_param {
pool: MAX
kernel_size: 3
stride: 2
}
}
layer {
name: "res2a_branch2a"
bottom: "pool1"
top: "res2a_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn2a_branch2a"
bottom: "res2a_branch2a"
top: "res2a_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res2a_branch2a_relu"
bottom: "res2a_branch2a"
top: "res2a_branch2a"
type: "ReLU"
}
layer {
name: "res2a_branch2b"
bottom: "res2a_branch2a"
top: "res2a_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn2a_branch2b"
bottom: "res2a_branch2b"
top: "res2a_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res2a"
bottom: "pool1"
bottom: "res2a_branch2b"
top: "res2a"
type: "Eltwise"
}
layer {
name: "res2a_relu"
bottom: "res2a"
top: "res2a"
type: "ReLU"
}
layer {
name: "res2b_branch2a"
bottom: "res2a"
top: "res2b_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn2b_branch2a"
bottom: "res2b_branch2a"
top: "res2b_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res2b_branch2a_relu"
bottom: "res2b_branch2a"
top: "res2b_branch2a"
type: "ReLU"
}
layer {
name: "res2b_branch2b"
bottom: "res2b_branch2a"
top: "res2b_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 64
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn2b_branch2b"
bottom: "res2b_branch2b"
top: "res2b_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res2b"
bottom: "res2a"
bottom: "res2b_branch2b"
top: "res2b"
type: "Eltwise"
}
layer {
name: "res2b_relu"
bottom: "res2b"
top: "res2b"
type: "ReLU"
}
layer {
name: "res3a_branch2a"
bottom: "res2b"
top: "res3a_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn3a_branch2a"
bottom: "res3a_branch2a"
top: "res3a_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res3a_branch2a_relu"
bottom: "res3a_branch2a"
top: "res3a_branch2a"
type: "ReLU"
}
layer {
name: "res3a_branch2b"
bottom: "res3a_branch2a"
top: "res3a_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn3a_branch2b"
bottom: "res3a_branch2b"
top: "res3a_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res3a_branch1"
bottom: "res2b"
top: "res3a_branch1"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 128
kernel_size: 1
pad: 0
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn3a_branch1"
bottom: "res3a_branch1"
top: "res3a_branch1"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res3a"
bottom: "res3a_branch1"
bottom: "res3a_branch2b"
top: "res3a"
type: "Eltwise"
}
layer {
name: "res3a_relu"
bottom: "res3a"
top: "res3a"
type: "ReLU"
}
layer {
name: "res3b_branch2a"
bottom: "res3a"
top: "res3b_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn3b_branch2a"
bottom: "res3b_branch2a"
top: "res3b_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res3b_branch2a_relu"
bottom: "res3b_branch2a"
top: "res3b_branch2a"
type: "ReLU"
}
layer {
name: "res3b_branch2b"
bottom: "res3b_branch2a"
top: "res3b_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 128
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn3b_branch2b"
bottom: "res3b_branch2b"
top: "res3b_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res3b"
bottom: "res3a"
bottom: "res3b_branch2b"
top: "res3b"
type: "Eltwise"
}
layer {
name: "res3b_relu"
bottom: "res3b"
top: "res3b"
type: "ReLU"
}
layer {
name: "res4a_branch2a"
bottom: "res3b"
top: "res4a_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn4a_branch2a"
bottom: "res4a_branch2a"
top: "res4a_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res4a_branch2a_relu"
bottom: "res4a_branch2a"
top: "res4a_branch2a"
type: "ReLU"
}
layer {
name: "res4a_branch2b"
bottom: "res4a_branch2a"
top: "res4a_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn4a_branch2b"
bottom: "res4a_branch2b"
top: "res4a_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res4a_branch1"
bottom: "res3b"
top: "res4a_branch1"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 256
kernel_size: 1
pad: 0
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn4a_branch1"
bottom: "res4a_branch1"
top: "res4a_branch1"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res4a"
bottom: "res4a_branch1"
bottom: "res4a_branch2b"
top: "res4a"
type: "Eltwise"
}
layer {
name: "res4a_relu"
bottom: "res4a"
top: "res4a"
type: "ReLU"
}
layer {
name: "res4b_branch2a"
bottom: "res4a"
top: "res4b_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn4b_branch2a"
bottom: "res4b_branch2a"
top: "res4b_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res4b_branch2a_relu"
bottom: "res4b_branch2a"
top: "res4b_branch2a"
type: "ReLU"
}
layer {
name: "res4b_branch2b"
bottom: "res4b_branch2a"
top: "res4b_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 256
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn4b_branch2b"
bottom: "res4b_branch2b"
top: "res4b_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res4b"
bottom: "res4a"
bottom: "res4b_branch2b"
top: "res4b"
type: "Eltwise"
}
layer {
name: "res4b_relu"
bottom: "res4b"
top: "res4b"
type: "ReLU"
}
layer {
name: "res5a_branch2a"
bottom: "res4b"
top: "res5a_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn5a_branch2a"
bottom: "res5a_branch2a"
top: "res5a_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res5a_branch2a_relu"
bottom: "res5a_branch2a"
top: "res5a_branch2a"
type: "ReLU"
}
layer {
name: "res5a_branch2b"
bottom: "res5a_branch2a"
top: "res5a_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn5a_branch2b"
bottom: "res5a_branch2b"
top: "res5a_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res5a_branch1"
bottom: "res4b"
top: "res5a_branch1"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 512
kernel_size: 1
pad: 0
stride: 2
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn5a_branch1"
bottom: "res5a_branch1"
top: "res5a_branch1"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res5a"
bottom: "res5a_branch1"
bottom: "res5a_branch2b"
top: "res5a"
type: "Eltwise"
}
layer {
name: "res5a_relu"
bottom: "res5a"
top: "res5a"
type: "ReLU"
}
layer {
name: "res5b_branch2a"
bottom: "res5a"
top: "res5b_branch2a"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn5b_branch2a"
bottom: "res5b_branch2a"
top: "res5b_branch2a"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res5b_branch2a_relu"
bottom: "res5b_branch2a"
top: "res5b_branch2a"
type: "ReLU"
}
layer {
name: "res5b_branch2b"
bottom: "res5b_branch2a"
top: "res5b_branch2b"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
convolution_param {
num_output: 512
kernel_size: 3
pad: 1
stride: 1
bias_term: false
weight_filler { type: "msra" std: 0.010 }
dilation: 1
group: 1
}
}
layer {
name: "bn5b_branch2b"
bottom: "res5b_branch2b"
top: "res5b_branch2b"
type: "BatchNorm"
param { #scale
lr_mult: 1
decay_mult: 1
}
param { #shift/bias
lr_mult: 1
decay_mult: 1
}
param { #global mean
lr_mult: 0
decay_mult: 0
}
param { #global var
lr_mult: 0
decay_mult: 0
}
batch_norm_param {
scale_filler {
type: "constant"
value: 1
}
bias_filler {
type: "constant"
value: 0
}
engine: CUDNN
}
}
layer {
name: "res5b"
bottom: "res5a"
bottom: "res5b_branch2b"
top: "res5b"
type: "Eltwise"
}
layer {
name: "res5b_relu"
bottom: "res5b"
top: "res5b"
type: "ReLU"
}
layer {
name: "pool5"
bottom: "res5b"
top: "pool5"
type: "Pooling"
pooling_param {
pool: AVE
kernel_size: 7
stride: 1
}
}
layer {
name: "conv6"
bottom: "pool5"
top: "conv6"
type: "Convolution"
param { lr_mult: 1 decay_mult: 1 }
param { lr_mult: 2 decay_mult: 0 }
convolution_param {
num_output: 1024
kernel_size: 1
pad: 0
stride: 1
weight_filler { type: "msra" std: 0.010 }
bias_filler { type: "constant" value: 0 }
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "conv6"
bottom: "label"
propagate_down: 1
propagate_down: 0
top: "loss"
loss_weight: 1
}

layer {
name: "accuracy"
type: "Accuracy"
bottom: "conv6"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "accuracy/top-5"
type: "Accuracy"
bottom: "conv6"
bottom: "label"
top: "accuracy/top-5"
include {
phase: TEST
}
accuracy_param {
top_k: 5
}
}

}

@mathmanu

Were you able to reproduce the behavior?

@borisgin

I was able to reproduce the problem: the Caffe engine is converging, but cuDNN BN diverges.
Btw, I would add two parameters to the BN layer definition:
moving_average_fraction: 0.9
eps: 0.0001


@mathmanu

Thank you so much for confirming. I'll wait for further information from you.

@mathmanu

Btw, I can see that the additional parameters you mentioned have default values, so technically I don't need to specify them. Do the values you suggested produce better accuracy?

// How much does the moving average decay each iteration?
optional float moving_average_fraction = 2 [default = .999];
// Small value to add to the variance estimate so that we don't divide by
// zero.
optional float eps = 3 [default = 1e-5];
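
For reference, either default can be overridden per layer inside batch_norm_param; a minimal sketch using the values suggested above (layer and blob names are illustrative):

layer {
  name: "bn_example"
  type: "BatchNorm"
  bottom: "conv_in"
  top: "conv_in/bn"
  batch_norm_param {
    moving_average_fraction: 0.9  # overrides the 0.999 default
    eps: 0.0001                   # overrides the 1e-5 default
  }
}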

@mathmanu

Is there a quick fix or a workaround that can solve this issue?
Thanks,

@borisgin

Yes. CUDNN_BN requires different top and bottom blobs, since it needs the bottom for the backward pass. I attached the fixed train_val.prototxt (I also made a few additional minor changes: changed the last layer from a 1x1 conv to an InnerProduct layer with num_output=1000).

train_val.txt
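
For readers without the attachment, the key change is to give each BatchNorm layer a top distinct from its bottom; a minimal sketch (layer and blob names are illustrative, and downstream layers must then consume the new top):

layer {
  name: "bn_conv1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1/bn"  # distinct from the bottom, so the input is kept for backward
  batch_norm_param { engine: CUDNN }
}
layer {
  name: "conv1_relu"
  type: "ReLU"
  bottom: "conv1/bn"  # reads the BN output instead of "conv1"
  top: "conv1/bn"
}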

@mathmanu

mathmanu commented Sep 30, 2016

Thanks. This helped a lot.

Btw, I have an observation. I have a network trained with the CAFFE BN engine. When I tried to TEST it using the CUDNN BN engine, Caffe exited saying that the shapes of the BN blobs mismatch. But since the blobs are the same size (just different shapes), I was able to forcefully reshape the blobs and run the TEST, and it gave correct results!

@borisgin

borisgin commented Oct 2, 2016

My bug :(


@mathmanu

mathmanu commented Oct 3, 2016

Don't worry, CUDNN is great: it gives me a 4x speed boost, and that makes a huge difference.

You just need to make slight changes and do some testing: a cross-compatibility test with the CAFFE engine and a backward-compatibility test.

@mathmanu

See another thread with a similar issue being reported: NVIDIA/DIGITS#629

@achaiah

achaiah commented Dec 1, 2016

@borisgin Out of curiosity, have you tried networks larger than ResNet-18 on NVIDIA's Caffe? Your ResNet-18 is the only one that converges for me, and I can't find a single example of a larger ResNet that does.
