Fixing harsh upgrade_proto for "BatchNorm" layer #5184

Merged
shelhamer merged 1 commit into BVLC:master from shaibagon:fix_batch_norm_param_upgrade on Jan 20, 2017

Conversation

shaibagon
Member

This PR attempts to fix issues #5171 and #5120, which were caused by PR #4704:
PR #4704 completely removes all param arguments of "BatchNorm" layers and resets them to param { lr_mult: 0 }. This "upgrade" is too harsh, since it also discards the "name" argument that the user might have set.

This PR makes upgrade_proto.cpp more conservative for the "BatchNorm" layer: it leaves "name" in param and only sets lr_mult and decay_mult to zero.
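
As a reference for what "more conservative" means here, below is a minimal C++ sketch of the intended logic, written against the protobuf-generated accessors for NetParameter, LayerParameter, and ParamSpec. The helper name is invented; this is an illustration of the approach, not a verbatim copy of the patch.

#include "caffe/proto/caffe.pb.h"  // NetParameter, LayerParameter, ParamSpec

using caffe::LayerParameter;
using caffe::NetParameter;
using caffe::ParamSpec;

// Illustration only: reset lr_mult/decay_mult on every existing param of each
// "BatchNorm" layer while keeping user-set fields such as "name" untouched.
// Layers that declare no param entries at all are left exactly as they are.
void ConservativeBatchNormUpgrade(NetParameter* net_param) {
  for (int i = 0; i < net_param->layer_size(); ++i) {
    LayerParameter* layer = net_param->mutable_layer(i);
    if (layer->type() != "BatchNorm") { continue; }
    for (int p = 0; p < layer->param_size(); ++p) {
      ParamSpec* spec = layer->mutable_param(p);
      spec->set_lr_mult(0.f);     // statistics are not learned by the solver
      spec->set_decay_mult(0.f);  // and are not regularized
    }
  }
}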

Example of such an upgrade:
Input prototxt:

layer {
  type: "BatchNorm"
  name: "bn0"
  bottom: "data"
  top: "bn0"
  # old style params
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
}
layer {
  type: "BatchNorm"
  name: "bn1"
  bottom: "bn0"
  top: "bn1"
  # wrong params
  param: { lr_mult: 1 decay_mult: 1}
  param: { lr_mult: 1 decay_mult: 0}
  param: { lr_mult: 1 decay_mult: 1}
}
layer {
  type: "BatchNorm"
  name: "bn2"
  bottom: "bn1"
  top: "bn2"
  # no params at all
}
layer {
  type: "BatchNorm"
  name: "bn3"
  bottom: "bn2"
  top: "bn3"
  # wrong with "name"
  param: { lr_mult: 1 decay_mult: 1 name: "bn_m"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_s"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_b"}
}
layer {
  type: "BatchNorm"
  name: "bn4"
  bottom: "bn3"
  top: "bn4"
  # only "name"
  param: { name: "bn_m"}
  param: { name: "bn_s"}
  param: { name: "bn_b"}
}

"Upgraded" prorotxt:

layer {
  name: "bn0"
  type: "BatchNorm"
  bottom: "data"
  top: "bn0"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "bn0"
  top: "bn1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "bn1"
  top: "bn2"
}
layer {
  name: "bn3"
  type: "BatchNorm"
  bottom: "bn2"
  top: "bn3"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn4"
  type: "BatchNorm"
  bottom: "bn3"
  top: "bn4"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}

As you can see, lr_mult and decay_mult are set to zero, leaving name intact whenever it was explicitly set by the user.

…"name" in param, only set lr_mult and decay_mult to zero
@shaibagon
Member Author

@shelhamer would you please have a look at this issue/proposed fix?

Thanks.

@shelhamer
Member

shelhamer commented Jan 20, 2017

Switching to zeroing the lr_mult and decay_mult like this is fine. I was so focused on avoiding incorrect statistics gradients that I made sharing impossible. Thanks for the fix!

@shelhamer shelhamer merged commit bc0d680 into BVLC:master Jan 20, 2017
@shaibagon
Member Author

@shelhamer Thanks for merging this PR!

@antran89
Contributor

antran89 commented Feb 2, 2017

@shaibagon Thanks, Shai, for the fix. I'm not sure about the internal structure, so just a quick question: does the upgraded proto of the BN layer have the same interface as it did before this upgrade?

@shaibagon
Member Author

@antran89 There is no interface change. The actions upgrade_proto takes when it encounters a "BatchNorm" layer's param are simply more "gentle" now.

@shaibagon shaibagon deleted the fix_batch_norm_param_upgrade branch April 18, 2017 09:56
@Jiangfeng-Xiong

Jiangfeng-Xiong commented May 9, 2017

@shaibagon @shelhamer What happens if we share parameters in a BatchNorm layer? The mean and variance are calculated from the input, and during training a Siamese network has two inputs, so there would be two means and two variances based on the different inputs. What will be used as the parameter in BatchNorm, or do we just average them?
Thanks

@shaibagon
Member Author

@Jiangfeng-Xiong You obviously cannot have two means and two variances in the same layer; that makes no sense.
The idea behind a Siamese network is that you actually train a single net; this is why you share the weights between the two copies. Thus, the batch norm parameters are averaged between the two copies, as are all the weights in the net.
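
For illustration, here is a minimal prototxt sketch of how such sharing is configured, assuming both branches are defined in the same net prototxt. The layer, blob, and param names are invented for this example; sharing works through identical param names, exactly the fields this PR now preserves.

layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv_a"
  top: "bn_a"
  param { name: "bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "bn_factor" lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "bn_b"
  type: "BatchNorm"
  bottom: "conv_b"
  top: "bn_b"
  # same param names as the first branch, so both layers use the same blobs
  param { name: "bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "bn_factor" lr_mult: 0 decay_mult: 0 }
}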
