Fixing harsh upgrade_proto for "BatchNorm" layer #5184

Merged
shelhamer merged 1 commit into BVLC:master from shaibagon:fix_batch_norm_param_upgrade on Jan 20, 2017

Conversation

shaibagon
Member

This PR attempts to fix issues #5171 and #5120, which were caused by PR #4704:
PR #4704 completely removes all param arguments of "BatchNorm" layers and resets them to param { lr_mult: 0 }. This "upgrade" is too harsh, since it also discards the "name" argument that the user might have set.

This PR makes upgrade_proto.cpp more conservative for the "BatchNorm" layer: it leaves "name" in param and only sets lr_mult and decay_mult to zero.
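
As a reference for what "more conservative" means here, below is a minimal C++ sketch of the intended logic, written against the protobuf-generated accessors for NetParameter, LayerParameter, and ParamSpec. The helper name is invented; this is an illustration of the approach, not a verbatim copy of the patch.

#include "caffe/proto/caffe.pb.h"  // NetParameter, LayerParameter, ParamSpec

using caffe::LayerParameter;
using caffe::NetParameter;
using caffe::ParamSpec;

// Illustration only: reset lr_mult/decay_mult on every existing param of each
// "BatchNorm" layer while keeping user-set fields such as "name" untouched.
// Layers that declare no param entries at all are left exactly as they are.
void ConservativeBatchNormUpgrade(NetParameter* net_param) {
  for (int i = 0; i < net_param->layer_size(); ++i) {
    LayerParameter* layer = net_param->mutable_layer(i);
    if (layer->type() != "BatchNorm") { continue; }
    for (int p = 0; p < layer->param_size(); ++p) {
      ParamSpec* spec = layer->mutable_param(p);
      spec->set_lr_mult(0.f);     // statistics are not learned by the solver
      spec->set_decay_mult(0.f);  // and are not regularized
    }
  }
}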

Example of such an upgrade:
Input prototxt:

layer {
  type: "BatchNorm"
  name: "bn0"
  bottom: "data"
  top: "bn0"
  # old style params
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
  param: { lr_mult: 0 }
}
layer {
  type: "BatchNorm"
  name: "bn1"
  bottom: "bn0"
  top: "bn1"
  # wrong params
  param: { lr_mult: 1 decay_mult: 1}
  param: { lr_mult: 1 decay_mult: 0}
  param: { lr_mult: 1 decay_mult: 1}
}
layer {
  type: "BatchNorm"
  name: "bn2"
  bottom: "bn1"
  top: "bn2"
  # no params at all
}
layer {
  type: "BatchNorm"
  name: "bn3"
  bottom: "bn2"
  top: "bn3"
  # wrong with "name"
  param: { lr_mult: 1 decay_mult: 1 name: "bn_m"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_s"}
  param: { lr_mult: 1 decay_mult: 1 name: "bn_b"}
}
layer {
  type: "BatchNorm"
  name: "bn4"
  bottom: "bn3"
  top: "bn4"
  # only "name"
  param: { name: "bn_m"}
  param: { name: "bn_s"}
  param: { name: "bn_b"}
}

"Upgraded" prorotxt:

layer {
  name: "bn0"
  type: "BatchNorm"
  bottom: "data"
  top: "bn0"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "bn0"
  top: "bn1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn2"
  type: "BatchNorm"
  bottom: "bn1"
  top: "bn2"
}
layer {
  name: "bn3"
  type: "BatchNorm"
  bottom: "bn2"
  top: "bn3"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}
layer {
  name: "bn4"
  type: "BatchNorm"
  bottom: "bn3"
  top: "bn4"
  param {
    name: "bn_m"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_s"
    lr_mult: 0
    decay_mult: 0
  }
  param {
    name: "bn_b"
    lr_mult: 0
    decay_mult: 0
  }
}

As you can see, lr_mult and decay_mult are set to zero, leaving name intact whenever it was explicitly set by the user.

…"name" in param, only set lr_mult and decay_mult to zero
@shaibagon
Member Author

@shelhamer would you please have a look at this issue/proposed fix?

Thanks.

@shelhamer
Member

shelhamer commented Jan 20, 2017

Switching to zeroing the lr_mult and decay_mult like this is fine. I was so focused on avoiding incorrect statistics gradients that I made sharing impossible. Thanks for the fix!

@shelhamer shelhamer merged commit bc0d680 into BVLC:master Jan 20, 2017
@shaibagon
Member Author

@shelhamer Thanks for merging this PR!

@antran89
Contributor

antran89 commented Feb 2, 2017

@shaibagon Thanks, Shai, for the fix. I'm not sure about the internal structure, so just a quick question: does the upgraded proto of the BN layer have the same interface as it did before this upgrade?

@shaibagon
Member Author

@antran89 There is no interface change. The actions upgrade_proto takes when it encounters a "BatchNorm" layer's param are simply more "gentle" now.

@shaibagon shaibagon deleted the fix_batch_norm_param_upgrade branch April 18, 2017 09:56
@Jiangfeng-Xiong

Jiangfeng-Xiong commented May 9, 2017

@shaibagon @shelhamer What happens if we share parameters in a BatchNorm layer? The mean and variance are calculated from the input, and during training a Siamese network has two inputs, so there would be two means and two variances based on the different inputs. What will be used as the parameter in BatchNorm, or do we just average them?
Thanks

@shaibagon
Member Author

@Jiangfeng-Xiong You obviously cannot have two means and two variances in the same layer; that makes no sense.
The idea behind a Siamese network is that you actually train a single net; this is why you share the weights between the two copies. Thus, the batch norm parameters are averaged between the two copies, as are all the weights in the net.
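
For illustration, here is a minimal prototxt sketch of how such sharing is configured, assuming both branches are defined in the same net prototxt. The layer, blob, and param names are invented for this example; sharing works through identical param names, exactly the fields this PR now preserves.

layer {
  name: "bn_a"
  type: "BatchNorm"
  bottom: "conv_a"
  top: "bn_a"
  param { name: "bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "bn_factor" lr_mult: 0 decay_mult: 0 }
}
layer {
  name: "bn_b"
  type: "BatchNorm"
  bottom: "conv_b"
  top: "bn_b"
  # same param names as the first branch, so both layers use the same blobs
  param { name: "bn_mean"   lr_mult: 0 decay_mult: 0 }
  param { name: "bn_var"    lr_mult: 0 decay_mult: 0 }
  param { name: "bn_factor" lr_mult: 0 decay_mult: 0 }
}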
