Scale and Bias Layers #3591
Conversation
Force-pushed from 8a67137 to 0d9ca49
…and ScaleLayer. The behavior of ChannelwiseAffineLayer can be reproduced by a ScaleLayer with `scale_param { bias_term: true }`. BiasLayer and ScaleLayer each take 1 or 2 bottoms, with the output having the same shape as the first. The second input -- either another bottom or a learned parameter -- has its axes (virtually) broadcast and tiled to the shape of the first, after which elementwise addition (Bias) or multiplication (Scale) is performed.
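For concreteness, a minimal prototxt sketch of that replacement, assuming a hypothetical input blob `conv1` and layer name `affine1`:

```
layer {
  name: "affine1"      # hypothetical layer name
  type: "Scale"
  bottom: "conv1"      # hypothetical input blob
  top: "affine1"
  scale_param {
    axis: 1            # per-channel scale/bias (the default axis)
    bias_term: true    # also learn an additive bias, as ChannelwiseAffineLayer did
  }
}
```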
Force-pushed from 0d9ca49 to 0816907
@jeffdonahue LGTM.
Thanks @ducha-aiki and @jeffdonahue for the scale + bias layers!
cool!
@shelhamer nice!
Hi guys. Thanks a lot for the great work with the batch normalization layer. To make sure I understand the correct implementation according to the paper for the train_val.prototxt: first one computes the normalized batch with the BatchNorm layer type (after my ReLUs), and then uses the ScaleLayer with `scale_param { bias_term: true }` and the BiasLayer to learn the scale and bias of the normalized batch?
Batch norm is, in the original paper and in typical use, placed before the activation (ReLU or otherwise), not after. You should use ScaleLayer with `scale_param { bias_term: true }`, which learns both the scale and the bias, so a separate BiasLayer is not needed.
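To illustrate the ordering, a hedged prototxt sketch of the usual stack (layer and blob names are hypothetical, not taken from this PR): convolution, then BatchNorm, then a single Scale layer learning both scale and bias, then the activation:

```
layer { name: "conv1"  type: "Convolution" bottom: "data"  top: "conv1"
        convolution_param { num_output: 64 kernel_size: 3 pad: 1 } }
layer { name: "bn1"    type: "BatchNorm"   bottom: "conv1" top: "conv1" }
layer { name: "scale1" type: "Scale"       bottom: "conv1" top: "conv1"
        scale_param { bias_term: true } }  # learned per-channel scale + bias
layer { name: "relu1"  type: "ReLU"        bottom: "conv1" top: "conv1" }
```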
Thanks @jeffdonahue for the answer and explanation. I'm curious, though, how the normalization is handled at test time. In the paper's introduction they refer to using the normalization only on the training batches. Do I have to set use_global_stats differently for the training and testing phases, or is it handled internally in the batch_norm_layer?
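For reference, one way to spell the two phases out explicitly in the prototxt, whether or not the layer also chooses a default internally, is to define the layer once per phase with `include` rules; a hedged sketch with hypothetical blob names:

```
# TRAIN: normalize with statistics of the current mini-batch
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bn1"
  batch_norm_param { use_global_stats: false }
  include { phase: TRAIN }
}
# TEST: normalize with the accumulated running mean/variance
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "bn1"
  batch_norm_param { use_global_stats: true }
  include { phase: TEST }
}
```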
Does anyone have the new train_val file for the Inception-BN network with the ScaleBias layers added (for the ILSVRC12 dataset)? |
This PR combines @ducha-aiki's `ChannelwiseAffineLayer` (#2996) with my Scalar (#3021) and Bias (#3550) for appropriate credit, and should have all the advantages of each. (After some discussion we decided to name the scaling part `Scale` for simplicity.) `ScaleLayer` alone can now replace `ChannelwiseAffineLayer` by setting `scale_param { bias_term: true }`, with a combined GPU kernel to both scale and add to the input, which should give the performance advantage that @ducha-aiki measured as part of the discussion in #3229, while still allowing for the modularity of separating the two when desired. Both `ScaleLayer` and `BiasLayer` can take a single bottom to learn the scale/bias as a parameter, or two bottoms so the scale/bias can be taken as an input*. The dimensions of the scale/bias blob may be any subsequence of the dimensions in the first bottom. The operation can be thought of as a (virtual) reshaping + tiling to the shape of the first Blob, followed by element-wise addition/multiplication. The operations could alternatively be performed by composing `Reshape`, `Tile`, and `Eltwise` layers, but in any case except where `EltwiseLayer` alone suffices, this would be less efficient in terms of memory and performance, often substantially so.

@ducha-aiki hopefully this is the best of both worlds in terms of performance, generality, and modularity -- let me know if you have any feedback though. Otherwise we will try to get this reviewed and merged soon.
*I'm happy to see this excess logic simplified/removed if/when @longjon's `param_bottom` and `ParameterLayer` work is merged, but for now this is the best way I could think of to address many different use cases for the layers.
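For concreteness, a hedged sketch of the two-bottom form described above, assuming hypothetical blobs `data` (N x C x H x W) and `per_channel` (shape C) produced by earlier layers:

```
# The C-long "per_channel" bottom is matched against "data" starting at
# axis 1, (virtually) broadcast across H and W, and multiplied in elementwise.
layer {
  name: "scale_by_input"
  type: "Scale"
  bottom: "data"         # N x C x H x W
  bottom: "per_channel"  # C
  top: "scaled"
  scale_param { axis: 1 }
}
```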