Scale and Bias Layers #3591

Merged 2 commits into BVLC:master on Jan 27, 2016

Conversation

jeffdonahue
Contributor

This PR combines @ducha-aiki's ChannelwiseAffineLayer (#2996) with my Scalar (#3021) and Bias (#3550) layers, for appropriate credit, and should have all the advantages of each. (After some discussion we decided to name the scaling part Scale for simplicity.) ScaleLayer alone can now replace ChannelwiseAffineLayer by setting scale_param { bias_term: true }, with a combined GPU kernel that both scales and adds to the input; this should give the performance advantage that @ducha-aiki measured in the discussion in #3229, while still allowing the modularity of separating the two operations when desired.

Both ScaleLayer and BiasLayer can take a single bottom to learn the scale/bias as a parameter, or two bottoms so the scale/bias is taken as an input*. The dimensions of the scale/bias blob may be any subsequence of the dimensions of the first bottom. The operation can be thought of as a (virtual) reshaping and tiling to the shape of the first Blob, followed by elementwise addition/multiplication. The same operations could be performed by composing Reshape, Tile, and Eltwise layers, but in any case except where EltwiseLayer alone suffices, that would be less efficient in memory and performance, often substantially so.
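
For illustration, a minimal prototxt sketch (layer and blob names here are hypothetical) of a ScaleLayer configured this way, i.e. a learned per-channel scale plus bias applied to a single bottom:

```
layer {
  name: "scale1"            # hypothetical layer name
  type: "Scale"
  bottom: "conv1"           # hypothetical input blob, e.g. N x C x H x W
  top: "conv1_scaled"
  scale_param {
    bias_term: true         # also learn a bias and add it along with the scaling
  }
}
```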

@ducha-aiki hopefully this is the best of both worlds in terms of performance, generality, and modularity -- let me know if you have any feedback though. Otherwise we will try to get this reviewed and merged soon.

*I'm happy to see this excess logic simplified/removed if/when @longjon's param_bottom and ParameterLayer work is merged, but for now this is the best way I could think of to address many different use cases for the layers.

ChannelwiseAffineLayer is separated and generalized into BiasLayer and ScaleLayer. The behavior of ChannelwiseAffineLayer can be reproduced by a ScaleLayer with `scale_param { bias_term: true }`.

BiasLayer and ScaleLayer each take 1 or 2 bottoms, with the output having the same shape as the first. The second input -- either another bottom or a learned parameter -- will have its axes (virtually) broadcast and tiled to have the same shape as the first, after which elementwise addition (Bias) or multiplication (Scale) is performed.
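
As a hedged illustration of that broadcasting (all names and shapes below are hypothetical): with a first bottom of shape N x C x H x W and a second input of shape C aligned to axis 1, the scale or bias is applied per channel.

```
# Learned per-channel bias: the (C,)-shaped parameter is broadcast over N, H, W.
layer {
  name: "bias1"
  type: "Bias"
  bottom: "data"            # shape N x C x H x W (hypothetical)
  top: "data_biased"
  bias_param {
    axis: 1                 # align the bias blob with the channel axis
    num_axes: 1             # bias blob has shape (C,)
    filler { value: 0 }     # constant filler: start from zero bias
  }
}

# Same idea with two bottoms: the per-channel scales come from another blob
# instead of a learned parameter.
layer {
  name: "scale_from_bottom"
  type: "Scale"
  bottom: "data"            # shape N x C x H x W
  bottom: "channel_weights" # shape C, broadcast over N, H, W
  top: "data_scaled"
  scale_param { axis: 1 }
}
```
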
@ducha-aiki
Contributor

@jeffdonahue LGTM.

@shelhamer
Member

Thanks @ducha-aiki and @jeffdonahue for the scale + bias layers!

shelhamer merged commit dc831aa into BVLC:master on Jan 27, 2016
@siddharthm83

cool!

@ducha-aiki
Contributor

@shelhamer nice!

@lfrdm

lfrdm commented Jan 29, 2016

Hi guys. Thanks a lot for the great work on the batch normalization layer. To make sure I implement it correctly, per the paper, in my train_val.prototxt: first one computes the normalized batch with a BatchNorm layer (after my ReLUs), and then uses a ScaleLayer with scale_param { bias_term: true } and a BiasLayer to learn the scale and bias of the normalized batch?

@jeffdonahue
Contributor Author

Batch norm is, in the original paper and in typical use, placed before the activation (ReLU or otherwise), not after. You should use ScaleLayer with bias_term: true (should give the best performance), or separately use ScaleLayer (with bias_term: false, the default) followed by BiasLayer. A ScaleLayer with bias_term followed by a BiasLayer would wastefully learn an extra bias (and effectively double its learning rate).
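
For concreteness, a minimal sketch of this arrangement (layer and blob names are hypothetical): BatchNorm followed by a single ScaleLayer with bias_term: true, placed before the ReLU, all computed in place:

```
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"                      # in-place normalization
}
layer {
  name: "conv1/bn/scale"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }   # learned scale (gamma) and shift (beta) in one layer
}
layer {
  name: "conv1/relu"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```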

@lfrdm

lfrdm commented Jan 29, 2016

Thanks @jeffdonahue for the answer and explanation. I'm curious, though, how the normalization is handled at test time. In the introduction of the paper they refer to using the normalization only on the training batches. Do I have to set use_global_stats differently for the training and testing phases, or is it handled internally by the batch_norm_layer?
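
For reference, one way to make the per-phase behavior explicit is to define the BatchNorm layer twice with include rules, as in the hedged sketch below (names hypothetical); whether this is necessary, or the layer already picks a suitable default from the phase, is exactly the question above.

```
# Hypothetical explicit per-phase setting: batch statistics while training,
# accumulated running averages at test time.
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param { use_global_stats: false }  # TRAIN: normalize with the batch mean/variance
  include { phase: TRAIN }
}
layer {
  name: "conv1/bn"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  batch_norm_param { use_global_stats: true }   # TEST: normalize with the stored global stats
  include { phase: TEST }
}
```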

@cuihenggang

Does anyone have an updated train_val file for the Inception-BN network with the Scale/Bias layers added (for the ILSVRC12 dataset)?
