Add BiasLayer to add two Blobs with broadcasting #3550
Closed
This adds BiasLayer, designed analogously to ScalarLayer (#3021), to add blobs with arbitrary axes broadcast. It could be used together with ScalarLayer to learn the batch norm scale and shift parameters, or independently anywhere in a network to learn a bias without a corresponding multiplication. Even more generally, it can be used to efficiently add two blobs with any number of corresponding axes, which can currently only be accomplished (in the most general case) rather inefficiently: with a pair of `Reshape`s and `Tile`s (to broadcast leading and trailing axes) followed by the `Eltwise` `SUM` operation.

This is currently based on ScalarLayer (for caffe.proto ID sequencing), with the last two commits being the relevant ones -- I'm happy to rebase this without ScalarLayer if we want to merge this before or without that.
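For concreteness, here is a minimal prototxt sketch of the two-bottom broadcasting form; the `Bias` type name and the `bias_param` fields (an `axis` mirroring ScalarLayer's interface) are assumptions for illustration, not confirmed names from this PR.

```
# Broadcast-add a per-channel bias blob to a 4D blob (sketch;
# field names assumed analogous to ScalarLayer's).
layer {
  name: "add_bias"
  type: "Bias"
  bottom: "data"          # shape: N x C x H x W
  bottom: "channel_bias"  # shape: C -- broadcast over N, H, W
  top: "data_biased"
  bias_param { axis: 1 }  # align channel_bias with the C axis of data
}
```

This replaces the Reshape/Tile/Eltwise chain described above with a single layer and no materialized tiled copy of the bias blob.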
Both this and ScalarLayer can either take two bottoms, specifying both inputs to the function, or take a single bottom and learn the second as a parameter.
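A sketch of the single-bottom form, where the bias is learned as a layer parameter (again assuming the `bias_param` fields mirror ScalarLayer's `axis`/`num_axes`/`filler`; treat the exact names as placeholders):

```
# Learn one bias value per channel as a layer parameter (sketch).
layer {
  name: "learned_bias"
  type: "Bias"
  bottom: "conv1"
  top: "conv1_biased"
  param { lr_mult: 1 decay_mult: 0 }  # typically no weight decay on a bias
  bias_param {
    axis: 1                           # start broadcasting at the channel axis
    num_axes: 1                       # learn a 1D bias of length C
    filler { type: "constant" value: 0 }
  }
}
```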
A different approach for learning the BN scale/shift parameters that I haven't looked at yet is in #2996 (by @ducha-aiki), which learns both sets of parameters together. @cdoersch and I and anyone else interested (possibly @longjon and @shelhamer) should take a look at both and evaluate the benefits, with merge priority for any shared functionality given to @ducha-aiki's #2996 as the earlier PR.
Personally, I do like the approach of having layers do as little as possible, which is why for my own work I've taken the route of using two independent layers, roughly as sketched below.
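A rough illustration of that two-layer approach for batch norm scale/shift; the `Scalar` type name and `scalar_param` fields are purely hypothetical here and may differ from what #3021 actually defines:

```
# BatchNorm followed by a learned per-channel scale and shift,
# each as its own layer (sketch; Scalar's interface is assumed).
layer { name: "bn1" type: "BatchNorm" bottom: "conv1" top: "bn1" }
layer {
  name: "bn1_scale" type: "Scalar" bottom: "bn1" top: "bn1"  # in-place
  scalar_param { axis: 1 filler { type: "constant" value: 1 } }
}
layer {
  name: "bn1_shift" type: "Bias" bottom: "bn1" top: "bn1"    # in-place
  bias_param { axis: 1 filler { type: "constant" value: 0 } }
}
```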