
MSRA weight filler #1946

Closed

Conversation

nickcarlevaris

This PR adds MSRAFiller, which implements an Xavier-like filler designed for use with ReLUs instead of tanh, based on the paper He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015.

It also adds a VarianceNorm option to FillerParameter, which allows one to normalize by fan_in, fan_out, or their average. VarianceNorm applies to both the MSRAFiller and the XavierFiller (default behavior unchanged). Tests for MSRAFiller and XavierFiller are included as well.

Replaces #1883 (updates based on that discussion and rebased against master).

As with the XavierFiller, the fan_in and fan_out dimensions are not correct for inner product layers (as pointed out by @seanbell in #1883); I did, however, update the documentation to note this.
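For context, a minimal sketch of how the new filler is exercised, in the spirit of the added filler tests (the accessor and enum names below assume the protobuf-generated API for the FillerParameter additions described above):

```cpp
#include "caffe/blob.hpp"
#include "caffe/filler.hpp"

using namespace caffe;  // Blob, FillerParameter, MSRAFiller

int main() {
  // Conv-style weight blob: 64 filters over 3 channels with 3x3 kernels,
  // so fan_in = 3*3*3 = 27 and fan_out = 64*3*3 = 576.
  Blob<float> weights(64, 3, 3, 3);

  FillerParameter param;
  // VarianceNorm picks the n used to scale the variance:
  // FAN_IN (the default), FAN_OUT, or AVERAGE.
  param.set_variance_norm(FillerParameter_VarianceNorm_FAN_IN);

  MSRAFiller<float> filler(param);
  filler.Fill(&weights);  // zero-mean gaussian with std = sqrt(2 / n)
  return 0;
}
```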

Add MSRAFiller, an Xavier-like filler designed for use with ReLUs instead of tanh. Based on the paper: He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015. Added a VarianceNorm option to FillerParameter which allows one to normalize by fan_in, fan_out, or their average. Updated XavierFiller to use the VarianceNorm option (default behavior unchanged). Added tests for MSRAFiller and XavierFiller.
@seanbell

Note that #1970 should fix the fan_in and fan_out calculations for InnerProductLayer since the weights will now be 2D with shape output x input.

* scale] where scale = sqrt(3 / n) where n is the fan_in, fan_out, or their
* average, depending on the variance_norm option. You should make sure the
* input blob has shape (num, a, b, c) where a * b * c = fan_in and num * b * c
* = fan_out. Note that this is currently not the case for inner product layers.
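To make the convention in the excerpt above concrete, here is a quick numeric check (hypothetical shapes, not code from the PR) of the uniform range scale = sqrt(3 / n) under each VarianceNorm choice, including the 2D output x input case that #1970 introduces for InnerProductLayer:

```cpp
#include <cmath>
#include <cstdio>

int main() {
  // 4D conv weight blob (num, a, b, c): 64 filters, 3 channels, 3x3 kernels.
  const float fan_in  = 3 * 3 * 3;   // a * b * c   = 27
  const float fan_out = 64 * 3 * 3;  // num * b * c = 576

  // Xavier uniform range [-scale, scale] for each VarianceNorm option.
  std::printf("FAN_IN : %g\n", std::sqrt(3.0f / fan_in));   // ~0.333
  std::printf("FAN_OUT: %g\n", std::sqrt(3.0f / fan_out));  // ~0.072
  std::printf("AVERAGE: %g\n",
              std::sqrt(3.0f / (0.5f * (fan_in + fan_out))));

  // 2D InnerProduct weights (output x input), per #1970: the fans are just
  // the two axes, e.g. a hypothetical 1000-way classifier over 4096 features.
  const float ip_fan_in = 4096.0f, ip_fan_out = 1000.0f;
  std::printf("IP FAN_IN : %g\n", std::sqrt(3.0f / ip_fan_in));
  std::printf("IP FAN_OUT: %g\n", std::sqrt(3.0f / ip_fan_out));
  return 0;
}
```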

#1970 is in so this filler is now right for InnerProduct layers too.

@shelhamer

@nickcarlevaris thanks -- this looks good. The only potential issue is naming and attribution. I am not certain, but if I understand correctly, the same sqrt(2) gain may have been suggested by Andrew Saxe et al. through derivations in http://arxiv.org/abs/1312.6120v3. Although I suggested "MSRA" earlier, I think a citation to both and a functional name is perhaps best.

@nickcarlevaris, you suggested "ReLU" since this is intended for use with the so-named nonlinearity. It could be that this is the right choice.

@longjon ?

@futurely

futurely commented Apr 9, 2015

#1940 has been merged for a month. Can these two work together to reproduce the paper's results?

@omgteam

omgteam commented May 21, 2015

This issue has been open for a long time. Hope it gets merged quickly.

@omgteam

omgteam commented May 23, 2015

Why hasn't this been merged into master? Anything wrong?

shelhamer added a commit that referenced this pull request May 27, 2015
  Add MSRAFiller, an Xavier-like filler designed for use with ReLUs
@shelhamer

Merged to master in c255709. Thanks @nickcarlevaris!

I did a manual merge to re-format the commit message and add my own commit to note potentially related work. Closing since my edit threw off the GitHub merge.

@shelhamer closed this May 27, 2015
@happynear

Why is there no parameter to specify the \alpha defined in Equation 15?
Since the PReLU layer has been added to Caffe, I think we should also introduce this parameter into the filler.
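For reference: in the paper, the PReLU case modifies the variance condition to (1/2)(1 + \alpha^2) n Var[w] = 1, giving std = sqrt(2 / ((1 + \alpha^2) n)). A hypothetical sketch of the extension @happynear proposes (the negative_slope parameter is illustrative only, not part of the merged filler):

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical PReLU-aware variant of the MSRA std (He et al., Eq. 15):
// std = sqrt(2 / ((1 + alpha^2) * n)); alpha = 0 recovers the ReLU case.
float msra_std(float n, float negative_slope) {
  const float a2 = negative_slope * negative_slope;
  return std::sqrt(2.0f / ((1.0f + a2) * n));
}

int main() {
  const float fan_in = 27.0f;  // e.g. a 3x3 conv filter over 3 channels
  std::printf("ReLU  (alpha=0)   : %g\n", msra_std(fan_in, 0.0f));
  std::printf("PReLU (alpha=0.25): %g\n", msra_std(fan_in, 0.25f));
  return 0;
}
```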
