
Tied weights with transpose flag for InnerProduct layer #3612

Merged: 1 commit merged from tied_weights_ip_transpose into BVLC:master on Feb 25, 2016

Conversation

@kashefy (Contributor) commented Jan 29, 2016

I wanted to train an autoencoder where the decoder uses the transpose of the encoder's weight matrix. This was first discussed in #670 and followed up in #1211 (comment), but it seems it was never resolved. I found @jeffdonahue's suggestion in this comment to just add a transpose flag to the InnerProduct layer quite reasonable.

This PR adds a transpose flag to the InnerProduct layer as well as to its params protobuf message.
When the flag is set to true (for the decoder), the forward pass instructs the matrix multiplication routine NOT to transpose the weight matrix; in the usual case, and for the encoder, the weight matrix is transposed inside the GEMM call as before.

Tying the weights between encoder and decoder requires:

  1. Sharing the encoder's weight params with the inner product layer that acts as the decoder.
  2. Setting share_mode to PERMISSIVE in both IP layers (otherwise the shape mismatch is not allowed).
  3. Adding 'transpose: true' to the decoder's inner_product_param.

A sample trainval.prototxt demonstrates the usage; a sketch is given below.
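For illustration, a hypothetical sketch of the two tied layers following the three steps above. The layer names, blob names, and dimensions here are mine, not taken from the original sample:

layer {
  name: "encoder"
  type: "InnerProduct"
  bottom: "data"
  top: "code"
  param { name: "tied_weights" share_mode: PERMISSIVE }  # weight blob, shared by name
  param { name: "encoder_bias" }                         # bias stays per-layer
  inner_product_param { num_output: 64 }
}
layer {
  name: "decoder"
  type: "InnerProduct"
  bottom: "code"
  top: "reconstruction"
  param { name: "tied_weights" share_mode: PERMISSIVE }  # same name -> same blob as the encoder
  param { name: "decoder_bias" }
  inner_product_param {
    num_output: 784      # must equal the encoder's input dimension
    transpose: true      # use the transpose of the shared weight matrix
  }
}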

I haven't written unit tests around this yet. I'm open to suggestions on what makes sense to test here.

Thanks for reviewing and looking forward to the feedback.

caffe_gpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasNoTrans : CblasTrans,
    M_, N_, K_, (Dtype)1.,
    bottom_data, weight, (Dtype)0., top_data);
Inline review comment from a Contributor:

remove added indent

@jeffdonahue (Contributor):

Thanks @kashefy! This looks pretty good to me.

Setting share_mode to PERMISSIVE in both IP layers (otherwise the shape mismatch is not allowed).

This shouldn't be needed -- instead the weight param should be set to the correct shape by swapping N_ & K_, changing lines 32-33 of inner_product_layer.cpp to be conditioned on transpose.
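For illustration, a minimal sketch of that shape change in InnerProductLayer<Dtype>::LayerSetUp (a sketch, not necessarily the exact diff that was merged):

// Weight blob shape: N_ x K_ normally, K_ x N_ when transpose is set, so that
// a transposed layer ends up with the same weight shape as the layer it is
// tied to.
vector<int> weight_shape(2);
if (transpose_) {
  weight_shape[0] = K_;
  weight_shape[1] = N_;
} else {
  weight_shape[0] = N_;
  weight_shape[1] = K_;
}
this->blobs_[0].reset(new Blob<Dtype>(weight_shape));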

Besides that, please see the style nitpicks and squash your history to a single commit.

Re testing: it would be good to have a few unit tests:

  • verify the correct shape of the parameter with and without transpose set
  • a gradient check with transpose set
  • a forward check, for example: initialize an IP layer without transpose and with a randomly initialized parameter, run Forward, and save the result; then initialize a second IP layer with transpose, manually copy and transpose the first layer's parameter into it, run Forward on the same input, and check that the results match (see the sketch after this list)
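A rough sketch of that forward check, written against the existing InnerProductLayerTest fixture. The test name, filler settings, and tolerance are illustrative; this is not the exact test that was merged:

TYPED_TEST(InnerProductLayerTest, TestForwardTransposeSketch) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  InnerProductParameter* ip_param = layer_param.mutable_inner_product_param();
  ip_param->set_num_output(10);
  ip_param->set_bias_term(false);  // keep the comparison weight-only
  ip_param->mutable_weight_filler()->set_type("uniform");
  ip_param->mutable_weight_filler()->set_min(-1);
  ip_param->mutable_weight_filler()->set_max(1);

  // Reference: a plain (non-transposed) IP layer with random weights.
  InnerProductLayer<Dtype> ip(layer_param);
  ip.SetUp(this->blob_bottom_vec_, this->blob_top_vec_);
  ip.Forward(this->blob_bottom_vec_, this->blob_top_vec_);
  Blob<Dtype> ref_top;
  ref_top.CopyFrom(*this->blob_top_, false, true);  // save the reference output

  // Second layer: transpose enabled, weights copied and manually transposed.
  ip_param->set_transpose(true);
  InnerProductLayer<Dtype> ip_t(layer_param);
  ip_t.SetUp(this->blob_bottom_vec_, this->blob_top_vec_);
  const Blob<Dtype>& w = *ip.blobs()[0];  // shape N x K
  Blob<Dtype>& w_t = *ip_t.blobs()[0];    // shape K x N
  const Dtype* w_data = w.cpu_data();
  Dtype* w_t_data = w_t.mutable_cpu_data();
  const int N = w.shape(0);
  const int K = w.shape(1);
  for (int n = 0; n < N; ++n) {
    for (int k = 0; k < K; ++k) {
      w_t_data[k * N + n] = w_data[n * K + k];
    }
  }
  ip_t.Forward(this->blob_bottom_vec_, this->blob_top_vec_);

  // Both layers should produce the same output for the same input.
  for (int i = 0; i < ref_top.count(); ++i) {
    EXPECT_NEAR(ref_top.cpu_data()[i], this->blob_top_->cpu_data()[i], 1e-3);
  }
}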

@kashefy (Contributor, Author) commented Jan 29, 2016

@jeffdonahue, thanks for the feedback. I will fix the styling (the Travis build failed because of it) and add the unit tests.

@jeffdonahue (Contributor):

Great, thanks. Also, I just noticed you didn't change backward -- I'm pretty sure that will need a different CblasTrans setting as well. (But no need to think about it once you write the gradient check.)
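For reference, a rough sketch of how Backward_cpu might be conditioned on transpose_, using the convention that bottom is M_ x K_ and top_diff is M_ x N_; this is a sketch, not necessarily the exact code that was merged:

// Gradient with respect to the weights.
// transpose_ == false: weights are N_ x K_, so dW = top_diff^T * bottom.
// transpose_ == true:  weights are K_ x N_, so dW = bottom^T * top_diff.
if (transpose_) {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, K_, N_, M_,
      (Dtype)1., bottom_data, top_diff,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
} else {
  caffe_cpu_gemm<Dtype>(CblasTrans, CblasNoTrans, N_, K_, M_,
      (Dtype)1., top_diff, bottom_data,
      (Dtype)1., this->blobs_[0]->mutable_cpu_diff());
}

// Gradient with respect to the bottom: the weight operand's transpose flag
// flips relative to Forward.
caffe_cpu_gemm<Dtype>(CblasNoTrans, transpose_ ? CblasTrans : CblasNoTrans,
    M_, K_, N_,
    (Dtype)1., top_diff, this->blobs_[0]->cpu_data(),
    (Dtype)0., bottom[0]->mutable_cpu_diff());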

@kashefy (Contributor, Author) commented Feb 3, 2016

Quick update:
I wrote unit tests around forward, backward, and blob shape (with and without transpose); the tests pass, and the styling is fixed.
Transposing shared weights needs more work. The transposing IP layer stores the weight shape in its post-transpose form, but the actual transposition doesn't happen until the multiplication is called, and setting the decoder's num_output collides with these assumptions.
Maybe I need a second flag next to the transpose one to hold off on switching the weight shapes when the weights are tied with another 'encoder' layer.

@@ -148,4 +265,127 @@ TYPED_TEST(InnerProductLayerTest, TestGradient) {
}
}

TYPED_TEST(InnerProductLayerTest, TestGradientTransposeFalse) {
Inline review comment from a Contributor:

Shouldn't this test be TestGradientTransposeTrue (with the corresponding change from set_transpose(false) to set_transpose(true))? This test is effectively a duplicate of the existing test above (TestGradient), I'd think.

And if this test is changed to be done with transpose on, is there still anything additionally tested by TestBackwardTranspose? I would think the combination of TestForward with the gradient check would cover all functionality.

@jeffdonahue (Contributor):

@kashefy the tests and code look good but see comments/nitpicks above. Once you've addressed these, please squash your history to a single commit (or, if you prefer, two commits -- one for the style fixes of existing code, and another for your added feature and tests), and I can merge this. Thanks!

@kashefy (Contributor, Author) commented Feb 3, 2016

@jeffdonahue thanks for the feedback. I will go over the redundant test. The PR as it is now only adds a transpose feature to the IP layer; tying weights in an autoencoder doesn't work yet. If you think the transpose feature is useful on its own, I can do the tying part in another PR.

@jeffdonahue
Copy link
Contributor

I'm not sure I understand why shared weights between an "encoder" and "decoder" layer wouldn't work in the current form. Both the shape and memory layout of the weight matrix would be the same between a normal IP layer (transpose = false) that takes D-dimensional input and produces N-dimensional output, and a transposed IP layer (transpose = true) that takes N-dimensional input and produces D-dimensional output. Given that, I would think the only other thing that should need to be done in the transpose=true case (and which you have done here) is to change the BLAS transpose settings in forward/backward when reading from/writing to the weights.

I could certainly be missing something though.
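To make that concrete with hypothetical dimensions: an encoder mapping a 784-dimensional input to a 64-dimensional code stores its weights as a 64x784 blob (N_ x K_); a decoder with transpose: true mapping the 64-dimensional code back to 784 dimensions has K_ = 64 and N_ = 784 and, with the weight shape conditioned on transpose, also stores a 64x784 blob. The two layers can therefore share the very same parameter blob, and only the BLAS transpose flags differ between them.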

@kashefy (Contributor, Author) commented Feb 3, 2016

Transposing works for tied weights in an autoencoder as well. All good to go.

@kashefy kashefy force-pushed the tied_weights_ip_transpose branch 3 times, most recently from eee4372 to 1954b3b Compare February 5, 2016 11:20
@kashefy (Contributor, Author) commented Feb 5, 2016

The failures are due to import errors when running the Python nose tests. A possible solution is in #3638.

@kashefy (Contributor, Author) commented Feb 8, 2016

The Travis job is passing now; I can't really explain why, but I'm glad the import errors are gone. All good to go.

@kashefy (Contributor, Author) commented Feb 17, 2016

Hello @jeffdonahue, I think this is ready. The transpose turned out to work for shared weights after all, as is.

@jeffdonahue (Contributor):

@kashefy thanks, looks like this is almost there! But could you add a simple TestGradientTranspose test? It should be exactly the same as the existing TestGradient but of course have one extra line that does set_transpose(true). And with that test added I'm inclined to say TestBackwardTranspose should be removed, unless you think there is something additionally tested in that which isn't covered by the gradient checker.
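A minimal sketch of such a test; the num_output value and filler settings are illustrative and may differ from the merged test:

TYPED_TEST(InnerProductLayerTest, TestGradientTranspose) {
  typedef typename TypeParam::Dtype Dtype;
  LayerParameter layer_param;
  InnerProductParameter* ip_param = layer_param.mutable_inner_product_param();
  ip_param->set_num_output(11);
  ip_param->mutable_weight_filler()->set_type("gaussian");
  ip_param->mutable_bias_filler()->set_type("gaussian");
  ip_param->set_transpose(true);  // the one extra line relative to TestGradient
  InnerProductLayer<Dtype> layer(layer_param);
  GradientChecker<Dtype> checker(1e-2, 1e-3);
  checker.CheckGradientExhaustive(&layer, this->blob_bottom_vec_,
      this->blob_top_vec_);
}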

…toencoder. Arguments to matrix multiplication function are conditioned on this parameter, no actual transposing takes place.

test ip gradient computation with transpose on
@kashefy (Contributor, Author) commented Feb 20, 2016

Hello @jeffdonahue, I've added TestGradientTranspose (basically TestGradient plus setting transpose to true, as you suggested). You're right, TestBackwardTranspose is somewhat redundant: it doesn't cover anything the gradient checker isn't already covering. However, I find it helpful for narrowing down where a failure comes from. The test was actually very helpful when setting up the backward computation, so I'm a bit reluctant to throw it out. I tend to use tests as a debugging aid, so I usually end up writing more of them to better understand where a failure is coming from, at the expense of some redundancy and code length.

@kashefy (Contributor, Author) commented Feb 25, 2016

Hello @jeffdonahue, do you think the current tests are sufficient? Anything else you think should go into this PR? Thanks.

@jeffdonahue (Contributor):

@kashefy thanks for adding the gradient check; I suppose it can't hurt much to have the backward test as it's presumably very quick (relative to the full gradient check). LGTM -- thanks for this work.

jeffdonahue added a commit that referenced this pull request Feb 25, 2016
Tied weights with transpose flag for InnerProduct layer
@jeffdonahue jeffdonahue merged commit fe0f441 into BVLC:master Feb 25, 2016
fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
Tied weights with transpose flag for InnerProduct layer