Improve / Fix Weight Sharing #1211
Comments
I just pushed a unit test for resuming from saved weights (4dc5bd0). It passes as expected, but fails when cherry-picked onto 8dac339, from before #594 was merged. Glad this was magically fixed, thanks @longjon!
Would you consider tied weights as well? I have tried to implement them myself, but with the current weight sharing scheme it seemed too complicated.
@ducha-aiki What is the difference between tied weights and shared weights? @shelhamer I can look into dying if fillers are defined where parameters are shared, if you tell me what the "Caffe way of dying" is (LOG(FATAL) and then what?).
@rodrigob Tied weights are used in autoencoders: if the encoder weights are W, then the decoder weights are W^T, i.e. the transpose.
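For concreteness, the usual single-layer autoencoder setup (the notation here is purely illustrative, with s some nonlinearity):

$$h = s(W x + b), \qquad \hat{x} = s(W^\top h + c)$$

so the decoder reuses the encoder's W transposed instead of learning a separate weight matrix; only W, b, and c are learned.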
While blobs can be shared permissively, so that they only need the same total number of elements rather than the same dimensions, that doesn't cover W, W^T pairs: permissively sharing the blob into an inner product layer with the input and output dimensions swapped just reinterprets the same memory, which is not the transpose.
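A tiny concrete example of why reinterpreting the same buffer is not a transpose (the values keep their row-major order instead of being rearranged):

$$W = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \;\rightarrow\; \text{reinterpreted as } 3 \times 2: \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \;\neq\; W^\top = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}$$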
@shelhamer But the weights are in a different order in the transposed matrix. I will check again, but when I tried it, it did not work.
Yeah, it would not work for pairs of inner product layers where the weights are transposed (using permissive sharing would probably give very bad results). It would require a little bit of additional implementation -- probably the easiest would be to add a "transposed weights" option to the inner product layer so that the layer pair could use the same weight matrix.
@jeffdonahue That part is easy. The real problem is the diffs, since they differ not only in shape but in number of elements.
What? Why would the diffs be a different number of elements? I think I'm missing something...
@jeffdonahue Because size of diff == size of output.
Right, the encode1 weights are 1000x784 (producing 1000D outputs from 784D inputs) and the decode1 weights have the transposed dimension, 784x1000 (producing 784D outputs from 1000D inputs). The weight gradients are the same dimension by definition.
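Spelling out the shapes (ignoring biases and nonlinearities, purely as an illustration): for the encoder $h = W x$ with $W \in \mathbb{R}^{1000 \times 784}$, the weight gradient $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial h} x^\top$ is $1000 \times 784$. For the decoder $\hat{x} = W^\top h$, the gradient with respect to its own $784 \times 1000$ weight matrix is $\frac{\partial L}{\partial \hat{x}} h^\top$, which is $784 \times 1000$; its transpose, $h \left(\frac{\partial L}{\partial \hat{x}}\right)^\top$, is the $1000 \times 784$ contribution that would be accumulated into the shared $\frac{\partial L}{\partial W}$. So the weight diffs do line up once the decoder's term is transposed; only the top (output) diffs differ in size.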
We should keep #1659 in mind too.
Mocha has TiedInnerProductLayer (docs: http://mochajl.readthedocs.org/en/latest/user-guide/layers/computation-layer.html#TiedInnerProductLayer, source: https://github.com/pluskid/Mocha.jl/blob/master/src/layers/tied-inner-product.jl). I guess Caffe could be similar, along the lines of @jeffdonahue's suggestion to add a "transposed weights" option to the inner product layer.
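A rough prototxt sketch of what that option might look like; the transpose field and its semantics are assumptions here, not something InnerProductLayer supports as of this discussion. The idea is that the decoder keeps its weights in the encoder's (num_output x input_dim) layout and applies W^T in its forward pass, so the blob named "tied_w" can be shared strictly:

```
# Encoder: 784 -> 1000; weight blob shape (1000, 784).
layer {
  name: "encode1"
  type: "InnerProduct"
  bottom: "data"
  top: "encode1"
  param { name: "tied_w" }   # shared by name with decode1 below
  inner_product_param {
    num_output: 1000
    weight_filler { type: "xavier" }
  }
}
# Decoder: 1000 -> 784; with the assumed transpose option it would store
# its weights in the same (1000, 784) layout and multiply by the transpose,
# so strict sharing of "tied_w" would just work.
layer {
  name: "decode1"
  type: "InnerProduct"
  bottom: "encode1"
  top: "decode1"
  param { name: "tied_w" }
  inner_product_param {
    num_output: 784
    transpose: true          # hypothetical field, per the suggestion above
  }
}
```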
Do we have an update on these? Shared weights are very important for recurrent nets.
Hi, do we have an update on the 7th problem mentioned above? "Only the owner should initialize weights. Currently unnecessary work and memory is expended filling all weights, and then these are discarded to share with the weight owners." I am currently running into a memory problem with multiple FC layers that share weights, and I believe it is because, even though the weights are shared between those FC layers, each one is still initialized and takes extra memory when the network is created. Any idea on a workaround for this would be greatly appreciated! Thanks!
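For reference, a minimal sketch of how the sharing itself is declared, by giving the parameter blobs the same param names (recent prototxt syntax; the layer names, bottoms, and sizes are made up, and both inputs must have the same dimension so the shared 4096x4096 weight blob fits both layers):

```
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "feat_a"                            # assumed 4096-D input
  top: "fc1"
  param { name: "fc_shared_w" lr_mult: 1 }
  param { name: "fc_shared_b" lr_mult: 2 }
  inner_product_param {
    num_output: 4096
    weight_filler { type: "xavier" }
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "feat_b"                            # assumed 4096-D input
  top: "fc2"
  param { name: "fc_shared_w" lr_mult: 1 }    # same names => blobs shared with fc1
  param { name: "fc_shared_b" lr_mult: 2 }
  inner_product_param {
    num_output: 4096
    # Per the problem quoted above, this layer's own blobs are still
    # allocated and filled at net creation before being replaced by the
    # owner's, which is the wasted work/memory being discussed.
  }
}
```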
Weight sharing as-is relies on a weight owner with which shared layers share their parameter blobs. This poses a few problems in relation to loss, loading and saving parameters, and weight initialization that are listed here for addressing.
@jeffdonahue @longjon