
Siamese Networks / Distance Learning / Transfer Learning #697

Closed
wants to merge 2 commits

Conversation

@zayd commented Jul 15, 2014

Hi,

I am working on implementing a siamese network in caffe. The general pipeline for training I am thinking of right now is:

  1. train a network on an N-way classification task
  2. (a) remove the top 1 or 2 layers from the network
     (b) create a copy of the network (using shared weights)
     (c) add a binary output distance layer that measures the distance between the two outputs

(2a) is where I believe the first change in Caffe needs to be made; that is, reusing the representation learned by one deep network for another task by replacing the top 1 or 2 layers.

I put together a small hack that allows this in Caffe by loading a state file of a trained network and passing an optional int remove_from_top to Solver::Restore and Net::CopyTrainedLayersFrom. This changes the behavior to load the state of only the first (total - remove_from_top) layers of the network; the remaining layers specified in the new network's .prototxt file initialize normally, because they are initialized before loading from the state.
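As a toy illustration of the intended behavior (a sketch only, not the actual diff in this PR; the struct and names below are made up):

```cpp
#include <iostream>
#include <string>
#include <vector>

// Toy stand-in for the idea: copy saved weights into the first
// (total - remove_from_top) layers only; the rest keep their fresh
// initialization from the new net's .prototxt.
struct Layer {
  std::string name;
  bool loaded_from_snapshot = false;
};

void CopyTrainedLayers(std::vector<Layer>& net, int remove_from_top) {
  const int num_to_copy = static_cast<int>(net.size()) - remove_from_top;
  for (int i = 0; i < num_to_copy; ++i) {
    net[i].loaded_from_snapshot = true;  // stands in for copying the blobs
  }
}

int main() {
  std::vector<Layer> net = {{"conv1"}, {"conv2"}, {"ip1"}, {"loss"}};
  CopyTrainedLayers(net, /*remove_from_top=*/2);
  for (const auto& layer : net) {
    std::cout << layer.name
              << (layer.loaded_from_snapshot ? ": loaded" : ": fresh") << "\n";
  }
  return 0;
}
```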

Do you have any suggestions, or another preferred approach for how to tackle this?

@shelhamer (Member)

Hey Zayd, nice to see you on the repo!

Caffe actually already understands how to do the initialization you're after for siamese networks. The documentation on finetuning is sadly lacking (we're working on a tutorial example), but in short: Caffe loads weights by matching the layer names in the prototxt definition (the model file) against those in the saved binary proto weights (the pretrained weights file).

The steps will look like this:

  1. define and train a network for an N-way classification task
  2. define the siamese network: a duplicate of the net from (1), minus whatever layers you do not want, with a distance loss layer on top of the two branches.
  3. train the siamese network by calling finetune_net.bin on the definition from (2) and the weights from (1). The layers carried over from (1) will have their weights copied, new layers will be initialized as you defined them, and any layers with the same "param" field name will share weights (see the sketch below).
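A minimal sketch of what the definition in (2) could look like, in the prototxt format of the time (layer, blob, and param names here are illustrative, not from a real example):

```
# Two tied branches: identical "param" names mean shared weights.
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data_a"
  top: "feat_a"
  param: "conv1_w"
  param: "conv1_b"
  convolution_param { num_output: 20 kernel_size: 5 }
}
layers {
  name: "conv1_p"
  type: CONVOLUTION
  bottom: "data_b"
  top: "feat_b"
  param: "conv1_w"   # same names as above => weights are shared
  param: "conv1_b"
  convolution_param { num_output: 20 kernel_size: 5 }
}
# ... further tied layers, then a distance-based loss over feat_a / feat_b ...
```

Note that the weights are copied in by layer name, so keeping one branch's layer names identical to the net from (1) (here "conv1") is what gets the pretrained weights carried over; the param sharing then propagates them to the sibling branch.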

If you could document your work on this, at least for an elementary version of a siamese network, it would be an excellent example to include in Caffe! (I know there is interest from previous questions.)

So, long story short, siamese networks do not need a code change to Caffe. Please follow up if I have missed anything, or if your change provides some useful convenience over steps 1-3 as described.

@zayd (Author) commented Jul 15, 2014

Hi Evan, thanks for the response! I will take a look at finetune_net.

On a related note, my understanding is that it is not possible to specify two input sources for a network with the existing framework. So for a siamese network, it wouldn't be possible to have two separate input layers (one that loads a.jpg and another that loads b.jpg). Is this correct? Would you suggest creating a layer (like image_data_layer) that loads up a pair of images?

@sguada (Contributor) commented Jul 15, 2014

Actually, yes, you can have 2 or more image_data_layers; I have used them for other tasks and they work well. Although you may want a specific way of pairing the images.
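For example (a sketch; the list files and names are made up, and line i of one list must correspond to line i of the other, so don't shuffle):

```
layers {
  name: "data_a"
  type: IMAGE_DATA
  top: "data_a"
  top: "label"       # pair label, read from pairs_a.txt
  image_data_param { source: "pairs_a.txt" batch_size: 32 }
}
layers {
  name: "data_b"
  type: IMAGE_DATA
  top: "data_b"
  top: "label_dup"   # duplicate of the label; can simply go unused
  image_data_param { source: "pairs_b.txt" batch_size: 32 }
}
```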

Sergio


@shelhamer (Member)

To follow up in generality: Caffe understands arbitrary DAG models. You can have multiple inputs, different outputs, forking paths, and so on.
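For instance, a fork is nothing more than one blob feeding several layers (sketch, made-up names):

```
# "feat" fans out to two heads, so the graph forks here.
layers {
  name: "head_cls"
  type: INNER_PRODUCT
  bottom: "feat"
  top: "cls_out"
  inner_product_param { num_output: 10 }
}
layers {
  name: "head_aux"
  type: INNER_PRODUCT
  bottom: "feat"
  top: "aux_out"
  inner_product_param { num_output: 2 }
}
```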

@shelhamer changed the title from "Using the representation learned by one task for another task (siamese network)" to "Siamese Networks / Distance Learning / Transfer Learning" on Jul 18, 2014
@shelhamer (Member)

Closing; this was a good question but not a PR.

A siamese network example in Caffe once you're done would be a nice PR!

@shelhamer closed this on Jul 30, 2014
@wendlerc commented Aug 1, 2014

@shelhamer how would you generate the leveldb when you want to, e.g., assign one label to a pair of input images?
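One possible approach (a sketch, not an official recipe): pack both images of a pair into a single two-channel Datum, set the pair label on that Datum, and split the channels back into the two branches inside the net (e.g. with a slice layer). The helper below and its names are hypothetical:

```cpp
// Sketch only: write one pair (A, B, pair_label) as a single two-channel
// Datum into leveldb. `pixels_a` and `pixels_b` are the raw grayscale bytes
// of the two images (rows * cols each).
#include <string>
#include "leveldb/db.h"
#include "caffe/proto/caffe.pb.h"

void WritePair(leveldb::DB* db, const std::string& key,
               const std::string& pixels_a, const std::string& pixels_b,
               int rows, int cols, int pair_label) {
  caffe::Datum datum;
  datum.set_channels(2);                // image A in channel 0, image B in channel 1
  datum.set_height(rows);
  datum.set_width(cols);
  datum.set_data(pixels_a + pixels_b);  // raw bytes, A then B
  datum.set_label(pair_label);          // e.g. 1 = matching pair, 0 = non-matching
  std::string value;
  datum.SerializeToString(&value);
  db->Put(leveldb::WriteOptions(), key, value);
}
```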

@ashafaei (Contributor) commented Aug 9, 2014

@shelhamer, even though we have parameter sharing (#546) and Eltwise operations, from what I understand there are still a couple of building blocks missing. We need at least an Abs() operation, and if we wish to follow [1] we should also define a new LossLayer. DeepFace, though, suggests using a cross-entropy loss after a layer that takes a linear combination of absolute differences (similar to #639).

Could you verify this and let me know whether these are the remaining pieces to be added? If that's the case, I'm willing to roll up my sleeves to finish it and prepare an example.

[1] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), volume 1, pages 539–546, 2005. http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf
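For reference, the margin-based contrastive loss from this line of work (Hadsell, Chopra & LeCun, CVPR 2006, the follow-up to [1]), for a pair at distance d with margin m and label y = 1 for a matching pair, is:

```latex
L(y, d) = \tfrac{1}{2}\, y\, d^{2} + \tfrac{1}{2}\,(1 - y)\, \max(0,\, m - d)^{2}
```

Matching pairs are pulled together quadratically, while non-matching pairs are pushed apart only while they sit inside the margin.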

@cheer37 commented Mar 11, 2016

@shelhamer
I am investigating siamese networks and saw the siamese example for MNIST.
I wonder how the two nets share weights there.
Thanks
