Partial re-initialisation of conv layer for finetuning #924

Closed
HoldenCaulfieldRye opened this issue Aug 14, 2014 · 1 comment
@HoldenCaulfieldRye

For finetuning, is there a way of copying a conv layer and re-initialising only a subset of its kernel maps?

Justification: when doing transfer learning (a.k.a. finetuning), it can be useful to freeze backprop on the lower layers if the target training set is small; otherwise good filters can degenerate and be lost as the net overfits.
On the other hand, sometimes the network lacks expressive power, and it helps to enable backprop and randomly (re-)initialise the weights.

So what if the target task requires a few (high-level) filters significantly different from the ones learned on the source task? It would be nice to re-initialise just a few kernel maps.

@shelhamer
Member

You can edit model parameters to your heart's content, so selectively replacing some convolutional filter kernels with random initialization is certainly possible. See the editing model parameters example on the Caffe site. Note that in the Python interface all Caffe model parameters are mutable: you can alter them and save the new model. Once the new weights are saved, you can finetune from them.
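
For illustration, a minimal pycaffe sketch of this kind of edit. The prototxt/caffemodel file names, the layer name `conv5`, the number of re-initialised kernels, and the Gaussian std are all placeholders, not part of the original discussion:

```python
import numpy as np
import caffe

# Load the pretrained net (file names and layer name are placeholders).
net = caffe.Net('deploy.prototxt', 'pretrained.caffemodel', caffe.TEST)

# net.params[layer][0] holds the filter weights, net.params[layer][1] the biases.
weights = net.params['conv5'][0].data   # shape: (num_output, channels, kh, kw)
biases = net.params['conv5'][1].data

# Re-initialise only the first 10 kernel maps; the rest keep their learned values.
reinit = slice(0, 10)
weights[reinit] = np.random.normal(0.0, 0.01, weights[reinit].shape)  # assumed std
biases[reinit] = 0.0

# Save the edited model and finetune from it as usual.
net.save('edited.caffemodel')
```

From there, finetuning would proceed as usual, e.g. by pointing the solver at the edited weights (`caffe train -solver solver.prototxt -weights edited.caffemodel`).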

Please continue the discussion and ask future usage questions on the caffe-users mailing list. As of the latest release we prefer to keep issues reserved for Caffe development. Thanks!
