
ERROR: Check failed: ShapeEquals(proto) shape mismatch (reshape not set) #375

Closed
RadicoLabs opened this issue Oct 17, 2015 · 8 comments

@RadicoLabs

Not sure if this is a bug, but when specifying the "Pretrained Model" filepath along with a custom network, DIGITS (or Caffe) throws this error. However, when specifying just the custom network by itself, the job starts fine. Any ideas as to why that is?

Also, when attempting to specify a pretrained model, upon creating the task, the DIGITS print trace shows this error:

Attempting to upgrade input file specified using deprecated V1LayerParameter: <modelPath>

where modelPath is the path to the pretrained model.

@RadicoLabs
Author

Hopefully I'm not just derping and doing something stupid here.

@lukeyeager
Member

See #140 (comment)

Check and make sure that Caffe is actually throwing an error and not just mis-reporting a warning as an error. I fixed this in BVLC/caffe#2583, but it hasn't been merged into NVcaffe quite yet.

@RadicoLabs
Author

I updated the file referenced in #140, rebuilt Caffe, reran DIGITS, and did the same as before... It's still throwing the same error:

ERROR: Check failed: ShapeEquals(proto) shape mismatch (reshape not set)

Here's a copy of everything the server printed after the create job button was hit:

2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss1/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss2/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss3/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [INFO ] Train Caffe Model task started.
2015-10-20 03:10:12 [20151020-031006-ca44] [ERROR] Train Caffe Model: Check failed: ShapeEquals(proto) shape mismatch (reshape not set)
2015-10-20 03:10:12 [20151020-031006-ca44] [ERROR] Train Caffe Model task failed with error code -6

@lukeyeager
Member

Things I had to do to get the Princeton GoogLeNet model to run (you mentioned it at #373 (comment)):

  1. Download the files
  2. Upgrade the old prototxt file using Caffe's upgrade tool ($CAFFE_HOME/build/tools/upgrade_net_proto_text)
  3. Remove the data_param.source and data_param.backend fields (steps 2 and 3 are sketched just after this list)
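
For reference, steps 2 and 3 above might look roughly like this (a sketch only; old_train_val.prototxt and upgraded_train_val.prototxt are placeholder filenames, and the source value shown is just an example):

    # Step 2: convert the old-format prototxt to the current proto schema
    $CAFFE_HOME/build/tools/upgrade_net_proto_text old_train_val.prototxt upgraded_train_val.prototxt

    # Step 3: in the upgraded file, drop the dataset-specific fields from the data
    # layer, since DIGITS supplies its own data source, e.g. change
    #   data_param { source: "train_lmdb" backend: LMDB batch_size: 32 }
    # to
    #   data_param { batch_size: 32 }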

Then I ran into the error you're reporting. For laughs, I tried it with the 0.14.0-alpha branch of NVcaffe and I got a much more helpful error message:

[screenshot: fine-tuning-error]

Looks like BVLC improved their error reporting in BVLC/caffe#2927 - nice work!

Remaining steps:

  1. Rename cls1_fc2 layer to cls1_fc2_new
  2. Rename cls2_fc2 layer to cls2_fc2_new
  3. Rename cls3_fc layer to cls3_fc_new (see the prototxt sketch after this list)
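
For context, these renames work because Caffe copies pretrained weights into the new network by matching layer names: a layer with an unrecognized name is simply re-initialized instead of receiving the old classifier weights, whose shape no longer matches the new number of labels. A minimal sketch of step 3, assuming cls3_fc is a standard InnerProduct classifier layer (the bottom blob name and num_output below are illustrative, not taken from the Princeton prototxt):

    layer {
      name: "cls3_fc_new"     # renamed from "cls3_fc" so the pretrained weights are not copied in
      type: "InnerProduct"
      bottom: "cls3_pool"     # hypothetical bottom blob; keep whatever your prototxt already uses
      top: "cls3_fc"
      inner_product_param {
        num_output: 2         # number of classes in the new dataset
      }
    }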

Now it's working!

[screenshot: fine-tuning-working]

This is why I caution people about fine-tuning with Caffe - it's non-trivial!

[screenshot: fine-tuning-warning]

Hopefully the above steps will help you and others who want to try their hand at fine-tuning.

@RadicoLabs
Author

Where would one find the Caffe upgrade tools? Also, is the Princeton patch still valid?

The instructions found here suggest you need to install the patch, but when I tried this the other day the scripts would not build... Could I have done something wrong, or is the patch outdated?

@lukeyeager
Member

Where would one find the caffe upgrade tools

Oh sorry, they're at $CAFFE_HOME/build/tools/upgrade_net_proto_text

is the Princeton Patch still valid?

I'm not sure what the patch does, but it sounds like it just helps with memory management to get you to the point of being able to train with a larger batch size. I wouldn't think that would be absolutely necessary.

If you do decide you need the patch, that's your own adventure. The commit their patch is based off of is almost a year old - BVLC/caffe@e8dee35. The oldest release of NVcaffe that DIGITS still supports is based off of a BVLC/caffe commit from March.
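
Side note, not part of the Princeton patch itself: if the gradient-accumulation memory trick is the main appeal, mainline Caffe exposes something similar through the iter_size solver field, assuming your Caffe build is recent enough to have it. A rough solver fragment:

    # solver.prototxt fragment (sketch): gradients are accumulated over iter_size
    # forward/backward passes, so the effective batch is batch_size * iter_size
    net: "upgraded_train_val.prototxt"   # hypothetical filename
    iter_size: 2
    base_lr: 0.01
    lr_policy: "step"
    stepsize: 10000
    max_iter: 50000
    solver_mode: GPU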

@RadicoLabs
Author

Yeah, according to the Model Zoo, under the section "GoogLeNet GPU implementation from Princeton," they say this:

We implemented GoogLeNet using a single GPU. Our main contribution is an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations. 

I guess the only reason I want it is for faster training... I only have 4 GB of VRAM at the moment, so I'm trying to make the most of the system I've got ;)

@lukeyeager
Member

Feel free to use their version of Caffe if that helps you, but I don't think you'll be able to use DIGITS to wrap it. You can try to merge this commit to get their version to work with DIGITS, but no promises on that.

I guess the only reason i want it, is for faster training...

I don't know how much their patch helps, but there are some really significant things that have been merged into Caffe since then that help with performance - namely multi-GPU and cuDNN v3.

I'm marking this issue as closed, since this discussion has drifted away from the original question.

RadicoLabs referenced this issue in jmancewicz/DIGITS Oct 19, 2015