
ERROR: Check failed: ShapeEquals(proto) shape mismatch (reshape not set) #375

Closed
RadicoLabs opened this issue Oct 17, 2015 · 8 comments

@RadicoLabs

Not sure if this is a bug, but when specifying the "Pretrained Model" filepath along with a custom network, DIGITS (or Caffe) throws this error. However, when specifying just the custom network by itself, the job starts fine. Any ideas as to why that is?

Also, when attempting to specify a pretrained model, upon creating the task, the DIGITS print trace shows this error:

Attempting to upgrade input file specified using deprecated V1LayerParameter: <modelPath>

where modelPath is the path to the pretrained model.

@RadicoLabs
Author

Hopefully I'm not just derping and doing something stupid here.

@lukeyeager
Member

See #140 (comment)

Check and make sure that Caffe is actually throwing an error and not just mis-reporting a warning as an error. I fixed this in BVLC/caffe#2583, but it hasn't been merged into NVcaffe quite yet.

@RadicoLabs
Author

I updated the file referenced in #140, rebuilt Caffe, reran DIGITS, and did the same as before... It's still throwing the same error:

ERROR: Check failed: ShapeEquals(proto) shape mismatch (reshape not set)

Here's a copy of everything the server printed after the create job button was hit:

2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss1/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss2/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [WARNING] Removing layer loss3/top-5 because top_k=5 while there are are only 2 labels in this dataset
2015-10-20 03:10:07 [20151020-031006-ca44] [INFO ] Train Caffe Model task started.
2015-10-20 03:10:12 [20151020-031006-ca44] [ERROR] Train Caffe Model: Check failed: ShapeEquals(proto) shape mismatch (reshape not set)
2015-10-20 03:10:12 [20151020-031006-ca44] [ERROR] Train Caffe Model task failed with error code -6

@lukeyeager
Member

Things I had to do to get the Princeton GoogLeNet model to run (you mentioned it at #373 (comment)):

  1. Download the files
  2. Upgrade the old prototxt file using Caffe's upgrade tool ($CAFFE_HOME/build/tools/upgrade_net_proto_text)
  3. Remove the data_param.source and data_param.backend fields (steps 2 and 3 are sketched just after this list)
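
For reference, steps 2 and 3 above might look roughly like this (a sketch only; old_train_val.prototxt and upgraded_train_val.prototxt are placeholder filenames, and the source value shown is just an example):

    # Step 2: convert the old-format prototxt to the current proto schema
    $CAFFE_HOME/build/tools/upgrade_net_proto_text old_train_val.prototxt upgraded_train_val.prototxt

    # Step 3: in the upgraded file, drop the dataset-specific fields from the data
    # layer, since DIGITS supplies its own data source, e.g. change
    #   data_param { source: "train_lmdb" backend: LMDB batch_size: 32 }
    # to
    #   data_param { batch_size: 32 }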

Then I ran into the error you're reporting. For laughs, I tried it with the 0.14.0-alpha branch of NVcaffe and I got a much more helpful error message:

[screenshot: fine-tuning-error]

Looks like BVLC improved their error reporting in BVLC/caffe#2927 - nice work!

Remaining steps:

  1. Rename cls1_fc2 layer to cls1_fc2_new
  2. Rename cls2_fc2 layer to cls2_fc2_new
  3. Rename cls3_fc layer to cls3_fc_new (see the prototxt sketch after this list)
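
For context, these renames work because Caffe copies pretrained weights into the new network by matching layer names: a layer with an unrecognized name is simply re-initialized instead of receiving the old classifier weights, whose shape no longer matches the new number of labels. A minimal sketch of step 3, assuming cls3_fc is a standard InnerProduct classifier layer (the bottom blob name and num_output below are illustrative, not taken from the Princeton prototxt):

    layer {
      name: "cls3_fc_new"     # renamed from "cls3_fc" so the pretrained weights are not copied in
      type: "InnerProduct"
      bottom: "cls3_pool"     # hypothetical bottom blob; keep whatever your prototxt already uses
      top: "cls3_fc"
      inner_product_param {
        num_output: 2         # number of classes in the new dataset
      }
    }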

Now it's working!

[screenshot: fine-tuning-working]

This is why I caution people about fine-tuning with Caffe - it's non-trivial!

[screenshot: fine-tuning-warning]

Hopefully the above steps will help you and others who want to try their hand at fine-tuning.

@RadicoLabs
Author

Where would one find the Caffe upgrade tools? Also, is the Princeton patch still valid?

The instructions found here suggest you need to install the patch, but when I tried this the other day the scripts would not build... Could I have done something wrong, or is the patch outdated?

@lukeyeager
Member

Where would one find the caffe upgrade tools

Oh sorry, they're at $CAFFE_HOME/build/tools/upgrade_net_proto_text

is the Princeton Patch still valid?

I'm not sure what the patch does, but it sounds like it just helps with memory management to get you to the point of being able to train with a larger batch size. I wouldn't think that would be absolutely necessary.

If you do decide you need the patch, that's your own adventure. The commit their patch is based off of is almost a year old - BVLC/caffe@e8dee35. The oldest release of NVcaffe that DIGITS still supports is based off of a BVLC/caffe commit from March.
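
Side note, not part of the Princeton patch itself: if the gradient-accumulation memory trick is the main appeal, mainline Caffe exposes something similar through the iter_size solver field, assuming your Caffe build is recent enough to have it. A rough solver fragment:

    # solver.prototxt fragment (sketch): gradients are accumulated over iter_size
    # forward/backward passes, so the effective batch is batch_size * iter_size
    net: "upgraded_train_val.prototxt"   # hypothetical filename
    iter_size: 2
    base_lr: 0.01
    lr_policy: "step"
    stepsize: 10000
    max_iter: 50000
    solver_mode: GPU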

@RadicoLabs
Author

Yeah, according to the Model Zoo, under the section "GoogLeNet GPU implementation from Princeton," they say this:

We implemented GoogLeNet using a single GPU. Our main contribution is an effective way to initialize the network and a trick to overcome the GPU memory constraint by accumulating gradients over two training iterations. 

I guess the only reason I want it is for faster training... I only have 4 GB of VRAM at the moment, so I'm trying to make the most of the system I've got ;)

@lukeyeager
Member

Feel free to use their version of Caffe if that helps you, but I don't think you'll be able to use DIGITS to wrap it. You can try to merge this commit to get their version to work with DIGITS, but no promises on that.

I guess the only reason i want it, is for faster training...

I don't know how much their patch helps, but there are some really significant things that have been merged into Caffe since then that help with performance - namely multi-GPU and cuDNN v3.

I'm marking this issue as closed, since this discussion has drifted away from the original question.

RadicoLabs referenced this issue in jmancewicz/DIGITS Oct 19, 2015