Fix solver GPU initialization order (e.g., training with cuDNN on non-default device) #1083

longjon · 2014-09-15T21:35:25Z

To address #925 (allowing the --gpu flag to override the device_id setting in the solver prototxt), patch #961 reads the --gpu flag after constructing the solver.

Unfortunately, this means that the nets used by the solver are constructed on the default GPU. That is unexpected, and it breaks cuDNN, which constructs its handles in LayerSetUp calls. To fix this, we set the mode and device (with the --gpu flag overriding the device_id param) before the solver is constructed.
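The ordering problem can be illustrated with a small toy model. This is only a sketch, not Caffe code: `set_device`, `ToyLayer`, `ToySolver`, `resolve_device`, and the two `train_*` functions are hypothetical stand-ins. It assumes a process-wide "current device" that each layer captures once at construction, roughly the way a cuDNN handle created in LayerSetUp is tied to whichever device is active at that moment.

```python
from typing import Optional, Tuple

# Hypothetical process-wide "current device"; 0 plays the role of the default GPU.
_current_device = 0


def set_device(device_id: int) -> None:
    """Toy stand-in for selecting the active GPU (not Caffe's real API)."""
    global _current_device
    _current_device = device_id


class ToyLayer:
    """Captures the active device once, at construction time."""

    def __init__(self) -> None:
        self.handle_device = _current_device


class ToySolver:
    """Builds its layers in the constructor, as the solver builds its nets."""

    def __init__(self) -> None:
        self.layers = [ToyLayer(), ToyLayer()]


def resolve_device(gpu_flag: int, solver_mode: str, device_id: int) -> Optional[int]:
    """A non-negative --gpu flag wins; otherwise fall back to device_id
    when the solver asks for GPU mode. None means CPU."""
    if gpu_flag >= 0:
        return gpu_flag
    if solver_mode == "GPU":
        return device_id
    return None


def train_old_order(gpu_flag: int, solver_mode: str, device_id: int) -> Tuple[ToySolver, Optional[int]]:
    set_device(0)                 # start from the default device
    solver = ToySolver()          # layers are built before the flag is read
    device = resolve_device(gpu_flag, solver_mode, device_id)
    if device is not None:
        set_device(device)        # too late: handles already exist
    return solver, device


def train_fixed_order(gpu_flag: int, solver_mode: str, device_id: int) -> Tuple[ToySolver, Optional[int]]:
    set_device(0)                 # start from the default device
    device = resolve_device(gpu_flag, solver_mode, device_id)
    if device is not None:
        set_device(device)        # select the device first
    solver = ToySolver()          # layers now capture the requested device
    return solver, device
```

In this model, calling `train_old_order(1, "GPU", 0)` leaves every layer holding a handle for device 0 even though device 1 was requested, while `train_fixed_order(1, "GPU", 0)` builds the layers on device 1.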

In accordance with the feeling at #925, setting mode and device is completely removed from the solver code. N.B.: this does change the behavior when the solver is invoked without caffe train, since the solver no longer sets the mode or device itself. But I don't really expect this to affect anybody.

Previously, the solver constructed nets before the caffe train tool read the --gpu flag, which can cause errors due to LayerSetUp executing on the wrong device (breaking cuDNN, for example).

shelhamer · 2014-09-15T22:59:33Z

Thanks for the catch Jon!

fix caffe train GPU initialization

bbd166e

Previously, the solver constructed nets before the caffe train tool read the --gpu flag, which can cause errors due to LayerSetUp executing on the wrong device (breaking cuDNN, for example).

shelhamer added a commit that referenced this pull request Sep 15, 2014

Merge pull request #1083 from longjon/fix-solver-gpu-init

1f4e039

shelhamer merged commit 1f4e039 into BVLC:dev Sep 15, 2014

shelhamer added the bug label Sep 15, 2014

longjon mentioned this pull request Sep 25, 2014

Add set_mode() to solver #1154

Closed

mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014

Merge pull request BVLC#1083 from longjon/fix-solver-gpu-init

fbc7cb4

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014

Merge pull request BVLC#1083 from longjon/fix-solver-gpu-init

9186ed5

regmiz mentioned this pull request Nov 29, 2014

Enable CUDNN for Caffe zigvu/chia#25

Closed

longjon deleted the fix-solver-gpu-init branch December 30, 2014 04:59
