Fix solver GPU initialization order (e.g., training with cuDNN on non-default device) #1083

Merged 1 commit on Sep 15, 2014

@longjon longjon commented Sep 15, 2014

Patch #961 addressed #925 by allowing the --gpu flag to override the device_id setting in the solver prototxt, but it reads the --gpu flag only after constructing the solver.
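For readers unfamiliar with the override rule, here is a minimal sketch of the precedence described above. It is a toy model in Python, not the Caffe tool's code: `resolve_device` and its parameters are hypothetical names, and representing "flag not given" as `None` is an assumption made for illustration.

```python
from typing import Optional


def resolve_device(gpu_flag: Optional[int],
                   solver_mode: str,
                   solver_device_id: int) -> Optional[int]:
    """Return the GPU id to use, or None to run on the CPU.

    Hypothetical helper: gpu_flag models the --gpu command-line flag
    (None when the flag is not given); solver_mode and solver_device_id
    model the settings read from the solver prototxt.
    """
    if gpu_flag is not None:
        # The command-line flag wins over the prototxt setting.
        return gpu_flag
    if solver_mode == "GPU":
        # No flag given: fall back to the device_id from the prototxt.
        return solver_device_id
    # No flag and a CPU solver: run on the CPU.
    return None


print(resolve_device(1, "GPU", 0))     # flag overrides device_id
print(resolve_device(None, "GPU", 2))  # prototxt device_id is used
```

The rule itself does not depend on when it is evaluated; what matters for this PR is that the result is applied before any net is built.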

Unfortunately, this means that the nets used by the solver are constructed on the default GPU. That is unexpected, and it breaks cuDNN, which constructs its handles in LayerSetUp calls. To fix this, mode and device are now set (with the --gpu flag overriding the device_id param) before the solver is constructed.
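The ordering problem can be illustrated with a small toy model. This is a sketch under stated assumptions, not Caffe code: `Runtime`, `ToyLayer`, `ToySolver`, and both `train_*` functions are hypothetical, and the `RuntimeError` only stands in for whatever failure a device mismatch causes in practice (the description above says only that it breaks cuDNN).

```python
class Runtime:
    """Hypothetical stand-in for process-wide device state."""
    device = 0  # the default GPU


class ToyLayer:
    def __init__(self):
        self.handle_device = None

    def layer_setup(self):
        # Like a handle created in LayerSetUp: bound to whichever
        # device is current at the moment setup runs.
        self.handle_device = Runtime.device

    def forward(self):
        if self.handle_device != Runtime.device:
            # Toy failure to make the mismatch visible.
            raise RuntimeError(
                f"handle on device {self.handle_device}, "
                f"running on device {Runtime.device}")
        return "ok"


class ToySolver:
    def __init__(self):
        # The solver builds its net (and runs layer setup) on construction.
        self.layer = ToyLayer()
        self.layer.layer_setup()

    def step(self):
        return self.layer.forward()


def train_device_after_solver(gpu):
    """Old order: construct the solver, then apply the device choice."""
    Runtime.device = 0
    solver = ToySolver()   # setup runs on the default device
    Runtime.device = gpu   # too late for the handle created above
    return solver.step()


def train_device_before_solver(gpu):
    """Fixed order: apply the device choice, then construct the solver."""
    Runtime.device = gpu
    solver = ToySolver()   # setup runs on the requested device
    return solver.step()


print(train_device_before_solver(1))
```

In this model, `train_device_after_solver(1)` raises because setup ran on device 0, while the same call with device 0 succeeds, which matches the symptom appearing only on a non-default device.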

In accordance with the feeling at #925, setting mode and device is completely removed from the solver code. N.B.: this changes the behavior when invoking the solver without caffe train, since such callers must now set mode and device themselves, but I don't expect this to affect anybody.

Previously, the solver constructed nets before the caffe train tool read
the --gpu flag, which can cause errors due to LayerSetUp executing on
the wrong device (breaking cuDNN, for example).
shelhamer added a commit that referenced this pull request Sep 15, 2014
@shelhamer shelhamer merged commit 1f4e039 into BVLC:dev Sep 15, 2014
@shelhamer
Member

Thanks for the catch, Jon!

@shelhamer shelhamer added the bug label Sep 15, 2014
@longjon longjon deleted the fix-solver-gpu-init branch December 30, 2014 04:59