
Deadlock when training on multiple GPUs and restoring the state from previous training #255

Closed
CFAndy opened this issue Oct 18, 2016 · 3 comments

Comments

CFAndy commented Oct 18, 2016

nvidia-smi -l shows the root solver dropping to 0% GPU utilization while the slave solvers stay at 100%.
BVLC Caffe has no such issue.
Fixed by #254.
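
For reference, the scenario in the title, resuming multi-GPU training from a saved state and watching utilization, corresponds roughly to the commands below. This is only a sketch; the solver, snapshot, and GPU-list values are illustrative and not taken from this issue:

    # Resume training from a previous .solverstate on 4 GPUs (illustrative paths)
    caffe train --solver=solver.prototxt \
                --snapshot=snapshots/net_iter_10000.solverstate \
                --gpu=0,1,2,3

    # Watch per-GPU utilization in a loop; when the deadlock hits, the root
    # solver's GPU drops to 0% while the other GPUs stay pinned at 100%.
    nvidia-smi -l 1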

CFAndy commented Jun 6, 2017

The bug is still there. Using 0.16 to train MobileNet on 4 TITAN X (Pascal) GPUs, Caffe randomly enters a deadlock after a test run. As a workaround I have to disable the test phase (see the sketch below).
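
The workaround of disabling the test pass can be expressed in the solver configuration. A minimal sketch, assuming a standard Caffe solver.prototxt; all values are illustrative and not taken from this issue:

    # solver.prototxt with the test phase disabled (illustrative values)
    net: "train_val.prototxt"
    # No test_iter / test_interval / test_state entries, so no test nets are
    # instantiated and no test run is ever scheduled during training.
    base_lr: 0.01
    lr_policy: "step"
    stepsize: 100000
    max_iter: 400000
    snapshot: 10000
    snapshot_prefix: "snapshots/mobilenet"
    solver_mode: GPU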

drnikolaev commented

Thanks, I'm looking into it...

drnikolaev commented

Fixed in 0.16.2
