
Deadlock when training on multiple GPUs and restoring the state from previous training #255

Closed
CFAndy opened this issue Oct 18, 2016 · 3 comments

Comments

CFAndy commented Oct 18, 2016

nvidia-smi -l shows the root solver dropping to 0% GPU utilization while the slave solvers stay at 100%.
BVLC Caffe has no such issue.
Fixed by #254.
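
For reference, the scenario in the title, resuming multi-GPU training from a saved state and watching utilization, corresponds roughly to the commands below. This is only a sketch; the solver, snapshot, and GPU-list values are illustrative and not taken from this issue:

    # Resume training from a previous .solverstate on 4 GPUs (illustrative paths)
    caffe train --solver=solver.prototxt \
                --snapshot=snapshots/net_iter_10000.solverstate \
                --gpu=0,1,2,3

    # Watch per-GPU utilization in a loop; when the deadlock hits, the root
    # solver's GPU drops to 0% while the other GPUs stay pinned at 100%.
    nvidia-smi -l 1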

CFAndy commented Jun 6, 2017

The bug is still there. Using 0.16 to train MobileNet on 4 TITAN X (Pascal) GPUs, Caffe randomly enters a deadlock after a test run. As a workaround I have to disable the test phase (see the sketch below).
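
The workaround of disabling the test pass can be expressed in the solver configuration. A minimal sketch, assuming a standard Caffe solver.prototxt; all values are illustrative and not taken from this issue:

    # solver.prototxt with the test phase disabled (illustrative values)
    net: "train_val.prototxt"
    # No test_iter / test_interval / test_state entries, so no test nets are
    # instantiated and no test run is ever scheduled during training.
    base_lr: 0.01
    lr_policy: "step"
    stepsize: 100000
    max_iter: 400000
    snapshot: 10000
    snapshot_prefix: "snapshots/mobilenet"
    solver_mode: GPU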

drnikolaev commented

Thanks, I'm looking into it...

drnikolaev commented

Fixed in 0.16.2
