
Incorrect Inference.infer results when running with multiple GPUs. #3073

Closed
xinghai-sun opened this issue Jul 26, 2017 · 1 comment

@xinghai-sun (Contributor) commented Jul 26, 2017

Inference.infer produces incorrect results when running with multiple GPUs.

Below are the outputs of the first convolution layer for 4 example instances, with trainer_count=1 (upper figure) and trainer_count=2 (lower figure).

[Screenshot: first convolution layer outputs with trainer_count=1]

[Screenshot: first convolution layer outputs with trainer_count=2]

The output results are wrong if trainer_count > 1 (use_gpu=True).

If we print only the input data layer, there is no difference between the two cases, which suggests the problem lies in the model computation rather than in how the data is allocated across GPUs.
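
For concreteness, the comparison described above could be reproduced along the lines of the sketch below (PaddlePaddle v2 API). The network, parameter archive, and input data are illustrative placeholders rather than the exact setup in this report; the script is run twice, once with trainer_count=1 and once with trainer_count=2, and the dumped conv-layer outputs are compared afterwards with numpy.

```python
import gzip
import sys

import numpy as np
import paddle.v2 as paddle

# Run this script twice: `python repro.py 1` and `python repro.py 2`.
trainer_count = int(sys.argv[1])
paddle.init(use_gpu=True, trainer_count=trainer_count)

# Illustrative network: requesting conv1 as the output_layer makes
# paddle.infer return that layer's activations instead of a final prediction.
image = paddle.layer.data(
    name='image', type=paddle.data_type.dense_vector(3 * 32 * 32))
conv1 = paddle.layer.img_conv(
    input=image, filter_size=3, num_channels=3, num_filters=16,
    act=paddle.activation.Relu())

# Hypothetical parameter archive saved from a previous training run.
with gzip.open('params.tar.gz') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# Four fixed example instances, mirroring the 4 instances shown above.
np.random.seed(0)
test_batch = [(np.random.rand(3 * 32 * 32).astype('float32'),)
              for _ in range(4)]

conv1_out = paddle.infer(
    output_layer=conv1, parameters=parameters, input=test_batch)
np.save('conv1_out_tc%d.npy' % trainer_count, conv1_out)
# Afterwards, np.allclose on the two saved arrays should hold, but per this
# issue it fails whenever trainer_count > 1.
```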

@qingqing01 (Contributor) commented Jul 26, 2017

I printed the weights of the first convolution layer in each thread when trainer_count=2: the weights in the first thread match the saved model, but the weights in the second thread are all zeros. So we can be certain there is a bug in the weight dispatch for multi-GPU inference mode. I'll keep tracking it down.
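
As a Python-side sanity check (not part of the original diagnosis; the archive name is a placeholder), one can confirm that the saved archive itself contains non-zero weights, so the zeros observed in the second thread must be introduced when the parameters are dispatched to the extra GPU thread rather than coming from the parameter file:

```python
import gzip

import numpy as np
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Hypothetical archive name; whatever was saved from training.
with gzip.open('params.tar.gz') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# Print every parameter's shape and magnitude: the first conv layer's weight
# is non-zero in the archive, even though the second thread sees all zeros.
for name in parameters.keys():
    value = parameters.get(name)
    print('%s: shape=%s abs-sum=%.6f' % (name, value.shape, np.abs(value).sum()))
```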
