
Incorrect Inference.infer results when running with multiple GPUs. #3073

Closed
xinghai-sun opened this issue Jul 26, 2017 · 1 comment

@xinghai-sun (Contributor) commented Jul 26, 2017

Inference.infer produces incorrect results when running with multiple GPUs.

Below are the outputs of the first convolution layer for 4 example instances, with trainer_count=1 (upper figure) and trainer_count=2 (lower figure).

[Screenshot: first convolution layer outputs with trainer_count=1]

[Screenshot: first convolution layer outputs with trainer_count=2]

The output results are wrong if trainer_count > 1 (use_gpu=True).

If we print only the input data layer, there is no difference between the two cases, which suggests the problem lies in the model computation rather than in how the data is allocated across GPUs.
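
For concreteness, the comparison described above could be reproduced along the lines of the sketch below (PaddlePaddle v2 API). The network, parameter archive, and input data are illustrative placeholders rather than the exact setup in this report; the script is run twice, once with trainer_count=1 and once with trainer_count=2, and the dumped conv-layer outputs are compared afterwards with numpy.

```python
import gzip
import sys

import numpy as np
import paddle.v2 as paddle

# Run this script twice: `python repro.py 1` and `python repro.py 2`.
trainer_count = int(sys.argv[1])
paddle.init(use_gpu=True, trainer_count=trainer_count)

# Illustrative network: requesting conv1 as the output_layer makes
# paddle.infer return that layer's activations instead of a final prediction.
image = paddle.layer.data(
    name='image', type=paddle.data_type.dense_vector(3 * 32 * 32))
conv1 = paddle.layer.img_conv(
    input=image, filter_size=3, num_channels=3, num_filters=16,
    act=paddle.activation.Relu())

# Hypothetical parameter archive saved from a previous training run.
with gzip.open('params.tar.gz') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# Four fixed example instances, mirroring the 4 instances shown above.
np.random.seed(0)
test_batch = [(np.random.rand(3 * 32 * 32).astype('float32'),)
              for _ in range(4)]

conv1_out = paddle.infer(
    output_layer=conv1, parameters=parameters, input=test_batch)
np.save('conv1_out_tc%d.npy' % trainer_count, conv1_out)
# Afterwards, np.allclose on the two saved arrays should hold, but per this
# issue it fails whenever trainer_count > 1.
```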

@qingqing01 (Contributor) commented Jul 26, 2017

I printed the weights of the first convolution layer in each thread when trainer_count=2: the weights in the first thread match the saved model, but the weights in the second thread are all zeros. So we can be certain there is a bug in the weight dispatch for multi-GPU inference mode. I'll keep tracking it down.
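
As a Python-side sanity check (not part of the original diagnosis; the archive name is a placeholder), one can confirm that the saved archive itself contains non-zero weights, so the zeros observed in the second thread must be introduced when the parameters are dispatched to the extra GPU thread rather than coming from the parameter file:

```python
import gzip

import numpy as np
import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Hypothetical archive name; whatever was saved from training.
with gzip.open('params.tar.gz') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

# Print every parameter's shape and magnitude: the first conv layer's weight
# is non-zero in the archive, even though the second thread sees all zeros.
for name in parameters.keys():
    value = parameters.get(name)
    print('%s: shape=%s abs-sum=%.6f' % (name, value.shape, np.abs(value).sum()))
```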
