Inference.infer produces incorrect results when running with multiple GPUs.

Below is the output of the first convolution layer for 4 example instances, with trainer_count=1 (upper figure) and trainer_count=2 (lower figure). The output is wrong whenever trainer_count > 1 (with use_gpu=True).

If we print only the input data layer, there is no difference between the two cases, which suggests the problem lies in the model side (parameter handling) rather than in how the input data is distributed across GPUs.
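For reference, here is a minimal sketch of how the inference is invoked. The network definition `network()`, the archive name `params.tar.gz`, and `test_batch` are placeholders, since the actual model is not shown in this issue:

```python
import gzip
import paddle.v2 as paddle

# Multi-GPU configuration that reproduces the wrong output;
# trainer_count=1 gives the expected results.
paddle.init(use_gpu=True, trainer_count=2)

# 'network()' builds the model topology and returns the output layer;
# 'params.tar.gz' is the saved parameter archive (both are placeholders).
output_layer = network()
with gzip.open('params.tar.gz', 'r') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

inferer = paddle.inference.Inference(
    output_layer=output_layer, parameters=parameters)

# test_batch is a list of input tuples matching the data layer(s).
result = inferer.infer(input=test_batch)  # incorrect when trainer_count > 1
```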
I printed the weights of the first convolution layer in each thread when trainer_count=2: the weights in the first thread match the saved model, but the weights in the second thread are all zeros. So we can be certain there is a bug in the weight dispatch for multi-GPU inference mode. I'll keep tracking it.
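To double-check from the Python side that the zeros do not come from the parameter file itself, a quick sketch like the one below (assuming the v2 parameters API; the archive name is again a placeholder) can confirm that every saved parameter tensor is non-zero, which points the bug at the per-thread weight dispatch rather than at the saved model:

```python
import gzip
import numpy as np
import paddle.v2 as paddle

# Load the same parameter archive used for inference ('params.tar.gz' is a
# placeholder) and verify that no parameter tensor is all zeros.
with gzip.open('params.tar.gz', 'r') as f:
    parameters = paddle.parameters.Parameters.from_tar(f)

for name in parameters.names():
    value = parameters.get(name)  # numpy array holding the parameter value
    print(name, value.shape, 'all zero:', not np.any(value))
```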