Bug in newly added multi-gpu code #2922

Closed
philipp-fischer opened this issue Aug 14, 2015 · 5 comments · Fixed by #2924

@philipp-fischer

Trying out simple networks with multi-gpu segfaults because they don't have learnable parameters:

In parallel.cpp these lines are critical:

CUDA_CHECK(cudaMalloc(&data_, size_ * sizeof(Dtype)));
[...]
CUDA_CHECK(cudaMalloc(&diff_, size_ * sizeof(Dtype)));
[...]
CUDA_CHECK(cudaMalloc(&parent_grads_, size_ * sizeof(Dtype)));

If the net does not have learnable parameters, size_ will be 0 and cudaMalloc will return null pointers.

I currently work around this by adding 1 byte to the allocated size, but there should be a better fix.
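
One way to apply that workaround in a single place, rather than at every cudaMalloc call, is to clamp the total element count before it is stored in size_. The following is only a minimal sketch under stated assumptions, not the project's actual code: the helper name padded_total_size and its param_counts argument are hypothetical, and the sketch is plain C++ so it can be checked without a GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are assumptions for illustration).
// Sums the element counts of all learnable parameter blobs and clamps the
// result to at least 1, so a later allocation of `size * sizeof(Dtype)`
// bytes is never a zero-byte request.
std::size_t padded_total_size(const std::vector<std::size_t>& param_counts) {
  std::size_t size = 0;
  for (std::size_t count : param_counts) {
    size += count;
  }
  // A net without learnable parameters yields 0 here; return 1 instead.
  return size > 0 ? size : 1;
}
```

With a value like this stored in size_, the three allocations above would request at least sizeof(Dtype) bytes each, so the buffers would not be null even when the net has no learnable parameters.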

@ronghanghu
Member

Right now, multi-gpu is only used for training, so maybe #2903 didn't expect one to do multi-gpu training without learnable parameters.

Anyway, this problem should be addressed soon.

@thatguymike
Contributor

More fundamentally, what does it mean to train without learnable parameters?

@ronghanghu
Member

> More fundamentally, what does it mean to train without learnable parameters?

Indeed, it doesn't make sense to train without learnable parameters.

@philipp-fischer
Author

It can make a lot of sense.
Say I want to debug my data layer for multi-gpu usage, and just write the outputs to a file with a python layer.

Anyway, a segfault should never happen just because the network architecture doesn't make sense.

@ronghanghu
Member

A simple workaround is provided in #2924. Whether to allow training without learnable parameters can be debated in new issues.
