Bug in newly added multi-gpu code #2922
Comments
Right now, multi-GPU is only used for training, so maybe #2903 didn't anticipate multi-GPU training without learnable parameters. Anyway, this problem should be addressed soon.
More fundamentally, what does it mean to train without learnable parameters?
Indeed, it doesn't make sense to train without learnable parameters.
It can make a lot of sense. In any case, a segfault should never happen just because the network architecture doesn't make sense.
A simple workaround is provided in #2924. Debates on whether to allow training without learnable parameters can follow in new issues.
Trying out simple networks with multi-GPU segfaults because they don't have learnable parameters.

In parallel.cpp these lines are critical: if the net does not have learnable parameters, `size_` will be 0 and `cudaMalloc` will return null pointers. I currently work around this by adding +1 byte to the allocated size, but there should be a better fix.