Bug in newly added multi-gpu code #2922

philipp-fischer · 2015-08-14T13:56:05Z

Running simple networks that have no learnable parameters with multi-GPU training causes a segfault:

In parallel.c these lines are critical:

```cpp
CUDA_CHECK(cudaMalloc(&data_, size_ * sizeof(Dtype)));
// [...]
CUDA_CHECK(cudaMalloc(&diff_, size_ * sizeof(Dtype)));
// [...]
CUDA_CHECK(cudaMalloc(&parent_grads_, size_ * sizeof(Dtype)));
```

If the net has no learnable parameters, size_ is 0, so each of these calls requests zero bytes and cudaMalloc returns null pointers.
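To see why size_ ends up as 0, here is a minimal C++ sketch assuming the flat buffer size is the sum of the element counts of all learnable parameter blobs. The ParamBlob struct and total_param_count function are hypothetical stand-ins for illustration, not Caffe's actual API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a learnable parameter blob; only its element
// count matters when sizing the flat GPU buffers.
struct ParamBlob {
  std::size_t count;
};

// Sum of element counts over all learnable parameters. For a net with no
// learnable parameters the loop body never runs, so the result is 0 and
// size_ * sizeof(Dtype) passed to cudaMalloc becomes 0 bytes.
std::size_t total_param_count(const std::vector<ParamBlob>& params) {
  std::size_t size = 0;
  for (const ParamBlob& p : params) {
    size += p.count;
  }
  return size;
}
```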

I currently work around this by adding one byte to the allocated size, but there should be a better fix.
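The workaround boils down to never requesting a zero-byte buffer. Below is a minimal C++ sketch of that idea, assuming a clamp of the element count to at least one. std::malloc stands in for cudaMalloc so the sketch runs without a GPU (std::malloc(0) may also return a null pointer, so the same clamp applies). The names padded_count and FlatBuffers are hypothetical, and whether #2924 clamps in exactly this way is an assumption; its title only says it allocates at least one byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: element count to allocate, clamped to at least one
// so a net without learnable parameters still gets a real allocation.
inline std::size_t padded_count(std::size_t size) {
  return std::max<std::size_t>(size, 1);
}

// Hypothetical host-side stand-in for the three buffers in parallel.c;
// std::malloc plays the role of cudaMalloc here.
template <typename Dtype>
struct FlatBuffers {
  std::size_t size;  // logical number of parameters (may be 0)
  Dtype* data;
  Dtype* diff;
  Dtype* parent_grads;

  // size is declared first, so it is initialized before alloc() reads it.
  explicit FlatBuffers(std::size_t n)
      : size(n), data(alloc()), diff(alloc()), parent_grads(alloc()) {}

  ~FlatBuffers() {
    std::free(data);
    std::free(diff);
    std::free(parent_grads);
  }

  FlatBuffers(const FlatBuffers&) = delete;
  FlatBuffers& operator=(const FlatBuffers&) = delete;

 private:
  // Mirrors cudaMalloc(&ptr, size_ * sizeof(Dtype)) but never asks for 0 bytes.
  Dtype* alloc() const {
    return static_cast<Dtype*>(std::malloc(padded_count(size) * sizeof(Dtype)));
  }
};
```

In the CUDA code, the same idea would read CUDA_CHECK(cudaMalloc(&data_, padded_count(size_) * sizeof(Dtype))), while the logical size_ used for copies and reductions stays 0.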


ronghanghu · 2015-08-14T15:58:26Z

Right now, multi-GPU is only used for training, so #2903 probably didn't anticipate multi-GPU training of a net without learnable parameters.

Anyway, this problem should be addressed soon.

thatguymike · 2015-08-14T16:51:36Z

More fundamentally, what does it mean to train without learnable parameters?

ronghanghu · 2015-08-14T16:56:03Z

> More fundamentally, what does it mean to train without learnable parameters?

Indeed, it doesn't make sense to train without learnable parameters.

philipp-fischer · 2015-08-14T17:00:06Z

It can make a lot of sense.
Say I want to debug my data layer for multi-GPU use and just write its outputs to a file with a Python layer.

Anyway, a segfault should never happen just because the network architecture doesn't make sense.

ronghanghu · 2015-08-14T17:07:31Z

A simple workaround is provided in #2924. Whether training without learnable parameters should be allowed can be debated in a new issue.

ronghanghu closed this as completed Aug 14, 2015

ronghanghu reopened this Aug 14, 2015

ronghanghu added the bug label Aug 14, 2015

ronghanghu mentioned this issue Aug 14, 2015 in "Malloc at least one byte in Parallel" #2924 (merged)

ronghanghu closed this as completed in #2924 Aug 14, 2015
