
Segmentation fault when copying PyTorch tensor to cuda just by importing Ray #2413

Closed
floringogianu opened this issue Jul 17, 2018 · 5 comments


@floringogianu

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS x86_64
  • Ray installed from (source or binary): binary (pip install ray)
  • Ray version: 0.5.0
  • Python version: Python 3.6.0 :: Continuum Analytics, Inc.
  • PyTorch version: 0.4.0
  • CUDA version: 9.1
  • Exact command to reproduce: python main.py, where main.py is:
import ray
import torch


def main():
    # This line crashes with a segmentation fault when ray is imported first.
    x = torch.rand(10).cuda()


if __name__ == '__main__':
    main()

Describe the problem

Running the code above results in:

[1]    24554 segmentation fault (core dumped)  python main.py

I had a good initial experience toying with Ray and PyTorch and was running some benchmarks when I decided to check the CUDA support. Is Ray compatible with PyTorch CUDA tensors?

Source code / logs

/tmp/ray and /tmp/raylogs are empty.

@floringogianu (Author)

Changing the import order of torch and ray stops the seg-faulting (see the sketch below). However, I ran into another issue: torch.cuda() operations seem to be unsupported at this point. Should I open a separate issue for this?
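For reference, a minimal sketch of the reordered script; the only change from main.py above is that torch is imported first:

import torch  # importing torch before ray avoids the segfault
import ray


def main():
    x = torch.rand(10).cuda()


if __name__ == '__main__':
    main()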

@robertnishihara (Collaborator)

Interesting, thanks for reporting this! It looks closely related to #2159.

Out of curiosity, do you have tensorflow installed or not?

If you do import tensorflow before import ray, does it still segfault?
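Concretely, the suggested experiment is the same script with tensorflow imported first (a sketch; the rest of main.py is unchanged):

import tensorflow as tf  # imported before ray, per the question above
import ray
import torch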

@robertnishihara (Collaborator)

@floringogianu can you clarify what you mean by "torch.cuda() operations seem unsupported"?

Could you perhaps try registering a custom serializer/deserializer as in #1856 (comment) (though this may need to be updated for more recent versions of pytorch)? Actually, I think we already do something like this in https://github.com/apache/arrow/blob/4ba8769b4858dcd46a7ea7e40bd6c10102327a0d/python/pyarrow/serialization.py#L131-L153, but maybe we aren't registering serializers for the cuda equivalents.
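A rough sketch of what such a registration might look like, assuming Ray 0.5's ray.register_custom_serializer accepts serializer and deserializer callables and round-tripping the CUDA tensor through a host-side numpy array (the function names here are illustrative, not a tested recipe):

import ray
import torch

ray.init()


def serialize_cuda_tensor(t):
    # Copy the tensor to host memory and hand Ray a plain numpy array.
    return t.cpu().numpy()


def deserialize_cuda_tensor(arr):
    # Rebuild the tensor on the GPU from the numpy array.
    return torch.from_numpy(arr).cuda()


# Assumption: register_custom_serializer takes serializer/deserializer
# callables in Ray 0.5; the exact keyword arguments may differ.
ray.register_custom_serializer(
    torch.cuda.FloatTensor,
    serializer=serialize_cuda_tensor,
    deserializer=deserialize_cuda_tensor)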

@floringogianu (Author)

@robertnishihara thanks for the quick reply, much appreciated. It took me a while to reply because Ray really got me hooked today; I dabbled with it the entire day, and I like it a lot! :)

Back to the issue: I created a separate conda virtualenv and installed tensorflow, and the problem didn't reproduce. I then installed pytorch in this new env, and again the segfault didn't reproduce. I returned to the virtualenv I used yesterday and, again, no segfaults. I have no explanation for what happened. Yesterday I crashed ray a lot, and maybe some processes got stuck in memory, causing the segfaults and the weird crashes and behavior I was experiencing when trying to use torch.cuda objects and operations.

In the examples I was toying with yesterday I didn't need to register new serializers, because I took care to simply pass or return numpy objects (see the sketch below).
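For illustration, that workaround looks roughly like this (a hypothetical remote task, not code from the original examples; assumes a GPU is available locally): the GPU work happens inside the task, and only numpy arrays cross the task boundary.

import ray
import torch

ray.init()


@ray.remote(num_gpus=1)
def scale_on_gpu(arr, factor):
    # Do the arithmetic on the GPU, but return a plain numpy array
    # so Ray never has to serialize a CUDA tensor.
    t = torch.from_numpy(arr).cuda()
    return (t * factor).cpu().numpy()


result = ray.get(scale_on_gpu.remote(torch.rand(10).numpy(), 2.0))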

I will close this issue for now and reopen it only if I get the segfaults again.

@robertnishihara (Collaborator)

Ok sounds good. Definitely reopen this if the issue occurs again.
