
device='cuda:1' did not work #38

Open · tkotani opened this issue Aug 7, 2024 · 1 comment

tkotani commented Aug 7, 2024

In your Simple example at https://github.com/princeton-vl/lietorch/blob/master/README.md#simple-example, I replaced

phi = torch.randn(8000, 3, device='cuda', requires_grad=True)

with

phi = torch.randn(8000, 3, device='cuda:1', requires_grad=True)

I saved the modified example as lietest1.py and ran

>python lietest1.py

which ends with:

Traceback (most recent call last):
  File "/home/takao/org/diffusion-point-cloud/lietest1.py", line 15, in <module>
    loss.backward()
  File "/home/takao/.local/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/takao/.local/lib/python3.10/site-packages/torch/autograd/function.py", line 301, in apply
    return user_fn(self, *args)
  File "/home/takao/.local/lib/python3.10/site-packages/lietorch/group_ops.py", line 24, in backward
    grad_inputs = cls.backward_op(ctx.group_id, grad, *inputs)
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Could you help me resolve this problem?

Best,
Takao Kotani

edexheim commented

Hi,

I ran into a similar issue a while back when I wanted to use lietorch on multiple GPUs. In my case, adding CUDA device guards fixed it (edexheim@02f881d).

Best,
Eric
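For readers hitting the same crash: the device-guard fix works by pinning the current CUDA device to the input tensor's device before launching kernels. Below is a minimal sketch of that pattern as it is commonly used in PyTorch C++/CUDA extensions; the function name and body are hypothetical, not lietorch's actual code (see edexheim@02f881d for the real patch).

```cpp
// Illustrative sketch of the CUDA device-guard pattern for a typical
// PyTorch C++/CUDA extension. The op name and body are hypothetical.
#include <torch/extension.h>
#include <c10/cuda/CUDAGuard.h>

torch::Tensor some_group_op(torch::Tensor phi) {
  // Without a guard, kernels launch on the *current* device (cuda:0 by
  // default) while phi's storage may live on another GPU (e.g. cuda:1),
  // which can surface as "an illegal memory access was encountered".
  const c10::cuda::CUDAGuard device_guard(phi.device());

  // ... launch the extension's CUDA kernels here; with the guard in
  // scope, they run on phi.device() ...
  return phi;  // placeholder return
}
```

A stopgap that avoids patching the extension is to make the target GPU the only visible one, e.g. running `CUDA_VISIBLE_DEVICES=1 python lietest1.py` with `device='cuda'` left unchanged; whether that suffices depends on the rest of your pipeline.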
