
[core][distributed] fix custom allreduce in pytorch 2.5 #9815

Merged · 4 commits into vllm-project:main · Oct 30, 2024

Conversation

youkaichao (Member) commented Oct 29, 2024

fixes #9774

PyTorch 2.5 changed the binary format of the IPC handle.
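
For context, here is a minimal probe (not vLLM's code) of where that handle comes from. It uses the same private storage-sharing API that the custom allreduce relies on; the tuple layout is a torch-internal detail and an assumption here:

import torch

buf = torch.empty(1024, device="cuda")

# _share_cuda_() is private API; on recent torch it returns a tuple whose
# second element is the raw bytes wrapping the CUDA IPC handle (assumed
# layout; these bytes are exactly what shifted between torch 2.4 and 2.5)
meta = buf.untyped_storage()._share_cuda_()
handle_bytes = meta[1]
print(len(handle_bytes))  # the length of these bytes changed in torch 2.5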

(automated reminder from the CI bot)

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

youkaichao (Member, Author) commented Oct 29, 2024

cc @hanzhi713

Can you please improve the code if you have time? Ideally, we should use PyTorch's user-facing API rather than this private API.

Example of the user-facing API:

# producer:

import torch
from torch.multiprocessing.reductions import reduce_tensor

inp = torch.randn(5, 5).cuda()

# `out` is a picklable (rebuild_fn, args) pair, not a tensor
out = reduce_tensor(inp)

# send `out` to consumer

# consumer:

func = out[0]            # the rebuild function
tensor = func(*out[1])   # maps the producer's CUDA allocation via IPC

This way, we won't suffer from so many PyTorch internal details.

And once we share the tensor from the Python side, the C++ side code will be much simpler, and we can also benefit from expandable segments in the future.
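
To make the "send `out` to consumer" step concrete: `out` is just a (rebuild function, args) pair, and both halves are picklable, so it can travel over any IPC channel. A quick sketch (the exact contents of `args` are a torch-internal detail):

import pickle

import torch
from torch.multiprocessing.reductions import reduce_tensor

inp = torch.randn(5, 5).cuda()
rebuild_fn, args = reduce_tensor(inp)

# the function pickles by reference and the args pickle by value,
# so any byte channel between the two processes works
blob = pickle.dumps((rebuild_fn, args))
print(rebuild_fn.__name__, len(blob))  # e.g. rebuild_cuda_tensor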

cedonley commented
FYI - I can confirm this fixes the issue with TP=2 on NVLink A6000s that was introduced with the upgrade to PyTorch 2.5. Nice catch on the handle format change. I thought I was going crazy; I hadn't noticed the 2-byte change in the length of the handle itself when I compared the data going into the IPC functions in the two versions.

youkaichao (Member, Author) commented
@cedonley thanks for your report and investigation!

youkaichao merged commit 1ab6f6b into vllm-project:main on Oct 30, 2024 (29 of 31 checks passed)
youkaichao deleted the fix_ca branch on October 30, 2024 at 00:06
hanzhi713 (Contributor) commented
@youkaichao There's no user-facing API for getting a shareable handle. To avoid using internal PyTorch APIs, I think we can just call cudaIpcGetMemHandle directly on PyTorch-allocated tensors, like here:

CUDACHECK(cudaIpcGetMemHandle(

The downside is that we would lose PyTorch's safeguards against leaks, but I think that might not be a problem, since allocations in custom allreduce are one-time allocations.
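
For illustration, here is roughly what that idea looks like if driven from Python via ctypes instead of the C++ extension. This is a hedged sketch, not the proposed patch, and it assumes the pointer passed in is the base of its CUDA allocation (with the caching allocator, a tensor's data_ptr() may sit inside a larger block):

import ctypes

import torch

# adjust the soname for your CUDA install (e.g. libcudart.so.12)
cudart = ctypes.CDLL("libcudart.so")

CUDA_IPC_HANDLE_SIZE = 64  # sizeof(cudaIpcMemHandle_t)

buf = torch.empty(1 << 20, dtype=torch.uint8, device="cuda")
handle = (ctypes.c_char * CUDA_IPC_HANDLE_SIZE)()
ret = cudart.cudaIpcGetMemHandle(ctypes.byref(handle),
                                 ctypes.c_void_p(buf.data_ptr()))
assert ret == 0, f"cudaIpcGetMemHandle failed with error {ret}"

# handle.raw (64 opaque bytes) can now be shipped to a peer process, which
# maps it with cudaIpcOpenMemHandle; no torch-internal format is involved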

hanzhi713 (Contributor) commented
I will have some time this weekend to work on this.

youkaichao (Member, Author) commented
Do you really need a handle?

We can get an IPC-shared tensor directly:

# producer:

import torch
from torch.multiprocessing.reductions import reduce_tensor

inp = torch.randn(5, 5).cuda()

out = reduce_tensor(inp)

# send `out` to consumer

# consumer

func = out[0]
tensor = func(*out[1])

youkaichao (Member, Author) commented
@hanzhi713 you're welcome to join our new Slack https://slack.vllm.ai for chatting and collaborating!

hanzhi713 (Contributor) commented Oct 30, 2024

How do you share such a handle with other processes? I think vLLM still runs one process per GPU, right?

Is it just sending pickled data through torch.distributed? IMHO torch.multiprocessing is designed for sharing handles through multiprocessing.Process. It may work with generic processes, but I'm not sure whether there are any caveats.

youkaichao (Member, Author) commented
> Is it just sending pickled data through torch.distributed?

Yes.

> IMHO torch.multiprocessing is designed for sharing handles through multiprocessing.Process

It applies to general processes, no matter how the process is launched.
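
As a concrete sketch of that flow (assuming a process group is already set up, e.g. via torchrun, with one process per GPU; torch.distributed's object collectives do the pickling):

import torch
import torch.distributed as dist
from torch.multiprocessing.reductions import reduce_tensor

dist.init_process_group(backend="gloo")  # a CPU backend suffices for metadata
rank = dist.get_rank()
torch.cuda.set_device(rank)

if rank == 0:
    inp = torch.randn(5, 5, device="cuda")
    payload = [reduce_tensor(inp)]  # picklable (rebuild_fn, args)
else:
    payload = [None]

# broadcast_object_list pickles on the sender and unpickles on receivers
dist.broadcast_object_list(payload, src=0)

if rank != 0:
    rebuild_fn, args = payload[0]
    # maps rank 0's CUDA allocation into this process via CUDA IPC;
    # the resulting tensor lives on rank 0's device
    shared = rebuild_fn(*args)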

rasmith pushed a commit to rasmith/vllm that referenced this pull request Oct 30, 2024
NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Oct 31, 2024
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Nov 4, 2024
hissu-hyvarinen pushed a commit to ROCm/vllm that referenced this pull request Nov 6, 2024
siddvenk pushed a commit to siddvenk/vllm that referenced this pull request Nov 8, 2024
JC1DA pushed a commit to JC1DA/vllm that referenced this pull request Nov 11, 2024
Successfully merging this pull request may close this issue:

  • [Bug]: [help wanted] MoE + TP + custom allreduce bug