
[DTensor] Change Sharding algorithm to be in line with torch.chunk() #98722

Closed
wants to merge 16 commits

Conversation


@wz337 wz337 commented Apr 10, 2023

As the functional collectives are being updated, using tensor_split() as the underlying sharding algorithm would require padding and unpadding on multiple ranks. Therefore, we are changing the sharding algorithm to be in line with torch.chunk(), which limits padding to the last two ranks in most scenarios.
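For illustration, a minimal sketch of the difference (the size-10, 4-chunk numbers are illustrative assumptions, not from the PR):

```python
import torch

t = torch.arange(10)

# tensor_split spreads the remainder over the leading chunks, so several
# ranks can end up with undersized shards that would need padding:
print([c.numel() for c in torch.tensor_split(t, 4)])  # [3, 3, 2, 2]

# chunk gives every chunk ceil(10 / 4) elements except the trailing ones,
# so only the last rank(s) ever need padding:
print([c.numel() for c in torch.chunk(t, 4)])         # [3, 3, 3, 1]
```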

@pytorch-bot pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Apr 10, 2023

pytorch-bot bot commented Apr 10, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/98722

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ No Failures

As of commit 1fdbbf2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.


@wanchaol wanchaol left a comment


First pass: looks pretty good already. I have a few suggestions inlined. Mainly, I think we should use new_zeros to create the empty shard, which could help us consolidate the padding logic and avoid having to infer the device :)

)
return torch.tensor([], device=device)
else:
return tensor

def _unpad_concat_tensor(

I think we can probably delete this _unpad_concat_tensor, as it's only used by a test; we can fix that test instead :)

device = torch.device(
tensor.get_device() if torch.cuda.is_available() else "cpu"
)
empty_tensor = torch.tensor([], device=device)

Hmmm, actually I think we should not "infer" the device like this. Given that we already have the tensor, we should always use the tensor itself to create a new tensor. I think we can do the following to correctly get an empty tensor:

tensor.new_zeros((0, 3)) -> tensor([], size=(0, 3)) # this gives an empty tensor, but with the correct shape!
tensor.new_zeros(shape) -> tensor([..]) # this gives a tensor filled with zeros according to the new `shape`, with the same dtype/device as the original tensor
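A quick sketch of that behavior (the shapes here are illustrative):

```python
import torch

t = torch.randn(4, 3)  # any existing tensor

# new_zeros inherits dtype and device from `t`, so there is no need to
# infer the device separately. A zero-length leading dim still keeps the
# trailing dims:
empty = t.new_zeros((0, 3))
print(empty.shape, empty.dtype, empty.device)  # torch.Size([0, 3]) torch.float32 cpu

# A full shape gives a zero-filled tensor of that shape:
zeros = t.new_zeros((2, 3))
```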

)
tensor_size = list(reference_tensor.size())
tensor_size = [dim if dim >= self.dim else 0 for dim in tensor_size] # type: ignore[attr-defined]
return torch.zeros(tensor_size, device=device)

Similarly here, by using new_zeros we can create a new zeros tensor directly with the expected shape! i.e.

b = torch.empty((0, 3))  # tensor([], size=(0, 3))
b.new_zeros((3, 3))  # works!

device = torch.device(
tensor.get_device() if torch.cuda.is_available() else "cpu"
)
return torch.tensor([], device=device)

same here, sth like: tensor.new_zeros((0, other_dims))

self,
tensor: torch.Tensor,
pad_size: int,
reference_tensor: Optional[torch.Tensor] = None,

I think we probably don't need this reference_tensor anymore if we create the empty tensor with something like tensor.new_zeros((0, 3)); we only need pad_size, and that would make the padding logic consistent too.

else:
pad = [0, 0] * (tensor.ndim - self.dim)
pad[-1] = pad_size
return torch.nn.functional.pad(tensor, pad)

I think padding also works for a tensor like tensor([], size=(0, 3)).
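A minimal check of that claim (the (0, 3) shape is just an example):

```python
import torch
import torch.nn.functional as F

empty = torch.empty((0, 3))

# The pad list holds (left, right) pairs starting from the last dim, so
# [0, 0, 0, 2] pads two rows onto dim 0 of a 2-D tensor. Padding an empty
# tensor works and yields zero-filled rows:
padded = F.pad(empty, [0, 0, 0, 2])
print(padded.shape)  # torch.Size([2, 3])
```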

local_offset_on_dim = -1
if return_offset:
# QQ: what would be the offset of an empty shard? -1?

Hmmm, that's a good point... Do you know what the sharded tensor offset looks like for an empty shard? I think we could return the "global tensor dim size" for an empty shard (representing the end of that tensor dim), but I'd like to check that this makes sense for existing use cases.
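A sketch of that idea, assuming torch.chunk-style shard sizes (the size-5, 8-rank numbers and the chunk_offsets helper are hypothetical):

```python
import math

def chunk_offsets(size_on_dim: int, num_chunks: int):
    # Offset of each shard under torch.chunk-style sharding; empty shards
    # clamp to the global dim size, marking the end of that dim.
    full = math.ceil(size_on_dim / num_chunks)
    return [min(i * full, size_on_dim) for i in range(num_chunks)]

print(chunk_offsets(5, 8))  # [0, 1, 2, 3, 4, 5, 5, 5] -> ranks 5-7 are empty
```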

wz337 (Contributor, Author) replied:

Let me double check on this.

@@ -39,7 +39,7 @@ def _split_tensor(
     *,
     with_padding: bool = True,
     contiguous: bool = True,
-) -> Tuple[List[torch.Tensor], int]:
+) -> Tuple[List[torch.Tensor], int, List[int]]:

So it looks like all the call sites (except test_device_mesh.py) don't need the second return value anymore (I think that's because it's embedded in the last return value). Shall we delete the second return value and return Tuple[List[torch.Tensor], List[int]] instead?

wz337 (Contributor, Author) replied:

Good point! I was actually thinking about the same thing!

for idx in range(num_chunks)
]
# Get idx start to pad
idx_start_to_pad = next(

do we really need this given that we only need pad_sizes? pad_sizes can actually be computed without this iiuc.

# Compute pad size on each chunk
pad_sizes = [
full_chunk_size - chunk_size if idx >= idx_start_to_pad else 0
for idx, chunk_size in enumerate(chunk_sizes)

i.e. we don't really need to check idx >= idx_start_to_pad? It could always be full_chunk_size - chunk_size; for ranks that don't need padding, the subtraction becomes 0 automatically.
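A small sketch of why the guard is redundant, assuming torch.chunk-style chunk sizes (the numbers are illustrative):

```python
import math

size_on_dim, num_chunks = 10, 4
full_chunk_size = math.ceil(size_on_dim / num_chunks)  # 3

# torch.chunk-style sizes: every chunk is full except the trailing ones.
chunk_sizes = [
    max(min(size_on_dim, full_chunk_size * (i + 1)) - full_chunk_size * i, 0)
    for i in range(num_chunks)
]  # [3, 3, 3, 1]

# For full chunks the subtraction is already 0, so no index check is needed.
pad_sizes = [full_chunk_size - s for s in chunk_sizes]  # [0, 0, 0, 2]
```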

@wanchaol wanchaol added release notes: distributed (dtensor) release notes category and removed release notes: distributed (fsdp) release notes category labels Apr 12, 2023

wz337 commented Apr 18, 2023

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased dtensor_update onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dtensor_update && git pull --rebase)


@wanchaol wanchaol left a comment


Looks great! Thanks for working on it! I have a few more suggestions inlined.

# Explicitly return an empty tensor. Otherwise, even if the
# tensor is empty, the size won't be 0.
if tensor.numel() == 0:
return tensor.new_zeros([0])

Hmmm, could you explain why we need to explicitly return an empty tensor? IIUC, after tensor.narrow some ranks would have a tensor with shape ([0, 8]), for example; that's still considered an empty tensor with no data, so that should work for us.

@wz337 (Contributor, Author) commented Apr 20, 2023


The reason is that we have a test that compares an unpadded tensor with the original tensor_to_split.

https://github.com/pytorch/pytorch/blob/ada67c9d8a3ac61fa0af5b8a186d5ecb31765af5/test/distributed/_tensor/test_device_mesh.py#L210-L212

In this case, the sizes of the two would fail the assert, as one could be ([0, 8]) while the other is ([0]), although both are just empty tensors. I guess I could update the test cases to make _unpad() more consistent.
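For illustration, a minimal reproduction of the mismatch, using the shapes mentioned above:

```python
import torch

a = torch.empty((0, 8))  # e.g. what tensor.narrow leaves on an empty shard
b = torch.empty((0,))    # e.g. what tensor.new_zeros([0]) returns

print(a.numel() == 0 and b.numel() == 0)  # True: both carry no data
print(a.shape == b.shape)                 # False: (0, 8) vs (0,)
print(torch.equal(a, b))                  # False, so an equality assert fails
```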

]
# Compute pad size on each chunk
pad_sizes = [
full_chunk_size - chunk_size for idx, chunk_size in enumerate(chunk_sizes)

nit: no need to have idx here?

for idx in range(num_chunks)
]
pad_sizes = [full_chunk_size - chunk_size for chunk_size in chunk_sizes]
is_padded = not all(pad_size == 0 for pad_size in pad_sizes)

nit: why don't we use a similar condition for is_padded as in reduce_scatter_tensor? i.e. size[self.dim] % num_chunks != 0
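A quick check that the two conditions agree under torch.chunk-style sizing (the test values are arbitrary assumptions):

```python
import math

for size, n in [(10, 4), (12, 4), (5, 8)]:
    full = math.ceil(size / n)
    sizes = [max(min(size, full * (i + 1)) - full * i, 0) for i in range(n)]
    pads = [full - s for s in sizes]
    # Some pad is nonzero exactly when the dim doesn't divide evenly.
    assert (not all(p == 0 for p in pads)) == (size % n != 0)
```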


wz337 commented Apr 20, 2023

@pytorchmergebot rebase

@pytorchmergebot

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot

Successfully rebased dtensor_update onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout dtensor_update && git pull --rebase)


wz337 commented Apr 20, 2023

@pytorchmergebot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 20, 2023
@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here

pytorchmergebot pushed a commit that referenced this pull request May 14, 2023
When tensor.size(self.dim) < num_chunks, we will fill the empty chunks with empty tensors (#98722). Therefore, we no longer need this assert.

For example, when sharding a tensor with 1 element across 2 ranks along dim 0, the results would be as follows:
```
rank:0, dtensor:DTensor(local_tensor=tensor([0.4963], device='cuda:0'), device_mesh=DeviceMesh:([0, 1]), placements=[Shard(dim=0)])
rank:1, dtensor:DTensor(local_tensor=tensor([], device='cuda:1'), device_mesh=DeviceMesh:([0, 1]), placements=[Shard(dim=0)])
```
Pull Request resolved: #101218
Approved by: https://github.com/wanchaol
jcaip pushed a commit that referenced this pull request May 23, 2023