Helper function all_gather_tensors_with_shapes() #3281

Conversation

sadra-barikbin (Collaborator)

No description provided.

github-actions bot added the "module: distributed" (Distributed module) label on Sep 4, 2024
vfdev-5 changed the title from "Helper function allgather_tensors_with_defferent_shapes()" to "Helper function all_gather_tensors_with_shapes()" on Sep 4, 2024

vfdev-5 (Collaborator) left a comment:

LGTM

sadra-barikbin and others added 4 commits September 4, 2024 19:35
…-tensors-with-different-shapes' into feature-allgather-tensors-with-different-shapes
# Excerpt from ignite/distributed/utils.py (all_gather_tensors_with_shapes) under review:
if isinstance(_model, _SerialModel) or group == dist.GroupMember.NON_GROUP_MEMBER:
    # Nothing to gather when running serially or when this rank is not part of the group
    return [tensor]

# Element-wise maximum of the per-rank shapes; each tensor is padded up to this shape
max_shape = torch.tensor(shapes).amax(dim=0)

vfdev-5 (Collaborator):
I wonder whether we could actually get the tensor shapes using all_gather, so that the shapes arg can be optional?

sadra-barikbin (Collaborator, Author):
Yes we can. Do you want it in this PR?

vfdev-5 (Collaborator):

Up to you; if you would like to do it in another PR, that's OK with me as well.

vfdev-5 (Collaborator):
Let's do it in another PR; I'll merge this one since CI is green.
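
For illustration, a rough sketch of the idea discussed above (not the PR's implementation): each rank all-gathers its own shape first, so an explicit shapes argument would no longer be required. It assumes torch.distributed is already initialized and that every rank's tensor has the same number of dimensions; gather_shapes is a hypothetical helper, not part of ignite.

import torch
import torch.distributed as dist

def gather_shapes(tensor: torch.Tensor) -> list:
    # Hypothetical helper (not in ignite): all-gather each rank's shape so that
    # callers would not have to pass `shapes` explicitly.
    world_size = dist.get_world_size()
    local_shape = torch.tensor(tensor.shape, dtype=torch.long, device=tensor.device)
    gathered = [torch.empty_like(local_shape) for _ in range(world_size)]
    dist.all_gather(gathered, local_shape)  # every rank receives all shapes
    return [shape.tolist() for shape in gathered]

The gathered shapes could then be passed to all_gather_tensors_with_shapes in place of a user-supplied shapes argument.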

vfdev-5 merged commit 680ac7f into pytorch:master on Sep 5, 2024
20 checks passed

vfdev-5 (Collaborator) commented on Sep 9, 2024:

@sadra-barikbin there is a failure related to this PR on the HVD GPU CI:

[0]<stderr>:User function raise error: Padding length should be less than or equal to two times the input dimension but got padding length 6 and input of dimension 1
[0]<stderr>:Traceback (most recent call last):
[0]<stderr>:  File "<frozen runpy>", line 198, in _run_module_as_main
[0]<stderr>:  File "<frozen runpy>", line 88, in _run_code
[0]<stderr>:  File "/opt/conda/lib/python3.11/site-packages/horovod-0.28.1-py3.11-linux-x86_64.egg/horovod/runner/run_task.py", line 37, in <module>
[0]<stderr>:    main(driver_addr, run_func_server_port)
[0]<stderr>:  File "/opt/conda/lib/python3.11/site-packages/horovod-0.28.1-py3.11-linux-x86_64.egg/horovod/runner/run_task.py", line 28, in main
[0]<stderr>:    raise e
[0]<stderr>:  File "/opt/conda/lib/python3.11/site-packages/horovod-0.28.1-py3.11-linux-x86_64.egg/horovod/runner/run_task.py", line 25, in main
[0]<stderr>:    ret_val = func()
[0]<stderr>:              ^^^^^^
[0]<stderr>:  File "/opt/conda/lib/python3.11/site-packages/horovod-0.28.1-py3.11-linux-x86_64.egg/horovod/runner/__init__.py", line 215, in wrapped_func
[0]<stderr>:    return func(*args, **kwargs)
[0]<stderr>:           ^^^^^^^^^^^^^^^^^^^^^
[0]<stderr>:  File "/work/tests/ignite/conftest.py", line 370, in _hvd_task_with_init
[0]<stderr>:    func(*args)
[0]<stderr>:  File "/work/tests/ignite/distributed/utils/__init__.py", line 333, in _test_idist_all_gather_tensors_with_shapes_group
[0]<stderr>:    tensors = all_gather_tensors_with_shapes(rank_tensor, [[r + 1, r + 2, r + 3] for r in ranks], ranks)
[0]<stderr>:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]<stderr>:  File "/work/ignite/distributed/utils.py", line 395, in all_gather_tensors_with_shapes
[0]<stderr>:    padded_tensor = torch.nn.functional.pad(
[0]<stderr>:                    ^^^^^^^^^^^^^^^^^^^^^^^^
[0]<stderr>:  File "/opt/conda/lib/python3.11/site-packages/torch/nn/functional.py", line 4552, in pad
[0]<stderr>:    return torch._C._nn.pad(input, pad, mode, value)
[0]<stderr>:           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[0]<stderr>:RuntimeError: Padding length should be less than or equal to two times the input dimension but got padding length 6 and input of dimension 1

https://github.com/pytorch/ignite/actions/runs/10769410690/job/29860560970?pr=3283

Can you please check what happens?
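
For context, the constraint the traceback hits can be reproduced standalone (illustrative snippet, not the PR's test): torch.nn.functional.pad accepts at most 2 * tensor.dim() padding values, so a 6-value pad spec applied to a 1-D tensor raises exactly this error.

import torch
import torch.nn.functional as F

x = torch.zeros(5)            # 1-D tensor: at most 2 padding values are allowed
F.pad(x, (0, 2))              # OK: pads the single dimension on the right
F.pad(x, (0, 1, 0, 2, 0, 3))  # RuntimeError: padding length 6 ... input of dimension 1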
