
warning if current device index is lower than current local rank #1335

Merged

Conversation

HelioStrike
Contributor

@HelioStrike HelioStrike commented Oct 2, 2020

Fixes #1308

Description:

Check list:

  • New tests are added (if a new feature is added)
  • New doc strings: description and/or example code are in RST format
  • Documentation is updated (if required)

@sdesrozis
Contributor

@HelioStrike Here are some comments.

Could you please update the documentation with a note about this feature? Thank you again!

@vfdev-5
Collaborator

vfdev-5 commented Oct 2, 2020

@HelioStrike thanks for the PR. I'd say we also have to add tests for that...

@HelioStrike
Contributor Author

@sdesrozis @vfdev-5 Sorry for asking such a silly question, but what should the tests be checking for, and what should they do? I had a look at test_horovod.py and test_native.py, but I'm still a little confused. :|

@vfdev-5
Collaborator

vfdev-5 commented Oct 2, 2020

@HelioStrike it is a good question (not that silly)!

I'd say we have to reproduce the initial issue description and test that. Unfortunately, it is not explicitly stated in the linked issues.
The problem is here:

import os

import torch
import torch.distributed as dist

import ignite.distributed as idist


def main():

    # !!! We do not call torch.cuda.set_device(local_rank)

    dist.init_process_group(backend="nccl", init_method="env://")
    local_rank = int(os.environ["LOCAL_RANK"])

    # If we call idist.get_world_size() for the first time,
    # it will try to set up the native dist comp model
    # and will call all_reduce to get some parameters, like the number of nodes.
    # Effectively, it executes the code below:
    #   tensor = torch.tensor([local_rank + 1]).to("cuda")
    #   dist.all_reduce(tensor, op=dist.ReduceOp.MAX)
    # This is incorrect, as the reduction is done on only one device
    # instead of all participating devices.

    idist.get_world_size()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

So, the idea is to check with pytest.warns that idist.get_world_size() raises a warning.
Let me know if you need more details.
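A minimal sketch of such a test, outside of a real distributed setup: get_world_size_stub below is a hypothetical stand-in for idist.get_world_size(), and the warning text is illustrative, not the exact message added in this PR.

```python
import warnings

import pytest


def get_world_size_stub() -> int:
    # Stand-in for idist.get_world_size(): per this PR, it warns when the
    # current CUDA device index is lower than the current local rank.
    warnings.warn(
        "Current device index is lower than current local rank.", UserWarning
    )
    return 1


def test_warning_on_get_world_size():
    # pytest.warns fails the test if no matching warning is emitted
    with pytest.warns(UserWarning, match="lower than current local rank"):
        get_world_size_stub()
```

pytest.warns also works as a plain context manager, so the same check can be run outside a pytest session.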

@HelioStrike
Contributor Author

HelioStrike commented Oct 3, 2020

@vfdev-5 Thanks, got most of the info I needed!

I added the tests, but the code hangs, so I commented it out for now. I think the ideal behavior for now would be for pytest to pass as soon as it sees the warning, instead of executing further. Would you happen to know how this issue can be fixed?

@vfdev-5
Collaborator

vfdev-5 commented Oct 3, 2020

@HelioStrike thanks for the update!

I added the tests, but the code hangs, so I commented it out for now.

Let me explain what happens in your test:

    os.environ["RANK"] = "1"
    os.environ["WORLD_SIZE"] = "1"
    os.environ["MASTER_ADDR"] = "0.0.0.0"
    os.environ["MASTER_PORT"] = "2222"
    dist.init_process_group(backend="nccl", init_method="env://")

In this case, the total number of participating devices/processes (= world size) is declared as 1. It means the distributed communication group expects processes with ranks from 0 up to (but not including) 1.
However, the current process rank is set to 1 instead of 0, so the process group waits for rank 0 to appear and thus hangs.
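To illustrate the constraint (this is not ignite code, just a plain-Python sketch): a rank must lie in [0, world_size), and init_process_group blocks until all world_size ranks have joined the group.

```python
def validate_rank(rank: int, world_size: int) -> None:
    # torch.distributed requires rank in [0, world_size); with world_size=1
    # the only valid rank is 0, so rank=1 means rank 0 never joins and the
    # group initialization hangs waiting for it.
    if not 0 <= rank < world_size:
        raise ValueError(
            f"rank {rank} is outside the valid range [0, {world_size})"
        )
```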

What we would like to check is the following:

The test setup should look something like this:

@pytest.mark.distributed
@pytest.mark.skipif(torch.cuda.device_count() < 1, reason="Skip if no GPU")
def test__native_dist_model_create_dist_nccl(local_rank, world_size):

and @pytest.mark.skipif(torch.cuda.device_count() < 2, reason="Skip if less than 2 GPUs")

Please let me know if you don't have access to more than 1 GPU. In any case, our CI does not run GPU tests on PRs, but they will run once the change is merged into the repository's master or any other branch.

@HelioStrike
Contributor Author

@vfdev-5 Thank you for the neat explanation! I'm new to the repo, but I think I have a better understanding of things after reading it. I made the suggested changes, but I don't have access to 2 GPUs to test them. :(

Collaborator

@vfdev-5 vfdev-5 left a comment


Thanks for the update

tests/ignite/distributed/comp_models/test_native.py Outdated
@HelioStrike HelioStrike force-pushed the device-index-warning branch 2 times, most recently from eaac5ea to 072c468 Compare October 5, 2020 02:31
@HelioStrike
Contributor Author

@vfdev-5 Updated! The only issue is that the local_rank fixture only sets the LOCAL_RANK environment variable, and I still get errors saying that RANK is missing. I made all the other changes.

@vfdev-5
Collaborator

vfdev-5 commented Oct 5, 2020

Only issue is that the local_rank fixture only sets up the LOCAL_RANK environment variable, and I still get errors saying that RANK is missing

@HelioStrike yes, that's true. Actually, I want to make the test like here

@pytest.mark.distributed
@pytest.mark.skipif(torch.cuda.device_count() < 1, reason="Skip if no GPU")
def test__native_dist_model_create_dist_nccl(local_rank, world_size):

which calls

def _test__native_dist_model_create_from_context_dist(local_rank, rank, world_size, true_backend, true_device):

The problem with the way you configure the ddp group is that the rank is always 1, but it has to be 0 and 1 for the different processes.

@HelioStrike
Contributor Author

HelioStrike commented Oct 5, 2020

@vfdev-5 I see now. I updated the code, but is the true_device argument correct (to trigger this specific condition)? Also, apart from _test__native_dist_model_create_from_context_dist, are there any other cases we'd want to test?

@vfdev-5
Collaborator

vfdev-5 commented Oct 5, 2020

@HelioStrike thanks for working on that. Sorry, I think I explained badly how we would like to test the warning.
The previous links show examples of how to set up the distributed configuration and test other things:

  • test__native_dist_model_create_dist_nccl : the executable test
  • _test__native_dist_model_create_from_context_dist : a subtest called inside test__native_dist_model_create_dist_nccl

We would like to write another executable test where we set up the distributed configuration as above, but test our particular thing: whether idist.get_world_size() raises a warning.

If you'd like to understand all of that in detail, you can try to run the test in Colab on a single GPU. It may be a bit tricky to run the test with 2 processes, but you can get a better understanding of how ddp works from a user's perspective.
If it is too complicated for you, I can update the test myself later...

@HelioStrike
Contributor Author

@vfdev-5 I see... I think I got it. I noticed that _test__native_dist_model_create_from_context_dist eventually calls .get_world_size(), so I put that in directly, but yes, I should probably create a new subtest.

@vfdev-5
Collaborator

vfdev-5 commented Oct 5, 2020

@HelioStrike that's true, _test__native_dist_model_create_from_context_dist also calls get_world_size and probably raises the warning too. So we have to fix that test by adding torch.cuda.set_device(local_rank). Anyway, I have to check that on my side on 2 GPUs.
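For context, the check this PR introduces boils down to something like the sketch below. The function name and warning text are illustrative, not the exact ignite implementation:

```python
import warnings


def warn_if_device_lower_than_rank(device_index: int, local_rank: int) -> None:
    # A CUDA device index below the local rank usually means the user forgot
    # to call torch.cuda.set_device(local_rank) before collective operations,
    # so reductions would run on the wrong (shared) device.
    if device_index < local_rank:
        warnings.warn(
            f"Current device index ({device_index}) is lower than current "
            f"local rank ({local_rank}). Please make sure to call "
            "torch.cuda.set_device(local_rank)."
        )
```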

@HelioStrike
Contributor Author

HelioStrike commented Oct 5, 2020

@vfdev-5 Sorry about this, I think I lack the knowledge to work on this issue properly...

@vfdev-5
Collaborator

vfdev-5 commented Oct 5, 2020

@HelioStrike no problem, it is a bit of a tough subject and requires specific infrastructure to check. I'll finalize this PR later.
Feel free to pick another Hacktoberfest issue, which normally requires less specialized knowledge.

@vfdev-5 vfdev-5 changed the base branch from master to device-index-warning October 7, 2020 21:37
@vfdev-5 vfdev-5 merged commit 42c0cf0 into pytorch:device-index-warning Oct 7, 2020
@@ -97,7 +97,13 @@ def _init_from_context(self) -> None:
        self._setup_attrs()

    def _compute_nproc_per_node(self) -> int:
        tensor = torch.tensor([self.get_local_rank() + 1]).to(self.device())
        print("_compute_nproc_per_node")
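For readers following along: _compute_nproc_per_node works by having every process contribute local_rank + 1 and all-reducing with MAX, which yields the number of processes per node. A plain-Python sketch of that idea (the list argument stands in for the values held across all participating processes):

```python
def compute_nproc_per_node(local_ranks):
    # Stand-in for dist.all_reduce(tensor, op=dist.ReduceOp.MAX) over tensors
    # holding local_rank + 1 on each participating process: the maximum
    # local_rank + 1 equals the per-node process count.
    return max(lr + 1 for lr in local_ranks)
```

For example, with 2 nodes of 4 processes each, local ranks 0..3 appear on every node, so the reduction yields 4. This is also why the reduction must run on the correct device on every process.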
Contributor


debug print to remove?

Collaborator


Thanks !

@sdesrozis
Contributor

Added a comment. Sorry for the delay...

vfdev-5 added a commit that referenced this pull request Oct 8, 2020
…) (#1376)

* warning if current device index is lower than current local rank (#1335)

* warning if current device index is lower than current local rank

* Updated code and tests

* Fixed formatting

* Updated code and tests for horovod
- fixed failing test

* Updated tests

Co-authored-by: vfdev-5 <[email protected]>

* Removed debug prints

* Fixed failing hvd tests

Co-authored-by: Sai Sandeep Mutyala <[email protected]>
vfdev-5 added a commit to vfdev-5/ignite that referenced this pull request Nov 24, 2020
Successfully merging this pull request may close these issues.

Show a warning if current device index is lower than current local rank