Draft: Add multi gpu support #2409

Re-run triggered September 25, 2024 03:20
Status Failure
Total duration 20m 46s

ci.yml

on: pull_request

Annotations

5 errors and 1 warning
test: flair/__init__.py#L1
mypy-status mypy exited with status 1.
test: flair/distributed_utils.py#L1
Black format check
--- /home/runner/work/flair/flair/flair/distributed_utils.py 2024-09-25 03:21:08.632884+00:00
+++ /home/runner/work/flair/flair/flair/distributed_utils.py 2024-09-25 03:23:55.529983+00:00
@@ -40,10 +40,11 @@
         return True
 
 
 class DistributedModel(torch.nn.parallel.DistributedDataParallel):
     """DistributedDataParallel, but redirects access to methods and attributes to the original Model"""
+
     def __getattr__(self, name):
         try:
             return super().__getattr__(name)
         except AttributeError:
             return getattr(self.module, name)
test: flair/distributed_utils.py#L341
ruff
pytest_ruff.RuffError: flair/distributed_utils.py:14:5: D415 First line should end with a period, question mark, or exclamation point
   |
13 | def launch_distributed(fp, *args):
14 |     """Executes the function fp(*args) on multiple GPUs (all local GPUs)"""
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ D415
15 |     world_size = torch.cuda.device_count()
16 |     log.info(f"Launching {world_size} distributed processes")
   |
   = help: Add closing punctuation

flair/distributed_utils.py:36:5: D415 First line should end with a period, question mark, or exclamation point
   |
35 | def is_main_process() -> bool:
36 |     """True for exactly 1 process, regardless of whether being run on CPU/single-GPU/multi-gpu"""
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ D415
37 |     if flair.distributed:
38 |         return flair.device.index == 0
   |
   = help: Add closing punctuation

flair/distributed_utils.py:44:5: D415 First line should end with a period, question mark, or exclamation point
   |
43 | class DistributedModel(torch.nn.parallel.DistributedDataParallel):
44 |     """DistributedDataParallel, but redirects access to methods and attributes to the original Model"""
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ D415
45 |     def __getattr__(self, name):
46 |         try:
   |
   = help: Add closing punctuation
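
The Black and ruff annotations above both concern the docstring of the attribute-delegating wrapper. The sketch below shows a form that would likely satisfy both checks: the docstring summary ends with a period (D415) and a blank line follows the class docstring (the line Black wants to add). To stay runnable without torch, it uses hypothetical stand-ins (`FakeModule`, `FakeParallelWrapper`, `TinyModel`, `DistributedModelSketch`) rather than the real `DistributedDataParallel`; it is an illustration of the pattern, not the PR's actual fix.

```python
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module, which defines its own __getattr__."""

    def __getattr__(self, name):
        # Only reached when normal attribute lookup has already failed.
        raise AttributeError(name)


class FakeParallelWrapper(FakeModule):
    """Hypothetical stand-in for DistributedDataParallel; keeps the wrapped model as .module."""

    def __init__(self, module):
        self.module = module


class DistributedModelSketch(FakeParallelWrapper):
    """Wrapper that redirects access to missing methods and attributes to the wrapped model."""

    def __getattr__(self, name):
        try:
            return super().__getattr__(name)
        except AttributeError:
            # Fall back to the wrapped model for anything the wrapper lacks.
            return getattr(self.module, name)


class TinyModel:
    """Hypothetical model with one attribute and one method."""

    label_type = "ner"

    def predict(self, text):
        return f"predicted:{text}"


wrapped = DistributedModelSketch(TinyModel())
print(wrapped.label_type)
print(wrapped.predict("hello"))
```

Because `module` is stored in the instance dictionary, looking up `self.module` inside `__getattr__` resolves through normal lookup and does not recurse.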
test: flair/trainers/trainer.py#L1
flair/trainers/trainer.py
426: error: Incompatible types in assignment (expression has type "DistributedModel", variable has type "Model[Any]") [assignment]
575: error: "Iterable[Any]" has no attribute "set_epoch" [attr-defined]
730: error: Incompatible types in assignment (expression has type "Tuple[float, Any]", variable has type "Tuple[()]") [assignment]
738: error: Incompatible types in assignment (expression has type "Tuple[float]", variable has type "Tuple[()]") [assignment]
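
Two of the mypy messages above follow common patterns, sketched here under stated assumptions. A variable first assigned `()` is inferred as `Tuple[()]`, so later assigning a non-empty tuple is rejected unless the variable is annotated more broadly. Calling `set_epoch` on a value typed as `Iterable[Any]` is rejected unless the type is narrowed first. The names `EpochAwareSampler`, `start_epoch`, and `collect_scores` are hypothetical and do not come from `trainer.py`; whether the trainer code has this exact shape is an assumption.

```python
from typing import Any, Iterable, Iterator, List, Tuple


class EpochAwareSampler:
    """Hypothetical sampler exposing set_epoch, similar in spirit to a distributed sampler."""

    def __init__(self, data: List[int]) -> None:
        self.data = data
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        self.epoch = epoch

    def __iter__(self) -> Iterator[int]:
        return iter(self.data)


def start_epoch(sampler: Iterable[Any], epoch: int) -> None:
    """Call set_epoch only when the sampler is known to provide it."""
    # Narrowing with isinstance lets mypy see the attribute; a bare Iterable has none.
    if isinstance(sampler, EpochAwareSampler):
        sampler.set_epoch(epoch)


def collect_scores(loss: float, extra: Any = None) -> Tuple[Any, ...]:
    """Return (loss, extra) when extra is given, otherwise (loss,)."""
    # The explicit annotation keeps mypy from inferring Tuple[()] from the empty tuple.
    result: Tuple[Any, ...] = ()
    if extra is not None:
        result = (loss, extra)
    else:
        result = (loss,)
    return result
```

The first error in the list (assigning a `DistributedModel` to a variable typed `Model[Any]`) is a different case and is not covered by this sketch.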
test
Process completed with exit code 1.
test
The following actions use a deprecated Node.js version and will be forced to run on node20: actions/checkout@v3, actions/setup-python@v4, actions/cache@v3. For more info: https://github.blog/changelog/2024-03-07-github-actions-all-actions-will-run-on-node20-instead-of-node16-by-default/