fix: pytorch unit test hangs #522

Merged (6 commits) on Jun 8, 2023
17 changes: 8 additions & 9 deletions .github/workflows/py-unittests.yml
@@ -22,26 +22,25 @@ jobs:
         uses: actions/setup-python@v4
         with:
           python-version: ${{ matrix.python-version }}
-      - name: Install dependencies
-        if: matrix.python-version != '3.10'
+      - name: Install dependencies (excluding experimental)
+        if: matrix.python-version == '3.7'
         run: |
           python -m pip install -U pip setuptools wheel
           pip install -r ./api/python/cellxgene_census/scripts/requirements-dev.txt
           pip install -e './api/python/cellxgene_census/'
-      - name: Install dependencies (with experimental)
-        if: matrix.python-version == '3.10'
+      - name: Install dependencies (including experimental)
+        if: matrix.python-version != '3.7'
         run: |
           python -m pip install -U pip setuptools wheel
           pip install -r ./api/python/cellxgene_census/scripts/requirements-dev.txt
           pip install -e './api/python/cellxgene_census/[experimental]'
-      - name: Test with pytest (API)
-        if: matrix.python-version != '3.10'
+      - name: Test with pytest (API, main tests)
         run: |
           PYTHONPATH=. coverage run --parallel-mode -m pytest -v -rP --durations=20 ./api/python/cellxgene_census/tests/
-      - name: Test with pytest (API, with experimental)
-        if: matrix.python-version == '3.10'
+      - name: Test with pytest (API, experimental)
+        if: matrix.python-version != '3.7'
         run: |
-          PYTHONPATH=. coverage run --parallel-mode -m pytest -v -rP --durations=20 --experimental ./api/python/cellxgene_census/tests/
+          PYTHONPATH=. coverage run --parallel-mode -m pytest -v -rP --durations=20 --experimental ./api/python/cellxgene_census/tests/experimental
       - uses: actions/upload-artifact@v3
         if: matrix.os == 'ubuntu-latest'
         with:
@@ -474,6 +474,9 @@ def experiment_dataloader(
     if set(unsupported_dataloader_args).intersection(dataloader_kwargs.keys()):
         raise ValueError(f"The {','.join(unsupported_dataloader_args)} DataLoader params are not supported")
 
+    if num_workers > 0:
+        _init_multiprocessing()
+
     return DataLoader(
         datapipe,
         batch_size=None,  # batching is handled by our ExperimentDataPipe
@@ -484,6 +487,25 @@ def experiment_dataloader(
     )
 
 
+def _init_multiprocessing() -> None:
+    """Ensures use of "spawn" for starting child processes with multiprocessing.
+    Forked processes are known to be problematic:
+    https://pytorch.org/docs/stable/notes/multiprocessing.html#avoiding-and-fighting-deadlocks
+    Also, CUDA does not support forked child processes:
+    https://pytorch.org/docs/stable/notes/multiprocessing.html#cuda-in-multiprocessing
+
+    """
+    torch.multiprocessing.set_start_method("fork", force=True)
+    orig_start_method = torch.multiprocessing.get_start_method()
+    if orig_start_method != "spawn":
+        if orig_start_method:
+            pytorch_logger.warning(
+                "switching torch multiprocessing start method from "
+                f'"{torch.multiprocessing.get_start_method()}" to "spawn"'
+            )
+        torch.multiprocessing.set_start_method("spawn", force=True)
+
+
 # For testing only
 if __name__ == "__main__":
     import tiledbsoma as soma
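
The start-method switch performed by `_init_multiprocessing()` can be sketched in isolation as follows. This is a minimal sketch under stated assumptions, not the PR's code: it uses the standard-library `multiprocessing` module in place of `torch.multiprocessing` so that it runs without PyTorch, and the function name `ensure_spawn` and the logger name are hypothetical. Unlike the diff above, it does not call `set_start_method("fork", force=True)` first; it reads the current method with `allow_none=True`, so the warning would report whichever method was actually in effect.

```python
import logging
import multiprocessing
from typing import Optional

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("start_method_sketch")


def ensure_spawn() -> Optional[str]:
    """Force the "spawn" start method and return the previously set method.

    Returns None if no start method had been fixed before the call.
    """
    # allow_none=True avoids fixing the platform default as a side effect
    # of merely querying the current start method.
    previous = multiprocessing.get_start_method(allow_none=True)
    if previous != "spawn":
        if previous:
            logger.warning(
                'switching multiprocessing start method from "%s" to "spawn"',
                previous,
            )
        # force=True permits changing a start method that was already set.
        multiprocessing.set_start_method("spawn", force=True)
    return previous


if __name__ == "__main__":
    ensure_spawn()
    print(multiprocessing.get_start_method())
```

Calling a helper like this before constructing a `DataLoader` with `num_workers > 0` would mirror the guard added in `experiment_dataloader`. Whether querying with `allow_none=True` suits this codebase is an assumption, not something the PR states.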