
update to run_sorter_jobs() and slurm #3105

Open · wants to merge 30 commits into main
Conversation

@MarinManuel (Author)

I am trying to run spikeinterface sorters on an HPC cluster and needed to pass specific parameters to slurm/sbatch.

I added the option to pass arbitrary arguments to sbatch in the engine_kwargs.

Let me know if that causes any issues.
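For illustration, a minimal sketch of the calling convention this PR enables. `run_sorter_jobs`, engine="slurm", and the `tmp_script_folder` / `sbatch_args` keys come from this PR; the `recording` object, the sbatch values, and the job_list contents are placeholder assumptions:

from spikeinterface.sorters import run_sorter_jobs

# Each job is a dict of run_sorter() arguments; values here are placeholders
job_list = [
    dict(sorter_name="kilosort2_5", recording=recording, folder="/scratch/sorting_0"),
]

run_sorter_jobs(
    job_list,
    engine="slurm",
    engine_kwargs=dict(
        tmp_script_folder="/scratch/slurm_scripts",  # where the generated si_script_*.py files are written
        # each key is prefixed with -- and forwarded to sbatch
        sbatch_args={"cpus-per-task": 16, "mem": "32G", "gres": "gpu:1"},
    ),
)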

@alejoe91 (Member)

@MarinManuel thank you! This is great!

The SLURM deployment has been unexplored on our side, so we are very glad that it works for you.

Would you mind sharing the script that you are using? I'd love to add a How To page on deploying multiple spike sorting jobs on SLURM.

@zm711 (Collaborator) left a comment

@alejoe91 / @samuelgarcia know the launcher stuff better: is popping the best strategy, or could you just pull out the kwargs without changing the dict as you go? (See the sketch below this comment.)

Otherwise, I just added some fixes for our docstring style :)
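For context, a minimal sketch of the two strategies in that question, assuming engine_kwargs is the dict handled inside launcher.py (illustrative only, not the actual PR code):

# Strategy 1: pop() consumes entries, mutating the caller's dict as you go
tmp_script_folder = engine_kwargs.pop("tmp_script_folder", None)
sbatch_args = engine_kwargs.pop("sbatch_args", {})

# Strategy 2: get() reads entries without changing the dict
tmp_script_folder = engine_kwargs.get("tmp_script_folder", None)
sbatch_args = engine_kwargs.get("sbatch_args", {})

A side effect of popping is that the caller's dict is modified in place, which can surprise a user who reuses the same engine_kwargs across calls.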

@alejoe91 added the `sorters` (Related to sorters module) label on Jul 1, 2024
@JoeZiminski (Collaborator) left a comment

This is very cool! It really opens up the SLURM options here. I have a couple of comments now, and will try it out on our HPC ASAP!

@JoeZiminski (Collaborator) commented Jul 29, 2024

Hey @MarinManuel, sorry for the delay in testing this locally. I tried it on our cluster and it seems to be working well! I had a couple of other comments below. It might also be worth adding a test for this: I have pasted one in the dropdown below. I don't think I can push to your fork, but feel free to add it to src/spikeinterface/sorters/tests/test_launcher.py underneath test_run_sorter_jobs_slurm.

If you think of any functionality this is missing, please let me know / feel free to expand the test. This will also require adding pytest-mock to the dev dependencies in pyproject.toml in the main repo folder. It would be cool to also test the generated .py files sometime, but that's outside the scope of this PR.

The test
import tempfile  # used at the end of this test; may already be imported in test_launcher.py

def test_run_sorter_jobs_slurm_kwargs(mocker, tmp_path, job_list):
    """
    Mock `subprocess.run()` to check that engine_kwargs are
    propagated to the call as expected.
    """
    # First, mock `subprocess.run()`, set up a call to `run_sorter_jobs`
    # then check the mocked `subprocess.run()` was called with the
    # expected signature. Two jobs are passed in `jobs_list`, first
    # check the most recent call.
    mock_subprocess_run = mocker.patch(
        "spikeinterface.sorters.launcher.subprocess.run"
    )

    tmp_script_folder = tmp_path / "slurm_scripts"

    engine_kwargs = dict(
        tmp_script_folder=tmp_script_folder,
        sbatch_args={
            "cpus-per-task": 32,
            "mem": "32G",
            "gres": "gpu:1",
            "any_random_kwarg": 12322,
        }
    )
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs=engine_kwargs,
    )

    script_0_path = f"{tmp_script_folder}/si_script_0.py"
    script_1_path = f"{tmp_script_folder}/si_script_1.py"

    expected_command = [
        "sbatch", "--cpus-per-task", "32", "--mem", "32G", "--gres", "gpu:1", "--any_random_kwarg", "12322", script_1_path
    ]
    mock_subprocess_run.assert_called_with(expected_command, capture_output=True, text=True)


    # Next, check the first call (which sets up `si_script_0.py`)
    # also has the expected arguments.
    expected_command[9] = script_0_path
    assert mock_subprocess_run.call_args_list[0].args[0] == expected_command

    # Next, check that defaults are used properly when no kwargs are
    # passed. This will default to `_default_engine_kwargs` as
    # set in `launcher.py`
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs={"tmp_script_folder": tmp_script_folder},
    )
    expected_command = [
        "sbatch", "--cpus-per-task", "1", "--mem", "1G", script_1_path
    ]
    mock_subprocess_run.assert_called_with(expected_command, capture_output=True, text=True)

    # Finally, check that the `tmp_script_folder` is generated on the
    # fly as expected. A random foldername is generated, just check that
    # the folder to which the scripts are saved is in the `tempfile` format.
    run_sorter_jobs(
        job_list,
        engine="slurm",
        engine_kwargs=None,  # TODO: test defaults
    )
    tmp_script_folder = "_".join(tempfile.mkdtemp(prefix="spikeinterface_slurm_").split("_")[:-1])
    assert tmp_script_folder in mock_subprocess_run.call_args_list[-1].args[0][5]

Some other small things:

  • I think that the other, currently existing test (test_run_sorter_jobs_slurm) might need to be updated to the new way of passing the sbatch_args.
  • I think it is worth adding an error if any key-value pair other than tmp_script_folder or sbatch_args exists in engine_kwargs. The reason is that this PR is backward-incompatible with previous versions but will not error; it will only silently fail to pass the kwargs on to slurm (e.g. if cpus_per_task is passed to engine_kwargs directly, as it was previously, it will just be ignored). So it is good to raise an error here to let the user know about the new API (see the sketch after this list).
  • Also, might as well also assert / raise an error so that engine_kwargs is only not None when engine="slurm".
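A minimal sketch of the validation suggested in the second bullet, assuming the slurm branch of run_sorter_jobs receives engine_kwargs as a dict (the names and message are illustrative, not actual PR code):

# hypothetical constant listing the only keys the slurm engine understands
ALLOWED_SLURM_KEYS = {"tmp_script_folder", "sbatch_args"}

def _check_slurm_engine_kwargs(engine_kwargs):
    """Raise if engine_kwargs contains keys the slurm engine would silently ignore."""
    unknown = set(engine_kwargs) - ALLOWED_SLURM_KEYS
    if unknown:
        raise ValueError(
            f"Unknown engine_kwargs for engine='slurm': {sorted(unknown)}. "
            "sbatch options must now be nested under the 'sbatch_args' key."
        )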

@MarinManuel (Author)

Hi @JoeZiminski,
I tried to integrate your comments into my PR. Hopefully I did it right.
The only thing I did not do was:

Also, might as well also assert / raise an error so that engine_kwargs is only not None when engine="slurm".

I wasn't entirely sure what you meant, but I think sbatch would launch with default parameters if engine_kwargs=None, so that shouldn't be a problem?
Also, I didn't know how to write a test to check that the exception was raised correctly.
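For reference, one way to test that such an exception is raised is pytest.raises. This sketch assumes the ValueError from the unknown-key check suggested above; the match string must be adapted to the actual error message, and run_sorter_jobs is assumed to be imported in test_launcher.py:

import pytest

def test_run_sorter_jobs_slurm_rejects_unknown_kwargs(job_list):
    # an old-style key passed directly to engine_kwargs should now raise
    with pytest.raises(ValueError, match="Unknown engine_kwargs"):
        run_sorter_jobs(
            job_list,
            engine="slurm",
            engine_kwargs={"cpus-per-task": 32},
        )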

@JoeZiminski (Collaborator) commented Sep 10, 2024

Hey @MarinManuel, so sorry for the delay with this! I'm around again now; please feel free to ping me and we can merge this ASAP.

Just a couple of small things I think:

  • please see a couple of small comments on the tests
  • there are a few GitHub suggestions (e.g. from Zach) above. These can be committed by selecting 'Commit suggestion', changing the commit message (e.g. just 'docstring fix' or something is okay) and then 'Commit changes'. It's a cool GitHub feature!
  • Any other comments that are outdated or have been addressed can be resolved.

Also, might as well also assert / raise an error so that engine_kwargs is only not None when engine="slurm".

Sorry, this was the other way round: the idea is to raise in case a user who is unfamiliar with SLURM (say, using 'dask') passes engine_kwargs under the false impression that they do something for all engines. Alternatively, we could rename engine_kwargs to slurm_engine_kwargs, as they only relate to slurm. Actually this is better: maybe they can be called slurm_engine_kwargs or just slurm_kwargs?

After this I can do one last pass and test locally again, then I think it's good to go! Apologies the review took so long; this new feature is a super-useful addition.

@MarinManuel (Author)

Hi @JoeZiminski,
Please let me know how that looks.

@JoeZiminski (Collaborator) commented Sep 12, 2024

Hi @MarinManuel, I'm incredibly sorry, I misunderstood from the docstring how engine_kwargs was intended; I thought it was just for SLURM, based on the below:

old docstring
engine_kwargs : dict
        In the case of engine="slurm", possible kwargs are:
        - tmp_script_folder: str, default None
            the folder in which the job scripts are created. Default: directory created by
            the `tempfile` library
        - sbatch_args: dict
          arguments to be passed to sbatch. They will be automatically prefixed with --.
          Arguments must be in the format slurm specifies; see the
          [documentation for `sbatch`](https://slurm.schedmd.com/sbatch.html) for a list of possible arguments

It might be worth reverting that change such that all kwargs come under engine_kwargs as before. The new docstring can look as below. I'm happy to do this if you give me access to your fork, as I appreciate it will be annoying to undo something you just did due to my mistake!

New docstring
    engine_kwargs : dict
        Parameters to be passed to the underlying engine.
        Defaults are:
        * loop : None
        * joblib : n_jobs=1, backend="loky"
        * multiprocessing : max_workers=2, mp_context=None
        * dask : client=None
        * slurm : tmp_script_folder=None
        for the `slurm` entry:
            This dictionary contains arguments to be passed to sbatch.
            They will be automatically prefixed with --.
            Arguments must be in the format slurm specifies; see the
            [documentation for `sbatch`](https://slurm.schedmd.com/sbatch.html) for a list of possible arguments
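To make those defaults concrete, a hedged usage sketch (engine names and defaults are taken from the docstring above; job_list is assumed to be defined as in the tests):

# joblib engine, overriding the documented n_jobs default
run_sorter_jobs(job_list, engine="joblib", engine_kwargs={"n_jobs": 4})

# slurm engine, with sbatch options nested under "sbatch_args"
run_sorter_jobs(
    job_list,
    engine="slurm",
    engine_kwargs={
        "tmp_script_folder": "/scratch/slurm_scripts",
        "sbatch_args": {"cpus-per-task": 8, "mem": "16G"},
    },
)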

Some other very minor things that came up when manually testing, then I think it's good to go!

  • Need to add pytest-mock here and here in the pyproject.toml, as it was a new dependency added in the test.
  • The test is failing on Windows as some os commands used in run_sorter_jobs assume we are on Linux. In the end we should skip unless we are running on Linux: we need to add from platform import system at the top of test_launcher.py and @pytest.mark.skipif(system() != 'Linux', reason="Assumes we are on Linux to run SLURM") on the test (see the snippet after this list).
  • When manually testing, it was hard to figure out what was being passed to slurm by the CLI. I think we can add a quick print before the subprocess call to make this clear, e.g. adding print(f"subprocess called with command {progr}\n") here.
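Putting the second bullet together, a sketch of how the skip marker would look on the test from earlier in the thread:

from platform import system

import pytest

@pytest.mark.skipif(system() != "Linux", reason="Assumes we are on Linux to run SLURM")
def test_run_sorter_jobs_slurm_kwargs(mocker, tmp_path, job_list):
    ...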

Then all good from my end, sorry about this last-minute back and forth and thanks again for this super useful addition!

@JoeZiminski (Collaborator)

Hi @MarinManuel, thanks for this! Apologies, please resend the invite and I will fix the tests and conflicts. Thanks!
