Skip to content

caplett/hydra_submitit_extension

Repository files navigation

hydra_submitit_extension

A more flexible Hydra Submitit Launcher plugin

This launcher is a extension of Submiti Launcher Plugin.

Features:

  • Iterative scheduling of jobs:
    • Max Number for Active and Pending Jobs
    • Automatically Schedule Jobs and Wait depending on Limits

Drawbacks:

  • Single Jobs are submitted instead of Job arrays
  • Hydra must be active to schedule jobs
  • No Naming Scheme for Slurm jobs. This makes manual canceling difficult

Installation

pip install hydra-submitit-extension

Quickstart

python main.py hydra/launcher=submitit_slurm_extended -m

All Parameter:

# @package hydra.launcher

# No changes to Submitit Launcher plugin
submitit_folder: ${hydra.sweep.dir}/.submitit/%j
timeout_min: 30
cpus_per_task: 1
gpus_per_node: null
tasks_per_node: 1
mem_gb: 1
nodes: 1
name: ${hydra.job.name}
partition: "dev_single"
qos: null
comment: null
constraint: null
exclude: null
#gres: "gpu:1"
cpus_per_gpu: null
gpus_per_task: null
mem_per_gpu: null
mem_per_cpu: null
account: null
signal_delay_s: 120
max_num_timeout: 0
additional_parameters: {}
array_parallelism: 256
setup: null

# Hydra Submitit Extension

_target_: hydra_plugins.hydra_submitit_extension.submitit_launcher.ExtendedSlurmLauncher

# Time between reschedule tries in s.
# Min is 60s
reschedule_interval: 60

# Maximum number of total active jobs in slurm account.
max_jobs_in_total: 5

# Maximum number of active jobs in current partition (e.g dev_single).
max_jobs_in_partition: 4

# Maximum number of active jobs in current sweep.
max_jobs_in_sweep: 3

Roadmap

  • Iterative Scheduling
  • Greedy Partition Selection

About

A more flexible hydra submitit launcher

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages