Mapping of scheduler jobs to project #3

Open
csadorf opened this issue Mar 7, 2017 · 4 comments
Labels
enhancement New feature or request

Comments

@csadorf
Contributor

csadorf commented Mar 7, 2017

Original report by Carl Simon Adorf (Bitbucket: csadorf, GitHub: csadorf).


Problem

The FlowProject currently provides the scheduler_jobs() and map_scheduler_jobs() methods. These can be used to identify the scheduler jobs that belong to the project within the current environment, but they are awkward to use. For example, it should be simple to iterate over all scheduler jobs associated with the current project, e.g., to change their status.

Current Solution

This is the code currently required to do so:

#!python
import flow

project = flow.FlowProject()
env = flow.get_environment()

# Query the environment scheduler once, then map its jobs to project jobs.
sjobs = project.scheduler_jobs(env.scheduler_type())
sjobs_map = project.map_scheduler_jobs(sjobs)

for job in project:
    for op_sjobs in sjobs_map[job.get_id()].values():
        for sjob in op_sjobs:
            pass  # do something with sjob

The reason for this rather convoluted approach is to ensure that the environment scheduler is queried only once, rather than multiple times (for example, once per job).

Proposed Enhancement

I propose to protect the environment scheduler resource, using the following API:

#!python
import flow

project = flow.FlowProject()
env = flow.get_environment()

# Proposed: a single scheduler query whose result can be looked up per job.
result = project.query_scheduler(env)
for job in project:
    for op_name, sjob in result(job):
        pass  # do something with sjob
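
One way the proposed result object could behave is sketched below. This is only an illustration of the "query once, look up per job" idea: the class name `SchedulerQueryResult` and the shape of its input are assumptions, not part of flow, and the entries are stand-in data rather than real scheduler output.

```python
from collections import defaultdict


class SchedulerQueryResult:
    """Hypothetical result object: one scheduler snapshot, indexed by job id."""

    def __init__(self, entries):
        # entries: iterable of (job_id, operation_name, scheduler_job) tuples
        # gathered from a single scheduler query (assumed input shape).
        self._by_job = defaultdict(list)
        for job_id, op_name, sjob in entries:
            self._by_job[job_id].append((op_name, sjob))

    def __call__(self, job):
        # Accept either an object exposing get_id() or a plain id string.
        job_id = job.get_id() if hasattr(job, "get_id") else job
        # .get() avoids creating empty entries for unknown jobs.
        return iter(self._by_job.get(job_id, ()))


# Stand-in data; in practice this would come from one call to the scheduler.
entries = [
    ("6d6df7ab", "run", "slurm-37428317"),
    ("6d6df7ab", "analyze", "slurm-37428318"),
]
result = SchedulerQueryResult(entries)
pairs = list(result("6d6df7ab"))
```

Because the snapshot is built once in the constructor, calling `result(job)` inside the project loop never touches the scheduler again.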
@mikemhenry mikemhenry added this to the v0.7 milestone Feb 16, 2019
@csadorf csadorf modified the milestones: v0.7, v0.8 Feb 26, 2019
@csadorf csadorf modified the milestones: v0.8, v0.9 May 24, 2019
@csadorf csadorf removed the proposal label Jul 5, 2019
@vyasr vyasr modified the milestones: v1.0, v0.9 Jul 5, 2019
@vyasr vyasr added enhancement New feature or request good first issue Good for newcomers labels Jul 5, 2019
@bdice bdice modified the milestones: v0.9.0, v0.10.0 Dec 20, 2019
@bdice
Member

bdice commented Mar 4, 2020

This issue, or #146, would solve a problem raised by @ramanishsingh and @rsdefever at the @mosdef-hub all-hands meeting. They want to be able to put log files generated by PBS/SLURM into the corresponding job directory. Of course there isn't a one-to-one mapping between scheduler jobs and signac jobs, but we could probably find a way to do better than the current behavior.

@bdice bdice added the GSoC Google Summer of Code label Mar 4, 2020
@bdice bdice modified the milestones: v0.10.0, v0.11.0 Jun 27, 2020
@b-butler b-butler modified the milestones: v0.11.0, v0.12.0 Oct 7, 2020
@cbkerr cbkerr removed the good first issue Good for newcomers label Dec 18, 2020
@cbkerr
Member

cbkerr commented Dec 18, 2020

A related use case is tracking scheduler job IDs so that errors are easier to track down. It might be a separate issue.
There are two steps in this translation: scheduler ID --> "flow submission ID" --> job ID

My current solution involves:

(1) using a custom template that emails me the status when the job queues, completes, or fails, so I get an email with a subject like:

SLURM Job_id=37428317 Name=project_name/6d6df7ab/run/0000/22da1a8a1dc67ca8783a9d3d9db5c598 Began, Queued time 00:00:23

(where 22da1a... is the "bundle ID" and 6d6d... is the job.id.)
I use this to associate the scheduler ID with flow's submission ID.

In this case, flow prints out:

 - Group: run(6d6df7ab68f4591d5e1a05065683e78b)

(2) saving what flow prints out when I submit jobs (currently by copy-paste, though I know I could dump the output to a file).
This is more of a problem when bundling. Say I submit a bundle of 100 jobs. While submitting, flow prints out:

 - Group: run(c835975646cd37e561b6cbf8e7d2facd)
 - Group: run(6cf6a57ce13eb422a2306bc40142a49c)
 - Group: run(53e44e09a38c4d818afaf9221cb57d69)
   [truncated]

This is the only way I know to associate the submission ID (now the ID of the bundle) with the job.id. I look up the submission ID and find jobs in this list or find the job.id and go to its parent submission ID.
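
The manual lookup described above could be scripted roughly as follows. This is a sketch that assumes the submission output has been saved as text and that each line has the " - Group: name(job_id)" form shown above; the helper names and the "bundle-A" key are hypothetical, and the bundle ID still has to be supplied by hand because it does not appear in those lines.

```python
import re

# Matches lines such as " - Group: run(c835975646cd37e561b6cbf8e7d2facd)".
GROUP_LINE = re.compile(
    r"^\s*-\s*Group:\s*(?P<group>[^\s(]+)\((?P<job_id>[0-9a-f]+)\)\s*$"
)


def parse_group_lines(text):
    """Return (group_name, job_id) pairs found in saved submission output."""
    pairs = []
    for line in text.splitlines():
        match = GROUP_LINE.match(line)
        if match:
            pairs.append((match.group("group"), match.group("job_id")))
    return pairs


def index_by_job(bundles):
    """Invert {bundle_id: saved_output_text} into {job_id: [bundle_id, ...]}."""
    index = {}
    for bundle_id, text in bundles.items():
        for _group, job_id in parse_group_lines(text):
            index.setdefault(job_id, []).append(bundle_id)
    return index


saved = """
 - Group: run(c835975646cd37e561b6cbf8e7d2facd)
 - Group: run(6cf6a57ce13eb422a2306bc40142a49c)
"""
# "bundle-A" is a placeholder for the real submission (bundle) ID.
lookup = index_by_job({"bundle-A": saved})
```

With such an index, going from a job.id to its parent submission ID is a dictionary lookup instead of a search through pasted output.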

A possible solution?
Set the job name to the full job ID by default? This could help when searching scheduler submissions:
--job-name="project_name/6d6df7ab/run_big/0000/22da1a8a1dc67ca8783a9d3d9db5c598"
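
If the job name followed this pattern, it could be split back into its parts when reading scheduler output. The sketch below assumes exactly five "/"-separated fields in the order seen in the examples in this thread; the field names are my own labels, and the meaning of the fourth field ("0000") is a guess.

```python
from typing import NamedTuple


class SubmissionName(NamedTuple):
    project: str
    job_id_prefix: str
    operation: str
    index: str  # meaning assumed; appears as "0000" in the examples
    bundle_id: str


def parse_submission_name(name):
    """Split a job name of the assumed five-field form into named parts."""
    parts = name.split("/")
    if len(parts) != 5:
        raise ValueError(f"unexpected submission name: {name!r}")
    return SubmissionName(*parts)


parsed = parse_submission_name(
    "project_name/6d6df7ab/run/0000/22da1a8a1dc67ca8783a9d3d9db5c598"
)
```

Here `parsed.job_id_prefix` gives the job.id prefix and `parsed.bundle_id` the submission ID, which are the two values needed for the translation described earlier.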

@cbkerr
Member

cbkerr commented Dec 18, 2020

Do schedulers return the job ID?
SLURM: Officially no; see https://slurm.schedmd.com/sbatch.html (find the "RETURN VALUE" section).
We can definitely get the job ID from squeue after the job is created, but I don't know if that number is assigned right away.

However (!), I found references to scripts that return the ID when submitting:

  1. https://ubccr.freshdesk.com/support/solutions/articles/5000688140-submitting-a-slurm-job-script (search for "step 4" to see output and also look at the scripts in step 1--3)
  2. https://kb.iu.edu/d/awrz (search for "submit your job script")
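
For SLURM specifically, the ID can usually be recovered from what sbatch prints rather than from its return value. The sketch below parses the two output forms I am aware of: the default "Submitted batch job <id>" line and the `--parsable` form ("<id>" or "<id>;<cluster>"). The function name is hypothetical, and other schedulers or site wrappers may print something different.

```python
import re


def parse_sbatch_output(stdout):
    """Extract the numeric job ID from sbatch output, or None if absent."""
    text = stdout.strip()
    # Default form: "Submitted batch job 37428317"
    match = re.search(r"Submitted batch job (\d+)", text)
    if match:
        return int(match.group(1))
    # --parsable form: "37428317" or "37428317;cluster"
    match = re.fullmatch(r"(\d+)(?:;.*)?", text)
    if match:
        return int(match.group(1))
    return None
```

Returning None instead of raising keeps the caller free to fall back to a squeue lookup when the output is not recognized.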

On Torque, I think the answer is yes, based on the discussion below.

@cbkerr
Member

cbkerr commented Dec 18, 2020

Some notes from discussion with @csadorf and @bdice yesterday afternoon:

  • Not all schedulers return a scheduler ID; some return just a "success" status
  • I learned that we store bundle information in project_root/.bundles/project_name/bundle/[bundle_IDs] (which is what gets printed out when you submit a bundle)
  • Relevant function FlowProject._expand_bundled_jobs()
  • Relevant function FlowProject._fetch_scheduler_status()
  • The "flow submission ID" is deterministically created

@bdice bdice modified the milestones: v0.12.0, v0.13.0 Jan 15, 2021
@bdice bdice removed this from the v0.13.0 milestone Mar 17, 2021
@atravitz atravitz removed the GSoC Google Summer of Code label Mar 18, 2021
@tcmoore3 tcmoore3 removed their assignment Apr 25, 2022
8 participants