Errors after running history-resume (Heterogeneous jobs) #15

Open
mrhawkin opened this issue May 9, 2023 · 1 comment

mrhawkin commented May 9, 2023

I got this error the first time I ran history-resume.
(It did not show up right away, only after it had been running for some time.)

Traceback (most recent call last):
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 798, in <module>
    exit(main(sys.argv[1:]))
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 564, in main
    errors = get_history(db, sacct_filter=sacct_filter,
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 635, in get_history
    errors += slurm2sql(db, sacct_filter=new_filter, update=True, jobs_only=jobs_only,
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 746, in slurm2sql
    processed_line = {k.strip('_'): (columns[k](line[k])

When I tried again I got this:

Traceback (most recent call last):
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 798, in <module>
    exit(main(sys.argv[1:]))
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 564, in main
    errors = get_history(db, sacct_filter=sacct_filter,
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 635, in get_history
    errors += slurm2sql(db, sacct_filter=new_filter, update=True, jobs_only=jobs_only,
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 746, in slurm2sql
    processed_line = {k.strip('_'): (columns[k](line[k])
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 749, in <dictcomp>
    else columns[k].calc(line))
  File "/cluster/home/haagen/slurm/slurm2sql.py", line 317, in calc
    return int(row['JobID'].split('_')[0].split('.')[0])
ValueError: invalid literal for int() with base 10: '18058278+0'
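
The failure in the second traceback can be reproduced in isolation. The sketch below reuses only the expression shown at line 317 of the traceback and the JobID value from the error message; the variable names are illustrative and not taken from slurm2sql.

```python
# JobID value copied from the ValueError message above.
job_id = "18058278+0"

# Same expression as in the traceback (slurm2sql.py, line 317):
# it strips an array suffix ("_N") and a step suffix (".N"),
# but leaves the "+0" part in place.
base = job_id.split('_')[0].split('.')[0]

try:
    int(base)
    error = None
except ValueError as exc:
    # int() cannot parse a string that still contains "+0".
    error = str(exc)

print(base)
print(error)
```

Neither `split` call removes the `+0` suffix, so `int()` receives the full string and raises.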

rkdarst commented Aug 4, 2024

I think this is about "heterogeneous jobs". I recently (months ago?) saw this in some of our history: a "+" can appear in JobIDs to distinguish the different parts of a heterogeneous job. As an immediate workaround I made it ignore these, so the results for those jobs will be wrong or duplicated, but at least the run succeeds.

Our cluster's users don't use them often, so I could ignore them for our statistics. But it might be good to add some handling someday... if anyone needs it, please ask.
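
For anyone who does need this, one possible shape for the handling is sketched below. It is not the workaround described above and not part of slurm2sql: `parse_jobid` and the regular expression are hypothetical, and the sketch assumes JobIDs look like `123`, `123_4`, `123.batch`, or `123+0`, optionally followed by a step suffix.

```python
import re

# Hypothetical pattern, assuming the JobID layout
#   <job>[+<het component>][_<array task>][.<step>]
_JOBID_RE = re.compile(
    r"^(?P<job>\d+)"           # base job number
    r"(?:\+(?P<het>\d+))?"     # heterogeneous component, e.g. "+0"
    r"(?:_(?P<task>[^.]+))?"   # array task, e.g. "_7"
    r"(?:\.(?P<step>.+))?$"    # step, e.g. ".batch" or ".0"
)

def parse_jobid(jobid):
    """Split a JobID string into (job, het, task, step).

    job is an int, het is an int or None, and task and step
    are strings or None.
    """
    m = _JOBID_RE.match(jobid)
    if m is None:
        raise ValueError("unrecognized JobID: %r" % (jobid,))
    het = m.group("het")
    return (
        int(m.group("job")),
        int(het) if het is not None else None,
        m.group("task"),
        m.group("step"),
    )

print(parse_jobid("18058278+0"))
print(parse_jobid("18058278+1.batch"))
print(parse_jobid("123_7.0"))
```

Keeping the component number as its own field would let the parts of one heterogeneous job be stored as separate rows instead of colliding on the same integer JobID.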

rkdarst changed the title from "Errors after running history-resume" to "Errors after running history-resume (Heterogeneous jobs)" on Aug 4, 2024