-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Scheduler.parse_output
for the SlurmScheduler
plugin
#3931
Implement Scheduler.parse_output
for the SlurmScheduler
plugin
#3931
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #3931 +/- ##
===========================================
+ Coverage 79.32% 79.33% +0.02%
===========================================
Files 468 468
Lines 34713 34737 +24
===========================================
+ Hits 27532 27555 +23
- Misses 7181 7182 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
959da78
to
05dd0ae
Compare
b5c030e
to
9a4d224
Compare
@sphuber I've run some tests using the One question I had was whether we should not also allow the Regarding the discussion on whether to exit when the scheduler returns a non-zero exit code in #3906: I'd also vote to maintain backwards compatibility and only add a warning to the log for now. |
9a4d224
to
fe31e42
Compare
The problem with this is that this error does not come from the SLURM scheduler, but from the GPU, specifically the Nvidia card that is used on the Piz Daint XC50 nodes. To me that means we should not be parsing for this in the
Same point as before: this information has nothing to do with the SLURM scheduler and so I don't think the logic to parse this data should be baked in there. Somebody else running a job on a different machine that also uses a SLURM scheduler might not get this information. |
fe31e42
to
57f0918
Compare
Sorry for taking so long to pick this up again!
Fair enough! That all makes sense to me (now 😅). I suppose at some point we can introduce "architecture-related" parsing, but I agree that this should not be done here. I've gone through the code again and have no further comments. I also have several calcjobs with |
This implements the `Scheduler.parse_output` method that allows parsing the detailed job info that is retrieved from the scheduler when a job is finished. For the time being, only an out-of-memory error is detected and the corresponding exit code is returned.
57f0918
to
a1b5729
Compare
This implements the
Scheduler.parse_output
method that allows parsingthe detailed job info that is retrieved from the scheduler when a job is
finished. For the time being, only an out-of-memory error is detected
and the corresponding exit code is returned.