Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for another job scheduler. #24642

Closed
wants to merge 0 commits into from

Conversation

venkatk-ot
Copy link
Contributor

NetworkComputer(NC) now also known as Altair accelerator is a commercial job scheduler tool from Altair/RunTime This change also includes post-processing script to filter out exit code 6, which is a false error code for jobs run in NC.

util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/nc_post_cmd.sh Outdated Show resolved Hide resolved
util/dvsim/nc_post_cmd.sh Outdated Show resolved Hide resolved
util/dvsim/nc_post_cmd.sh Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
util/dvsim/NcLauncher.py Outdated Show resolved Hide resolved
Comment on lines 104 to 116
self.process = subprocess.Popen(cmd_arr,
stdin=None,
stdout=None,
stderr=subprocess.STDOUT,
# string mode
universal_newlines=True,
env=exports,
cwd=self.deploy.odir)

# Wait until the process exit
self.process.wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the blocking version that we're doing here, do we really need Popen? Isn't it equivalent to do this?

            subprocess.run(cmd_arr,
                           stdin=None,
                           stdout=None,
                           stderr=subprocess.STDOUT,
                           universal_newlines=True,
                           env=exports,
                           cwd=self.deploy.odir)

That way, we don't need to talk about self.process at all for interactive mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, but it did not work. The reason is the poll function seems to get called even for interactive job and it has code that uses self.process variable. can i leave this as it is or should i rework the poll function for interactive job ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the code is a bit ugly here. I think the call to poll comes from Scheduler._poll? Maybe the best approach is to set self.status after the run call has finished? Then NcLauncher.poll can do the same thing as LsfLauncher.poll and return it immediately.

@rswarbrick
Copy link
Contributor

Thanks for the tweaks you have made since my first review. I realise that there was quite a long list of items in that review. I think the only ones open are:

  • Using with open to simplify writing to the log (here).
  • Avoiding needing to construct timeout_secs at all (here).
  • Avoiding the unnecessary arguments in get_submit_cmd (here).
  • Nitty f-string suggestion to tidy up how we construct an error message (here).
  • Slightly purer code in get_submit_cmd (here)
  • A change to avoid Popen in interactive mode (here)

encoding='UTF-8',
errors='surrogateescape') as f:
f.write('[Executing]:\n{}\n\n'.format(self.deploy.cmd))
self.process = subprocess.Popen(cmd_arr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof: this looks a little surprising to me (which I think might have been my fault!) Does't Popen exit almost immediately? The with block will then finish. Presumably Python avoids closing the fds because they are pointed to by self.process.

My suggestion was terrible! Probably best to keep the explicit open call that you had before (possibly with a comment to avoid confusing the next person like me...). I'm really sorry for the noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done . reverted the changes with a comment explaining the reason

Comment on lines 96 to 130
f = open(self.deploy.get_log_path(),
'w',
encoding='UTF-8',
errors='surrogateescape')
f.write('[Executing]:\n{}\n\n'.format(self.deploy.cmd))
f.flush()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Looking more carefully, I think my suggestion was a bit silly. I'll leave a proper note on the newly changed lines as well, but I'm really sorry for confusing stuf

@venkatk-ot venkatk-ot force-pushed the master branch 2 times, most recently from 4402d7c to d8e1962 Compare October 1, 2024 01:59
@venkatk-ot venkatk-ot requested review from marnovandermaas and HU90m and removed request for a team October 1, 2024 01:59
@venkatk-ot
Copy link
Contributor Author

reopening after resync

@venkatk-ot
Copy link
Contributor Author

reopen after resync

@venkatk-ot
Copy link
Contributor Author

Hi @rswarbrick

i tried to sync my fork to master as there were 400+ commits . but in the process this PR got closed. can you please help me reopen this PR or should i create a new PR ?

@venkatk-ot
Copy link
Contributor Author

I tried it from my side, but the best I could do was to create a new PR here
#24707

@venkatk-ot
Copy link
Contributor Author

.

@rswarbrick
Copy link
Contributor

I'm not sure exactly what happened here, but it's no problem: we'll jump over to that PR. Thanks for the links.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants