Cast string "array.id" to integer, makeClusterFunctionTorque with SSH #53
In a multi-node cluster site, only one node may be able to accept Torque commands. If this node is accessible via SSH, BatchJobs can run on any other node and tunnel its Torque commands to that node.
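A minimal sketch of the tunnelling idea, not the code from this pull request: the helper name `wrapSSH` and the `nodename` argument are hypothetical and only illustrate how a Torque command line could be wrapped in an `ssh` call with quoting applied at both levels.

```r
# Hypothetical helper (not part of BatchJobs): build a command line that runs
# a scheduler command on a remote node via ssh.
wrapSSH <- function(cmd, args = character(0L), nodename) {
  # Quote every argument for the remote shell.
  quoted <- vapply(args, shQuote, character(1L), type = "sh")
  remote <- paste(c(cmd, quoted), collapse = " ")
  # Quote the whole remote command line again so the local shell hands it
  # to ssh as a single argument.
  paste("ssh", shQuote(nodename, type = "sh"), shQuote(remote, type = "sh"))
}

# Example: list the jobs of a (made-up) user on a (made-up) node "master".
cmdline <- wrapSSH("qselect", c("-u", "alice"), nodename = "master")
cat(cmdline, "\n")
```

`ssh` exits with the exit status of the remote command, or with 255 if the connection itself fails, so a caller can still inspect the status of the wrapped command.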
When array jobs are submitted, each sub job is assigned the batch id array.id[]. Because array.id[] does not appear in the qselect output, matching the internal batch ids against listJobs returns an empty set, and waitForJobs stops after 5 sleeps.
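The mismatch can be illustrated with a small sketch. The ids below are made up, the format of the listed ids is an assumption, and `baseId` is a hypothetical helper. This is one possible normalisation, not necessarily the fix applied in this pull request.

```r
# Hypothetical ids for illustration only; real ids depend on the site.
stored <- c("1234[]", "1234[]")     # batch ids recorded at submit time
listed <- c("1234[1]", "1234[2]")   # ids as a job listing might report them (assumption)

# Exact matching finds nothing, so the jobs look as if they had disappeared.
exact <- intersect(stored, listed)

# One possible normalisation: compare only the part before the first "[".
baseId <- function(id) sub("\\[.*$", "", id)
matched <- stored[baseId(stored) %in% baseId(listed)]
```

Comparing only the base id treats all sub jobs of an array as one unit, which is coarser than tracking each sub job separately.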
Do you log in on the master via SSH to call qselect?
Yes, I do. Please see a17396e: in
I'm amazed that this works! We have to check this on some more systems though.
Why should it not work?
I need to double-check that exit codes are forwarded correctly and that the quoting is correct.
@berndbischl Your opinion?
That's true, a shared file system is still needed.
If Bernd does not have any objections, I'm afraid the SSH stuff will not make it into the next release because I do not have enough time to test this. But I would pull your changes after October 20th and try to generalize it for other cluster functions as well.
@mllg
… file systems it can happen that the file is not available instantaneously
# Conflicts: # R/clusterFunctionsTorque.R
Change to slurm scheduler
asInt expects a numeric input value. array.id, as output from Sys.getenv, is a string.
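The string-to-integer cast for array.id can be sketched in base R as follows. The helper `readArrayId` is hypothetical, and the variable name `PBS_ARRAYID` is an assumption, since the text here does not say which environment variable is read. The sketch uses `as.integer` instead of asInt so that it runs without extra packages.

```r
# Hypothetical helper: read an array index from an environment variable and
# return it as an integer. The default variable name is an assumption.
readArrayId <- function(var = "PBS_ARRAYID") {
  raw <- Sys.getenv(var, unset = NA_character_)  # always a character value
  # Accept only strings that consist purely of digits.
  if (is.na(raw) || !grepl("^[0-9]+$", raw)) {
    stop("Environment variable ", var, " does not hold a non-negative integer")
  }
  as.integer(raw)
}

# Demo: set the variable by hand, as the scheduler would for a sub job.
Sys.setenv(PBS_ARRAYID = "7")
array.id <- readArrayId()
```

Validating the string before the cast avoids a silent `NA` when the variable is unset or malformed.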