Data Staging #191
-
API

Support for data staging could be rendered as a separate set of API calls:

jex = psij.JobExecutor.get_instance(schema)
jex.stage_in(src, tgt, flags)
jex.stage_out(src, tgt, flags)

It could also be rendered as part of the job specification, thus combining data staging and job submission into a single operation (from the API perspective):

spec = psij.JobSpec()
spec.executable = jd.executable
spec.arguments = jd.arguments
spec.stage_in = [[src, tgt, flags], ...]
spec.stage_out = [[src, tgt, flags], ...]

Separate calls give the application better control but also require more application logic (when to stage, how to coordinate with job sandboxes, how to react to job state changes, ...). Both approaches could be supported, but that runs against a minimalistic API. Is minimalism important to us?

Implementation

In both cases, staging could be implemented inside the PSIJ executor. For all local executors (which all executors are at the moment), the implementation could likely be added to the executor base class or provided as auxiliary helper methods, using native Python file and directory manipulation calls.

Semantics

The semantics we intend to support with data staging should be derived from the target use cases. Speaking for RCT, we would like the following set of operations to be supported:
-
Do you mean "were quite different"? Either way, RCT also supports both approaches, even though our semantics are less sophisticated than Swift's (no caching, no caps).
What exactly does 'about that' refer to in this sentence: an FTP-like layer or staging in general? If the former, I agree. Simple staging calls w/o additional policies are relatively easy to implement (both as separate calls and as part of the job spec) and should likely cover a large part of the use cases already.
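To make "relatively easy to implement" a bit more concrete, here is a minimal sketch of policy-free staging for a local executor, using only the Python standard library. Nothing in it is existing PSI/J API: `stage` and `run_with_staging` are hypothetical helper names, the boolean `overwrite` stands in for the unspecified `flags` argument, and the job is simply run with `subprocess` rather than through an executor.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def stage(src, tgt, overwrite=False):
    """Copy a file or directory tree from src to tgt (hypothetical helper).

    `overwrite` is an assumed stand-in for the unspecified `flags` argument.
    """
    src, tgt = Path(src), Path(tgt)
    if tgt.exists() and not overwrite:
        raise FileExistsError(f"staging target exists: {tgt}")
    # Create missing parent directories, e.g. a fresh job sandbox.
    tgt.parent.mkdir(parents=True, exist_ok=True)
    if src.is_dir():
        shutil.copytree(src, tgt, dirs_exist_ok=True)  # needs Python 3.8+
    else:
        shutil.copy2(src, tgt)
    return tgt


def run_with_staging(argv, cwd, stage_in=(), stage_out=()):
    """Job-spec style: stage in, run the command in cwd, then stage out.

    stage_in and stage_out are sequences of (src, tgt) pairs.
    """
    Path(cwd).mkdir(parents=True, exist_ok=True)
    for src, tgt in stage_in:
        stage(src, tgt, overwrite=True)
    result = subprocess.run(argv, cwd=cwd, check=True)
    for src, tgt in stage_out:
        stage(src, tgt, overwrite=True)
    return result.returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        sandbox = base / "sandbox"
        (base / "input.txt").write_text("hello\n")
        # The "job" upper-cases in.txt into out.txt inside its sandbox.
        job = ("from pathlib import Path; "
               "Path('out.txt').write_text(Path('in.txt').read_text().upper())")
        run_with_staging(
            [sys.executable, "-c", job],
            cwd=sandbox,
            stage_in=[(base / "input.txt", sandbox / "in.txt")],
            stage_out=[(sandbox / "out.txt", base / "result.txt")],
        )
        print((base / "result.txt").read_text(), end="")
```

Calling `stage` directly corresponds to the separate-calls variant, where the application decides when to stage. `run_with_staging` corresponds to the job-spec variant, where staging is tied to the job's start and end. Remote targets, caching, and failure policies are deliberately left out.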
-
Jobs frequently require input data to be staged along with them. For local submission this is less of an issue, since the user can pre-stage data to a shared file system. For remote submission, however, data staging should be an integral part of the job submission process. This thread can be used to discuss possible approaches for data staging support.