Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smarter traversal with -j flag #163

Open
nschneid opened this issue Dec 17, 2013 · 2 comments
Open

Smarter traversal with -j flag #163

nschneid opened this issue Dec 17, 2013 · 2 comments

Comments

@nschneid
Copy link
Contributor

ducttape-0.3 defaults to depth-first traversal of the realization graph in order to try different kind of tasks quickly (and fail fast). But when the user elects to run multiple processes, this ordering for certain workflows will actually prevent parallelism. For example, if the realization graph is like

a.1 --> b.1 -+-> c --> d
a.2 --> b.2  /

(tasks a and b having two realizations each, both of which are prerequisites to c), then the depth-first strategy will yield a.1 b.1 a.2 b.2 c d, where the only opportunity for parallelism is b.1 and a.2. Because c depends on two branches, this would actually fail more slowly than the breadth-first strategy of a.1 a.2 b.1 b.2 c d, because the a.1/a.2 and b.1/b.2 might both run in parallel given -j2.

I suspect a smarter strategy might be a hybrid depth/breadth traversal, where the breadth is constrained by the number of things allowed to execute in parallel. Thus -j2 would be equivalent to breadth-first for the structure above, while -j3 would cause the traversal to start with up to 3 ready tasks, and so forth.

Or, can the traversal order be computed on the fly? Then after starting a.1 with -j2 it would realize that b.1 is not ready and therefore start a.2 in parallel.

@nschneid
Copy link
Contributor Author

A related problem is that if a realization of a task fails, it is easy to miss it when other things are running in parallel. Maybe there could be an option to have ducttape immediately suspend other running or scheduled realizations of a task if one of them fails?

@jhclark
Copy link
Owner

jhclark commented May 15, 2014

I'm skeptical about immediately stopping. More often than not, I don't want to kill a tuning run that's been going for 12 hours on 32 cores just because I made a simple typo in a results formatting script for some barely-related realization.

So far I haven't seen enough painful situations come out of this. I generally just open up the number of tasks to execute in parallel to something huge and let a job scheduler figure it out.

If you're interested in spending some cycles on this, let me know. The traversal order is implemented as an agenda sort via the Ordering[D] parameter passed to https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/hyperdag/walker/UnpackedPhantomMetaDagWalker.scala.

Here's the primary implementation of traversals (it's simple!): https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/hyperdag/walker/Traversal.scala

You can see it getting sloshed around here: https://github.com/jhclark/ducttape/blob/master/src/main/scala/ducttape/cli/ExecuteMode.scala.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants