Add support for self-grafting #154

Open
nschneid opened this issue Jul 18, 2013 · 0 comments

Use case: a task (such as training a model) that runs for a tunable number of iterations, where each iteration's intermediate result is saved, used to initialize the next iteration, and can also be used directly by downstream tasks.

A trivial but ugly solution would be to create a separate task for each iteration and use a branch point on the downstream task to select the output of one of those tasks.

The proposed pattern, “self-grafting”, would instead feed the output of one task into a subsequent realization of the same task by way of a branch graft. For example:

$ cat selfgraft.tape
task preproc > trainingdata devdata {
  echo "" > $trainingdata
  echo "" > $devdata
}

task learn < in=$trainingdata@preproc init=(I: 0=/dev/null 1=$model@learn[I:0] 2=$model@learn[I:1] 3=$model@learn[I:2] 4=$model@learn[I:3]) > model {
  echo "./train --data $in --init-model $init" > $model
}

# can be run with any value of I
task predict_eval < in=$devdata@preproc model=@learn > preds scores {
  echo "./predict --data $in --model $model" > $preds
  echo "./eval --data $in --preds $preds" > $scores
}

This is not perfect because it still requires manually specifying the inputs for each iteration. But it is more compact than having a bunch of tasks, and conceptually, it seems to me like it should work: though the learn task takes its own output as an input, it is strictly from a completed realization, so the dependencies are correctly specified.
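Until something less manual exists, the per-iteration inputs could be generated rather than typed by hand. The following is only a sketch of a hypothetical helper (the function `selfgraft_init_spec` and its parameters are my own names, not part of ducttape); it assumes the branch-point and graft syntax used in the `learn` task above and simply prints the `init=(...)` spec to paste into the tape file:

```python
def selfgraft_init_spec(last_iteration, branch_point="I", task="learn",
                        output="model", seed="/dev/null"):
    """Build the init=(...) input spec for iterations 0..last_iteration.

    Hypothetical helper: iteration 0 starts from `seed`, and every later
    iteration i grafts the output of the same task at iteration i - 1.
    """
    if last_iteration < 0:
        raise ValueError("last_iteration must be non-negative")
    branches = [f"0={seed}"]
    for i in range(1, last_iteration + 1):
        # Each branch depends only on the previous, already completed realization.
        branches.append(f"{i}=${output}@{task}[{branch_point}:{i - 1}]")
    return f"init=({branch_point}: " + " ".join(branches) + ")"


if __name__ == "__main__":
    # Reproduces the spec written by hand in the learn task (I = 0..4).
    print(selfgraft_init_spec(4))
```

Because iteration i only ever refers to iteration i - 1, the generated spec keeps the same acyclic dependency chain as the hand-written one.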

Currently, static analysis succeeds but an error occurs when running the workflow:

$ ../ducttape selfgraft.tape -j4
ducttape 0.3
by Jonathan Clark
Loading workflow version history...
Have 0 previous workflow versions
No plans specified in workflow -- Using default one-off realization plan: Each realization will have no more than 1 non-baseline branch
Checking for completed tasks
Finding packages...
Found 0 packages
Checking for already built packages (if this takes a long time, consider switching to a local-disk git clone instead of a remote repository)...
Checking inputs...
Work plan (depth-first traversal):
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./preproc/Baseline.baseline (Baseline.baseline)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.4 (I.4)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.4 (I.4)
Are you sure you want to run these 11 tasks? [y/n] y
Exception in thread "main" java.lang.RuntimeException: Task not found: learn/Baseline.baseline/1
    at ducttape.versioner.WorkflowVersionStore$.dependencies(WorkflowVersionStore.scala:136)
    at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
    at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at ducttape.versioner.WorkflowVersionStore$.create(WorkflowVersionStore.scala:177)
    at ducttape.versioner.TentativeWorkflowVersionInfo.commit(WorkflowVersionInfo.scala:101)
    at ducttape.cli.ExecuteMode$.run(ExecuteMode.scala:120)
    at Ducttape$$anonfun$main$8.apply(ducttape.scala:879)
    at ducttape.cli.ErrorUtils$.ex2err(ErrorUtils.scala:59)
    at Ducttape$.main(ducttape.scala:572)
    at Ducttape.main(ducttape.scala)