Use case: a task (such as training a model) runs for a tunable number of iterations, saving intermediate results that both initialize subsequent iterations and can be used directly by downstream tasks.
A trivial but ugly solution would be to create a separate task for each iteration and use a branch point on the downstream task to select the output of one of those tasks.
The proposed pattern, “self-grafting”, would instead feed the output of one task into a subsequent realization of the same task by way of a branch graft. For example:
$ cat selfgraft.tape
task preproc > trainingdata devdata {
  echo "" > $trainingdata
  echo "" > $devdata
}
task learn < in=$trainingdata@preproc init=(I: 0=/dev/null 1=$model@learn[I:0] 2=$model@learn[I:1] 3=$model@learn[I:2] 4=$model@learn[I:3]) > model {
  echo "./train --data $in --init-model $init" > $model
}
# can be run with any value of I
task predict_eval < in=$devdata@preproc model=@learn > preds scores {
  echo "./predict --data $in --model $model" > $preds
  echo "./eval --data $in --preds $preds" > $scores
}
This is not perfect, because the input for each iteration must still be specified by hand. But it is more compact than a separate task per iteration, and conceptually it seems to me like it should work: although the learn task takes its own output as an input, that input always comes from a different, already completed realization (branch I:k grafts only from I:k-1), so the dependencies are correctly specified.
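To make the dependency argument concrete, here is a minimal Python sketch (not ducttape code, and not how ducttape itself resolves grafts) that reads the init branch point from the example, records which realization each branch grafts from, and asks for a topological order. The function name realization_deps and the node labels such as learn[I:1] are made up for illustration; the spec parsing is deliberately naive and assumes whitespace-separated branches.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The init branch point, copied from the learn task in selfgraft.tape.
INIT_SPEC = ("(I: 0=/dev/null 1=$model@learn[I:0] 2=$model@learn[I:1] "
             "3=$model@learn[I:2] 4=$model@learn[I:3])")

def realization_deps(spec, task="learn"):
    """Map each realization of `task` to the realizations it grafts from.

    Hypothetical helper: a branch value of the form $out@task[BP:branch]
    counts as a dependency; anything else (e.g. /dev/null) counts as none.
    """
    name, *branches = spec.strip("()").split()
    point = name.rstrip(":")
    deps = {}
    for item in branches:
        branch, value = item.split("=", 1)
        graft = re.fullmatch(r"\$\w+@(\w+)\[(\w+):(\w+)\]", value)
        node = f"{task}[{point}:{branch}]"
        deps[node] = {f"{graft[1]}[{graft[2]}:{graft[3]}]"} if graft else set()
    return deps

deps = realization_deps(INIT_SPEC)
# static_order() raises graphlib.CycleError if the dependencies are cyclic.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

At the level of realizations the graph is a simple chain from learn[I:0] to learn[I:4], so a valid execution order exists even though the task-level graph has learn pointing at itself.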
Currently, static analysis succeeds but an error occurs when running the workflow:
$ ../ducttape selfgraft.tape -j4
ducttape 0.3
by Jonathan Clark
Loading workflow version history...
Have 0 previous workflow versions
No plans specified in workflow -- Using default one-off realization plan: Each realization will have no more than 1 non-baseline branch
Checking for completed tasks
Finding packages...
Found 0 packages
Checking for already built packages (if this takes a long time, consider switching to a local-disk git clone instead of a remote repository)...
Checking inputs...
Work plan (depth-first traversal):
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./preproc/Baseline.baseline (Baseline.baseline)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/Baseline.baseline (I.0)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.1 (I.1)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.2 (I.2)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.3 (I.3)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./learn/I.4 (I.4)
RUN: /Users/nathan/dev/nlp-tools/ducttape-0.3/testexamples/./predict_eval/I.4 (I.4)
Are you sure you want to run these 11 tasks? [y/n] y
Exception in thread "main" java.lang.RuntimeException: Task not found: learn/Baseline.baseline/1
at ducttape.versioner.WorkflowVersionStore$.dependencies(WorkflowVersionStore.scala:136)
at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
at ducttape.versioner.WorkflowVersionStore$$anonfun$6.apply(WorkflowVersionStore.scala:177)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.AbstractTraversable.map(Traversable.scala:105)
at ducttape.versioner.WorkflowVersionStore$.create(WorkflowVersionStore.scala:177)
at ducttape.versioner.TentativeWorkflowVersionInfo.commit(WorkflowVersionInfo.scala:101)
at ducttape.cli.ExecuteMode$.run(ExecuteMode.scala:120)
at Ducttape$$anonfun$main$8.apply(ducttape.scala:879)
at ducttape.cli.ErrorUtils$.ex2err(ErrorUtils.scala:59)
at Ducttape$.main(ducttape.scala:572)
at Ducttape.main(ducttape.scala)