Yaml spec expressiveness #90

cisaacstern · 2024-07-19T23:02:25Z

WIP closes #50 when complete

Opening as a draft just to get ideas out there

cisaacstern · 2024-07-31T06:22:10Z

Thank you in advance to all reviewers for bearing with the scope here. To recap above comments, this PR:

Closes #50
Closes #105
Closes #106
Closes #107

The first of which was an early brainstorming issue for this work, and the latter three of which are essential for getting an end-to-end patrols workflow to run.

As suggested by #50, at least some of the syntax here should be familiar from GitHub Workflows (`with` for keyword arguments, `${{ }}` for variable references, namespacing of variable contexts). I haven't followed GitHub exactly, though, both in the interest of simplification where appropriate and because certain features we need (i.e. "map") don't have 1:1 analogs in GitHub Workflows.

Note that while this PR does implement mode: "map" for tasks, it does not:

Implement groupers + groupby/split, which I will address separately in Task: Define Groupers #72 and Task: groupby (i.e. "split") #109, respectively
Implement Lithops map, which I will address via LithopsExecutor #116

In terms of suggestions for reviewing, some approachable starting places might be:

README diff
examples/compilation-specs/* YAML spec diffs
Docstrings and field descriptions in compiler.py (especially for TaskInstance)
Compiler tests

Of lesser consequence (unless you're specifically interested) might be the fact that I've refactored some of the jinja logic into macros to make it more developer-friendly & readable, and also the details of the compiled workflows, which are all run end-to-end in test examples.

I'll note also that I don't feel especially dogmatic about much here. My intention is not necessarily to set this spec in stone as it is shown here, and I fully expect we will want to iterate further on this. Rather, I am attempting here to just put in place the "minimal" requirements for defining a spec that can handle an end-to-end patrols workflow. (Minimal in quotes because it's kind of a lot, but that's because our earlier draft was pretty sparse!)

Please let me know of any questions, comments, or concerns!

walljcg · 2024-07-31T06:46:25Z

README.md

+      # `observations`, and we want the value passed to this argument to be the return
+      # value of the task instance with id `obs`, which is the root task of this workflow.
+      # note that (a) variables which resolve based on the outputs of other tasks are wrapped
+      # in `${{ }}`; and (b) return values of other task instances are referenced by strings


so (b) means a task has no previous dependencies?

I'm not sure that I totally understand your question but I think the answer is no.

(b) reads in full:

    # ... (b) return values of other task instances are referenced by strings
    # having the structure `workflow.<id>.return`.

which is meant to indicate, for example, that we can reference the return value of a task instance defined like this:

    - name: Get SubjectGroup Observations from EarthRanger
      id: obs
      task: get_subjectgroup_observations

using a with block like this:

    with:
      observations: ${{ workflow.obs.return }}
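As a rough illustration of the reference structure described above, here is a minimal sketch of how a `${{ workflow.<id>.return }}` string could be recognized. The regular expression and the `referenced_id` helper are hypothetical and written only for this comment; the actual parsing in `compiler.py` may work differently.

```python
import re
from typing import Optional

# Assumed pattern for `${{ workflow.<id>.return }}`; not taken from compiler.py.
REFERENCE = re.compile(
    r"^\$\{\{\s*workflow\.([A-Za-z_][A-Za-z0-9_]*)\.return\s*\}\}$"
)


def referenced_id(value: str) -> Optional[str]:
    """Return the task instance id that `value` points at, or None for a plain value."""
    match = REFERENCE.match(value.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    print(referenced_id("${{ workflow.obs.return }}"))
    print(referenced_id("some literal string"))
```

Under this sketch, a value missing the `workflow.` namespace would be treated as a plain string rather than a reference.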

Yun-Wu · 2024-07-31T07:00:34Z

README.md


-The inline comments in this example explain what each line means:
+```yaml
+- name:  # [required] a human readable name for the step


where do we use name?

Right now this is used in a few ways:

To provide descriptive error messages pointing the user to a particular task instance in their workflow, e.g.

ecoscope-workflows/tests/test_compiler.py

Lines 109 to 112 in e1f66bf

    expected_error_text = re.escape(
        "All task instance `id`s must be unique in the workflow. Found duplicate ids: "
        "id='obs' is shared by ['Get Subjectgroup Observations', 'Process Relocations']"
    )

To provide descriptive comments identifying which task instance a block of code or configuration is for, if that code or config is expected to be human-readable. So in the fillable yaml, e.g.:

ecoscope-workflows/examples/params/patrol_workflow_params_fillable.yaml

Lines 1 to 2 in e1f66bf

    # Parameters for 'Get Patrol Observations from EarthRanger' using task `get_patrol_observations`.
    patrol_obs:
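To make the first use of `name` concrete, the duplicate-id error quoted from the test above could be produced by a check along these lines. This is only a sketch: task instances are modeled as plain dicts, and `find_duplicate_ids` is a hypothetical helper, not the compiler's actual validator.

```python
from collections import defaultdict


def find_duplicate_ids(task_instances):
    """Map each duplicated `id` to the human-readable names that share it."""
    names_by_id = defaultdict(list)
    for task_instance in task_instances:
        names_by_id[task_instance["id"]].append(task_instance["name"])
    return {id_: names for id_, names in names_by_id.items() if len(names) > 1}


if __name__ == "__main__":
    workflow = [
        {"name": "Get Subjectgroup Observations", "id": "obs"},
        {"name": "Process Relocations", "id": "obs"},  # duplicate id on purpose
    ]
    for id_, names in find_duplicate_ids(workflow).items():
        print(f"id={id_!r} is shared by {names}")
```

Reporting the names alongside the shared id is what lets the message point the user at the specific task instances involved.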

Your question made me realize two things I needed to fix:

We should also be using these names for the cell headers in jupytext, fixed that just now in 4ce2ab5

In the task instances, these names are the human-readable identifiers, whereas the ids are the programmatic identifiers. Therefore I realized it was confusing that the top-level name of the compilation spec was (as used in this PR) in fact not supposed to be a human-readable name, but rather a programmatic identifier. I therefore renamed this field to id in fc97e10

Yun-Wu · 2024-07-31T07:02:04Z

examples/compilation-specs/patrol_workflow.yaml

+    with:
+      # FIXME: i had this typo and it was not caught by validator, i.e.
+      # we cannot be allow texts to reference their own ids as dependencies
+      # data: ${{ patrol_events_map_widget.return }}


Do we have a validator of this spec?

Good catch! Fixed that just now in eb0e967
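For readers following along, a self-dependency check of this kind might look roughly like the sketch below. It assumes task instances are plain dicts whose `with` values are strings or lists of strings; `REF`, `dependencies`, and `self_referencing_ids` are hypothetical names, and the check actually added in eb0e967 may be structured differently.

```python
import re

# Assumed reference pattern; finds every `workflow.<id>.return` inside `${{ }}`.
REF = re.compile(r"\$\{\{\s*workflow\.(\w+)\.return\s*\}\}")


def dependencies(task_instance):
    """Collect the ids referenced by a task instance's `with` arguments."""
    deps = []
    for value in task_instance.get("with", {}).values():
        values = value if isinstance(value, list) else [value]
        for item in values:
            deps.extend(REF.findall(str(item)))
    return deps


def self_referencing_ids(task_instances):
    """Return the ids of task instances that list themselves as a dependency."""
    return [t["id"] for t in task_instances if t["id"] in dependencies(t)]


if __name__ == "__main__":
    bad = {"id": "map_widget", "with": {"data": "${{ workflow.map_widget.return }}"}}
    print(self_referencing_ids([bad]))
```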

Yun-Wu · 2024-07-31T07:08:25Z

ecoscope_workflows/compiler.py

+                        )
+        return self
+
+    # TODO: on __init__ (or in cached_property), sort tasks


What's the current order of execution?

The current order of execution follows the order of the task instance list in the spec, so for a spec like

    id: calculate_time_density
    workflow:
      - name: Get Subjectgroup Observations
        id: obs
        task: get_subjectgroup_observations
      - name: Process Relocations
        id: relocs
        task: process_relocations
        with:
          observations: ${{ workflow.obs.return }}
      - name: Transform Relocations to Trajectories
        id: traj
        task: relocations_to_trajectory
        with:
          relocations: ${{ workflow.relocs.return }}

the execution order would be:

1. Get Subjectgroup Observations
2. Process Relocations
3. Transform Relocations to Trajectories

This also applies if we are combining DAG branches (via either reduction or mapping) so the order of execution for a reduction like

    id: create_dashboard
    workflow:
      - name: Create Map Widget Single View
        id: map_widget
        task: draw_ecomap
      - name: Create Plot Widget Single View
        id: plot_widget
        task: draw_ecoplot
      - name: Gather Dashboard
        id: dashboard
        task: gather_dashboard
        with:
          widgets:
            - ${{ workflow.map_widget.return }}
            - ${{ workflow.plot_widget.return }}

would be:

1. Create Map Widget Single View
2. Create Plot Widget Single View
3. Gather Dashboard

And for a mapping like

    id: calculate_time_density
    workflow:
      - name: Get Subjectgroup Observations A
        id: obs_a
        task: get_subjectgroup_observations
      - name: Get Subjectgroup Observations B
        id: obs_b
        task: get_subjectgroup_observations
      - name: Draw Ecomaps
        id: ecomaps
        task: draw_ecomap
        mode: "map"
        iter:
          geodataframe:
            - ${{ workflow.obs_a.return }}
            - ${{ workflow.obs_b.return }}

would be:

1. Get Subjectgroup Observations A
2. Get Subjectgroup Observations B
3. Draw Ecomaps

So this will obviously create problems if all dependencies of a task are not already executed by the time it is executed (because then some of its specified arguments will be undefined).

While I was writing this response I realized that it's actually not that hard to raise a ValidationError if the list is not correctly sorted, so I added that check in a2cc7da + e30c013

This check doesn't actually sort the DAG (which the comment you were asking about suggested we do), but the more I think about it, the less sure I am that we actually want to do that. I opened #118 for us to consider it.
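To illustrate the difference between checking the order and sorting, here is a small sketch of both options. It assumes the dependencies have already been extracted into a dict mapping each task instance id to the ids it depends on, listed in spec order; `out_of_order` and `sorted_ids` are hypothetical helpers and not the code from a2cc7da or e30c013. The sorting variant uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter


def out_of_order(dependencies):
    """Return {id: [deps not yet seen]} for tasks listed before their dependencies.

    Note: a dependency on an id that never appears in the spec is reported too.
    """
    seen = set()
    problems = {}
    for task_id, deps in dependencies.items():
        missing = [dep for dep in deps if dep not in seen]
        if missing:
            problems[task_id] = missing
        seen.add(task_id)
    return problems


def sorted_ids(dependencies):
    """Alternative: compute a valid execution order instead of rejecting the spec."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    shuffled = {"traj": ["relocs"], "obs": [], "relocs": ["obs"]}
    print(out_of_order(shuffled))
    print(sorted_ids(shuffled))
```

The first function corresponds to requiring users to list tasks correctly and raising on violations; the second corresponds to the option raised in #118 of sorting on their behalf.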

cisaacstern · 2024-07-31T16:36:21Z

Thanks all for the reviews!

Particularly @Yun-Wu whose questions prompted a quick series of last-minute fixes + tweaks + enhancements which I would not have otherwise thought of.

Going to merge to keep moving forward.

cisaacstern added 5 commits July 19, 2024 15:59

rewrite compilation spec in a new way

b18a0d3

widgets first commit

f2378a5

add test for single view widget

8886444

test merge widget views

36e6adc

test merge views multiple widgets

5a59840

cisaacstern changed the title ~~Yaml spec verboseness~~ Yaml spec expressiveness Jul 20, 2024

cisaacstern added 24 commits July 22, 2024 10:38

widgets WIP

157db97

Merge remote-tracking branch 'origin/main' into widgets

8da41c5

a little more typing for filters

463a0e3

widget typing cont

d6862c3

test fix

6ff9a71

reorg

9a90078

typing differentiation cont

4408822

simplify return

918482c

merge grouped widgets with inplace or

9635ecc

test get_view, and narrow its scope

c6cb515

makes more sense for merge_key to be a property of GroupedWidget

ffec4f3

split widget creators into separate tasks

beea2a2

allow none for view, test all widget creators

2f0c74a

recompile seq dag examples with create widget

8e5060d

refactor widget tasks to make them work w jupytext

15e8c6f

recompile params

ab8b6ba

end to end tests

068bab4

a little renaming

c9c5fce

fix test

3488ef1

dashboard first commit

0e45472

drop ast parsing

303cf98

call tasks directly in jupytext

145b5b5

Merge branch 'drop-ast' into widgets

7b5da50

with drop ast, remove annoying duplicate imports

7ece200

cisaacstern added 2 commits July 30, 2024 21:26

remove commented-out method

e57c15b

remove unused flatmap

42ec19e

cisaacstern mentioned this pull request Jul 31, 2024

YAML Spec: possibly support mode: flatmap #115

Open

cisaacstern added 3 commits July 30, 2024 21:46

update yaml spec in README wip

6299a51

update compilation spec docs cont.

ff01d68

final readme cleanup

7b0ebbb

cisaacstern mentioned this pull request Jul 31, 2024

LithopsExecutor #116

Closed

cisaacstern marked this pull request as ready for review July 31, 2024 06:22

cisaacstern requested review from Yun-Wu, atmorling and walljcg July 31, 2024 06:22

cisaacstern mentioned this pull request Jul 31, 2024

Consolidate time density example workflow into patrols workflow #117

Merged

walljcg reviewed Jul 31, 2024

Yun-Wu reviewed Jul 31, 2024

cisaacstern added 7 commits July 31, 2024 07:48

check tasks cant depend on self

eb0e967

rename spec.name -> spec.id

fc97e10

spec id readme and test

219438f

make an xfailed test for toplogical sort

e1f66bf

use task instance name for cell headers in jupytext

4ce2ab5

add and test task_instance_dependencies property

a2cc7da

check that task instances are in topological order

e30c013

cisaacstern mentioned this pull request Jul 31, 2024

Do we want to topologically sort task lists, or should we require they are correctly sorted by the user? #118

Open

cisaacstern merged commit 2189405 into main Jul 31, 2024
4 checks passed

cisaacstern deleted the more-verbose-yaml-spec branch July 31, 2024 16:36

This was referenced Jul 31, 2024

Compiler: fused chained map #110

Open

Use graphlib to check that specs are acyclic #11

Closed

Remove FIXME comment re: topological sort from script template #121

Closed

Decorator retool 🔧 ⚙️ #128

Closed
