multiple results via multiple save_result nodes #424
@VictorVerhaert this is the feature we need for LCFM to write outputs with multiple resolutions. The current behaviour is non-standard, so we can replace it; we may only need to check with the corsa team, who recently started using it.
This is about having multiple nodes with "result": true. I find this highly confusing, as this example is not accepted by the web editor (I'm assuming the web editor complies with the spec):
{
"process_graph": {
"load1": {
"process_id": "load_collection",
"arguments": {
"id": "AGERA5",
"spatial_extent": {
"west": -14.697125210864408,
"east": 26.11928103913559,
"south": 46.96632122789427,
"north": 57.64623333077631
},
"temporal_extent": [
"2024-06-05T00:00:00Z",
null
]
}
},
"load2": {
"process_id": "load_collection",
"arguments": {
"id": "SENTINEL3_SLSTR",
"spatial_extent": {
"west": -17.158692149008044,
"east": 23.657714100991956,
"south": 47.38452737005318,
"north": 57.97398258202165
},
"temporal_extent": [
"2024-06-18T00:00:00Z",
null
]
}
},
"merge3": {
"process_id": "merge_cubes",
"arguments": {
"cube1": {
"from_node": "load1"
},
"cube2": {
"from_node": "load2"
}
}
},
"save4": {
"process_id": "save_result",
"arguments": {
"data": {
"from_node": "merge3"
},
"format": "PARQUET"
},
"result": true
},
"save5": {
"process_id": "save_result",
"arguments": {
"data": {
"from_node": "load1"
},
"format": "COVJSON"
},
"result": true
}
},
"parameters": []
}
@bossie the spec explains it:
So your example is indeed not valid, and my description was confusing; I will update it.
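For reference, a minimal sketch of the validation rule in question: a (sub)process graph must have exactly one node with "result": true. The helper name and error type below are illustrative, not the actual web editor code.

def check_single_result_node(process_graph: dict) -> str:
    """Return the id of the single node flagged with "result": true, else raise."""
    result_nodes = [
        node_id
        for node_id, node in process_graph.items()
        if node.get("result") is True
    ]
    if len(result_nodes) != 1:
        raise ValueError(
            f"Process graph must have exactly one result node, "
            f"found {len(result_nodes)}: {result_nodes}"
        )
    return result_nodes[0]

Applied to the example above, this would report two result nodes ("save4" and "save5"), which is why the web editor rejects it.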
Some more insights from Open-EO/openeo-processes#279 and Open-EO/openeo-api#427 about the different kinds of terminal nodes in the openEO API.
I guess our current implementation makes some mismatched assumptions about the equivalence between "end node", "result node" and "save_result" node (note that some of these mismatched assumptions are also present in the Python client, I'm afraid -> Open-EO/openeo-python-client#583).
Thanks @soxofaan. I have yet to go through it in detail, but it already seems to answer some of my questions with regard to end/result node nuances and how the "result" property matters for parent/child process graphs rather than for processes within the same graph (which is not at all what the API spec says, IMO) 👍
TODO: analyze and split up into smaller parts. There are already plan Bs in place for LCFM.
Related: #295: implemented multiple […]
The current implementation in the back-end is wrong in that process graph evaluation starts from the single result node and therefore only considers this result node and its dependencies (including transitive ones); other end nodes are not evaluated. The API spec words it like this: […]
Random ramblings:
def collect(args: ProcessArgs, env: EvalEnv):
    # args contain the values of the end nodes
    return env[ENV_RESULT]  # the value of the "result" node
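To make the "end node" vs "result node" distinction concrete: end nodes are nodes that no other node references through "from_node", while the result node is the single node flagged with "result": true. A rough standalone sketch (plain dict walking, no driver internals):

def find_end_nodes(process_graph: dict) -> set:
    """Nodes that no other node references through "from_node".

    Child process graphs (reducers, callbacks) are skipped, since their
    "from_node" references point at child nodes, not these top-level ones.
    """
    referenced = set()

    def walk(value):
        if isinstance(value, dict):
            if "process_graph" in value:  # don't descend into child graphs
                return
            if "from_node" in value:
                referenced.add(value["from_node"])
            for v in value.values():
                walk(v)
        elif isinstance(value, list):
            for v in value:
                walk(v)

    for node in process_graph.values():
        walk(node.get("arguments", {}))
    return set(process_graph) - referenced

In the (invalid) example above this yields {"save4", "save5"}: two end nodes, of which only one may carry the result flag.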
Some feedback ramblings:
I think we should also consider stepping away from the idea that […]; however, returning something completely different from […].
This way we can gradually migrate from the old way to the new way.
At the top level of a process graph, the "result node" (the single node with "result": true) […]; it's more important to collect all save_result outputs.
The practicalities are a bit cloudy now, so I might be wrong here, but if I'm not mistaken, the results are currently materialized just in time: just before sending a sync request response, or at the end of a batch job. Maybe the materialization should happen at the time of executing save_result instead.
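A sketch of what "materialize at save_result time" could look like in the evaluation loop. ENV_SAVE_RESULT mirrors the env key visible in the traceback below; make_save_result is a hypothetical stand-in for whatever builds the driver's SaveResult object:

ENV_SAVE_RESULT = "save_result"  # env key, as seen in the traceback below

def make_save_result(data, format, options):
    # Hypothetical stand-in for constructing the driver's SaveResult object.
    return {"data": data, "format": format, "options": options}

def evaluate_save_result(args: dict, env):
    result = make_save_result(
        data=args["data"],
        format=args["format"],
        options=args.get("options", {}),
    )
    # Register (or directly materialize) the result as this node executes,
    # instead of only just before the sync response / end of batch job.
    env[ENV_SAVE_RESULT].append(result)
    # Keep returning the input data, so downstream nodes and the old
    # single-result flow keep working during a gradual migration.
    return args["data"]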
Maybe it has to do with the way child process graphs are implemented (I still have to look into that) but surely a process graph should evaluate to its result, especially because this result is used by the parent process graph.
It depends on whether we treat the top-level process graph the same as child process graphs. It's true that we don't actually use the result of the top-level PG and rightly return […]
You are correct, and this behavior might have to change in the context of things related to […].
@VictorVerhaert do you have an example process graph I can test this feature with?
I think […]
Traceback (most recent call last):
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/bossie/PycharmProjects/openeo/venv38/lib/python3.8/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/users/auth.py", line 95, in decorated
    return f(*args, **kwargs)
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/views.py", line 952, in queue_job
    backend_implementation.batch_jobs.start_job(job_id=job_id, user=user)
  File "/home/bossie/PycharmProjects/openeo/openeo-geopyspark-driver/openeogeotrellis/backend.py", line 1721, in start_job
    self._start_job(job_id, user, _get_vault_token)
  File "/home/bossie/PycharmProjects/openeo/openeo-geopyspark-driver/openeogeotrellis/backend.py", line 1769, in _start_job
    job_dependencies = self._schedule_and_get_dependencies(
  File "/home/bossie/PycharmProjects/openeo/openeo-geopyspark-driver/openeogeotrellis/backend.py", line 2247, in _schedule_and_get_dependencies
    convert_node(result_node, env=env.push({ENV_DRY_RUN_TRACER: dry_run_tracer, ENV_SAVE_RESULT: [], "node_caching": False}))
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/ProcessGraphDeserializer.py", line 458, in convert_node
    env[ENV_FINAL_RESULT][0] = process_result
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/utils.py", line 49, in __getitem__
    return self._parent[key]
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/utils.py", line 49, in __getitem__
    return self._parent[key]
  File "/home/bossie/PycharmProjects/openeo/openeo-python-driver/openeo_driver/utils.py", line 51, in __getitem__
    raise KeyError(key)
KeyError: 'final_result'
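The KeyError falls out of a chained lookup: EvalEnv delegates missing keys to its parent env until the chain runs out. A simplified sketch that mirrors only the lookup behavior shown in the traceback (the real class lives in openeo_driver/utils.py):

class EvalEnv:
    """Simplified chained-map lookup, mirroring the traceback above."""

    def __init__(self, values=None, parent=None):
        self._values = values or {}
        self._parent = parent

    def __getitem__(self, key):
        if key in self._values:
            return self._values[key]
        if self._parent is not None:
            return self._parent[key]  # walk up the chain
        raise KeyError(key)

    def push(self, values):
        # Layer new values on top; lookups fall through to this env.
        return EvalEnv(values=values, parent=self)

In the run above, the env was pushed with ENV_SAVE_RESULT, but convert_node reads env[ENV_FINAL_RESULT], i.e. 'final_result', which no env in the chain defines, hence the KeyError.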
Supports multiple save_result nodes*. Note that you might have to make sure that these outputs don't overwrite each other, e.g. by setting a filename_prefix:
{
"process_id": "save_result",
"arguments": {
"data": {"from_node": "loadcollection1"},
"format": "GTiff",
"options": {"filename_prefix": "intermediate"},
}
}

*there was never support for multiple […]
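For illustration, a full hypothetical graph with two such nodes side by side, written as a Python dict. Node ids and the second format are made up; distinct filename_prefix values keep the outputs apart, and only one node carries the result flag, per the spec discussion above:

process_graph = {
    "loadcollection1": {
        "process_id": "load_collection",
        "arguments": {"id": "AGERA5", "spatial_extent": None, "temporal_extent": None},
    },
    "saveintermediate": {
        "process_id": "save_result",
        "arguments": {
            "data": {"from_node": "loadcollection1"},
            "format": "GTiff",
            "options": {"filename_prefix": "intermediate"},
        },
    },
    "savefinal": {
        "process_id": "save_result",
        "arguments": {
            "data": {"from_node": "loadcollection1"},
            "format": "netCDF",
            "options": {"filename_prefix": "final"},
        },
        "result": True,  # still exactly one result node
    },
}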
TODO: add/adapt an integration test.
Added […]
According to the spec, it should be allowed to have multiple save_result nodes, for instance to write images for different branches of the process graph.
To implement this, the idea was to add a dummy 'noop' node on the backend side, so that we convert the multi-output node graph back into a single-output one (see the sketch below).
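A sketch of that rewrite, assuming the backend registers some pass-through process (here called "noop", a made-up name) and reusing find_end_nodes from the earlier sketch: every end node is wired into one synthetic terminal node, restoring the single-result-node invariant.

def add_noop_terminal(process_graph: dict) -> dict:
    """Rewrite a multi-output graph into a single-output one (sketch)."""
    graph = {node_id: dict(node) for node_id, node in process_graph.items()}
    end_nodes = find_end_nodes(graph)  # see the earlier sketch
    if len(end_nodes) <= 1:
        return graph
    # The synthetic node becomes the sole result node.
    for node in graph.values():
        node.pop("result", None)
    graph["_noop"] = {
        "process_id": "noop",  # hypothetical pass-through process
        "arguments": {
            f"data{i}": {"from_node": node_id}
            for i, node_id in enumerate(sorted(end_nodes))
        },
        "result": True,
    }
    return graph

Evaluation can then keep starting from the single result node, while all save_result end nodes are reached as its dependencies.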