
integrate CLMS HRL VPP #460

Closed
jdries opened this issue Jun 22, 2023 · 34 comments · Fixed by #751, #767 or #769

@jdries
Contributor

jdries commented Jun 22, 2023

https://land.copernicus.eu/pan-european/biophysical-parameters/high-resolution-vegetation-phenology-and-productivity
problem here is that there's one collection, with multiple producttypes, and each type has a single band
this should become one collection with multiple bands

@jdries
Contributor Author

jdries commented Nov 14, 2023

@JohanKJSchreurs This is a good candidate to test our new STAC api.
This collection already exists in opensearch catalog as well, mostly a matter of porting metadata to STAC.

@jdries
Contributor Author

jdries commented Nov 21, 2023

Collection id: copernicus_r_3035_x_m_hrvpp-vpp_p_2017-now_v01_openeo

Example collection metadata that we would also target:
https://collections.eurodatacube.com/stac/vegetation-phenology-and-productivity-parameters-season-1.json

Python code to get the product metadata:
https://github.com/eea/clms-hrvpp-tools-python/blob/main/HRVPP_opensearch_demo/HRVPP%20catalogue%20and%20download%20demo.ipynb

Opensearch collections:
https://phenology.hrvpp2.vgt.vito.be/collections

@jdries
Contributor Author

jdries commented Dec 8, 2023

@JohanKJSchreurs
Contributor

We are implementing this in the stac-catalog-builder project.

The issue linked below is the main one that contains a breakdown of the parts/features we need for this integration
VitoTAP/stac-catalog-builder#16

@JohanKJSchreurs
Contributor

JohanKJSchreurs commented Mar 6, 2024

The implementation in the stac-catalog-builder is complete: VitoTAP/stac-catalog-builder#16

Some small improvements can still be done, to update the collections & items in the STAC API with improved information, but those should be separate GH issues.

At present, three of the VPP collections have been converted and uploaded to the development environment of the terra-stac-api at VITO.
The largest collection contains a very large number of products (6.5 million), so we will need to download it in several pieces because a single run takes too long.

@JohanKJSchreurs
Contributor

Fourth collection has also been uploaded to the STAC API.

Overview:

| collection | download + conversion | upload to dev STAC API | number of products | number of STAC items |
| --- | --- | --- | --- | --- |
| copernicus_r_3035_x_m_hrvpp-st_p_2017-now_v01 | done | done | 388_008 | 194_004 |
| copernicus_r_3035_x_m_hrvpp-vpp_p_2017-now_v01 | done | done | 150_849 | 10_778 |
| copernicus_r_utm-wgs84_10_m_hrvpp-st_p_2017-now_v01 | done | done | 470_066 | 235_058 |
| copernicus_r_utm-wgs84_10_m_hrvpp-vi_p_2017-now_v01 | did not finish (process got killed) | TO DO | 6_564_209 | unknown at present |
| copernicus_r_utm-wgs84_10_m_hrvpp-vpp_p_2017-now_v01 | done | done | 182_784 | 13_056 |

@JohanKJSchreurs
Contributor

How we can solve the long download of the large collection:

We can add options to the command to specify which time slice to download.
Right now it tries to download the entire collection, that is to say the entire period.

We already divide that period into smaller time slots in order to limit the number of products in each query to a reasonable number. So if we add options for a start and end date, we could do a partial download.
That way we can upload and test the collection with a more limited set of STAC items.

Furthermore, with some additional work we could download the whole collection in several parts, and in each run save out the STAC items for just those slices.
We can already upload the STAC items in several parts or sets.
At present, the collection.json file is created and overwritten on disk on every partial download, but the collection file would still contain the same data anyway. (It does not link to its STAC items in this case because there are far too many items for a static STAC collection.) However, with a little extra work we could split up the command so that one command downloads/creates the "empty" collection and another command downloads/creates the STAC items.
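The time-slicing idea above can be sketched with a small helper (a hypothetical function for illustration, not the actual stac-catalog-builder code): split the collection's temporal extent into consecutive chunks, then run one partial download/upload per chunk.

```python
from datetime import date, timedelta

def date_slices(start: date, end: date, days_per_slice: int):
    """Split [start, end) into consecutive slices of at most `days_per_slice` days."""
    slices = []
    current = start
    while current < end:
        slice_end = min(current + timedelta(days=days_per_slice), end)
        slices.append((current, slice_end))
        current = slice_end
    return slices

# Each (start, end) pair could then drive one partial download/upload run,
# passed to the command as --start/--end style options.
slices = date_slices(date(2017, 1, 1), date(2017, 7, 1), 60)
```

The last slice is simply clipped to the end date, so no products are queried twice and none are skipped.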

@VictorVerhaert

Update:
Trying out some of the suggestions Johan gave to build the last collection. So far I always encounter an error stating that a request is too large.

I have been trying to load the uploaded STAC collections in openEO, without luck.
Even with the isoformat fix on openeo dev I still encounter an IllegalArgumentException:
j-240321bb64b14251b296ace46577d0b4. @bossie could you have another look at this?

@VictorVerhaert

It seems that the eo:bands are missing from the STAC items, which openEO needs.

@JeroenVerstraelen
Contributor

JeroenVerstraelen commented Mar 25, 2024

  • STAC API does not work with openeo yet
    • eo:bands are missing

This will take about 2 weeks of debugging.

We already have 4/5 collections, but this does require someone to help test the STAC API and its integration with openEO.

@VictorVerhaert

VictorVerhaert commented Mar 25, 2024

I'll keep editing this comment as bugs are found.

I ran load_stac against several different sources, as I noticed the errors I encounter often differ.
Tests ran on https://openeo.vito.be/:

| STAC API | collection | job id | result | STAC request URL |
| --- | --- | --- | --- | --- |
| https://stac.terrascope.be | terrascope-s2-toc-v2 | j-240325bbab144c388897826ebd12a00f | java.lang.IllegalArgumentException: requirement failed: Server doesn't support ranged byte reads | https://stac.terrascope.be/search?limit=20&bbox=5.0%2C51.2%2C5.01%2C51.21&datetime=2017-06-01T00%3A00%3A00Z%2F2017-07-29T23%3A59%3A59.999000Z&collections=terrascope-s2-toc-v2 |
| https://stac.terrascope.be | terrascope-s2-ndvi-v2 | j-2403251d212041a69f875afcb6476848 | error without error log in the editor | |
| https://stac-openeo-dev.vgt.vito.be | copernicus_r_utm-wgs84_10_m_hrvpp-st_p_2017-now_v01 | j-240325eb608247938a2ef79efbf66f0d | java.lang.IllegalArgumentException: requirement failed | https://stac-openeo-dev.vgt.vito.be/search?limit=20&bbox=5.0%2C51.2%2C5.01%2C51.21&datetime=2017-06-01T00%3A00%3A00Z%2F2017-07-29T23%3A59%3A59.999000Z&collections=copernicus_r_utm-wgs84_10_m_hrvpp-st_p_2017-now_v01 |
| https://stac-openeo-dev.vgt.vito.be | TEST_Landsat_three-annual_NDWI_v1 | j-240325604db14055b31b93ab58894870 | OpenEOApiException(status_code=400, code='NoDataAvailable', message='There is no data available for the given extents.', id='no-request') | https://stac-openeo-dev.vgt.vito.be/search?limit=20&bbox=5.0%2C51.2%2C5.01%2C51.21&datetime=2000-06-01T00%3A00%3A00Z%2F2000-07-29T23%3A59%3A59.999000Z&collections=TEST_Landsat_three-annual_NDWI_v1 |

comments:

  • the https://stac.openeo.vito.be/api.html source contains no collections and cannot be tested
  • (third line) the copernicus hrvpp-st_p collection shows no eo:bands property for its assets. I have yet to discover why these were not included in the upload. Only the name property is used from eo:bands, so a quick fix might be to fall back on another name field on openEO's side
  • the TEST_landsat... collection is the only one containing eo:bands information on the stac-openeo-dev source
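The fallback mentioned in the second bullet could look like this (a hypothetical helper for illustration, not actual openEO backend code): prefer the name declared in eo:bands, and fall back to another field such as the asset title or the asset key when eo:bands is absent.

```python
def band_name(asset_key: str, asset: dict) -> str:
    """Band name from eo:bands if present, else fall back to the asset title or key."""
    eo_bands = asset.get("eo:bands") or []
    if eo_bands and "name" in eo_bands[0]:
        return eo_bands[0]["name"]
    # Fallback for items uploaded without eo:bands metadata.
    return asset.get("title") or asset_key

# With eo:bands present the declared name wins; without it, we fall back.
with_bands = {"eo:bands": [{"name": "PPI"}], "title": "Plant Phenology Index"}
without_bands = {"title": "Plant Phenology Index"}
```

The asset dicts and the "PPI" band are made-up examples; the point is only the lookup order.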

bossie added a commit that referenced this issue Mar 29, 2024
#460

Traceback (most recent call last):
  File "batch_job.py", line 1347, in <module>
    main(sys.argv)
  File "batch_job.py", line 1014, in main
    run_driver()
  File "batch_job.py", line 985, in run_driver
    run_job(
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 1078, in run_job
    result = ProcessGraphDeserializer.evaluate(process_graph, env=env, do_dry_run=tracer)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 377, in evaluate
    result = convert_node(result_node, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1572, in apply_process
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1572, in <dictcomp>
    args = {name: convert_node(expr, env=env) for (name, expr) in sorted(args.items())}
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 416, in convert_node
    return convert_node(processGraph['node'], env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 402, in convert_node
    process_result = apply_process(process_id=process_id, args=processGraph.get('arguments', {}),
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1604, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 2216, in load_stac
    return env.backend_implementation.load_stac(url=url, load_params=load_params, env=env)
  File "/opt/venv/lib64/python3.8/site-packages/openeogeotrellis/backend.py", line 1079, in load_stac
    pyramid_factory = jvm.org.openeo.geotrellis.file.PyramidFactory(
  File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1587, in __call__
    return_value = get_return_value(
  File "/opt/spark3_4_0/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.openeo.geotrellis.file.PyramidFactory.
: java.lang.IllegalArgumentException: requirement failed
	at scala.Predef$.require(Predef.scala:268)
	at org.openeo.geotrellis.file.PyramidFactory.<init>(PyramidFactory.scala:47)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
@bossie
Collaborator

bossie commented Mar 29, 2024

FYI, this error when querying https://stac.terrascope.be:

java.lang.IllegalArgumentException: requirement failed: Server doesn't support ranged byte reads

is because the underlying assets require authentication: if you click an asset's href in your browser, it redirects you to a login page, and it's that response which lacks an Accept-Ranges: bytes header, hence the error.
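The check that fails here can be illustrated with a small helper (a sketch of the condition, not the backend's actual Scala code): the asset response must advertise byte-range support via the Accept-Ranges header.

```python
def supports_ranged_reads(headers: dict) -> bool:
    """True if the HTTP response headers advertise byte-range support."""
    # HTTP header names are case-insensitive, so normalize before comparing.
    normalized = {k.lower(): v for k, v in headers.items()}
    return normalized.get("accept-ranges", "").strip().lower() == "bytes"

# A direct asset response advertises ranges; a login-page redirect response does not.
asset_ok = supports_ranged_reads({"Accept-Ranges": "bytes", "Content-Type": "image/tiff"})
login_page = supports_ranged_reads({"Content-Type": "text/html"})
```

This is why authenticating (or using an S3 endpoint, as discussed below) makes the error go away: the real asset response carries the header, while the login page does not.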

@VictorVerhaert

VictorVerhaert commented Mar 29, 2024

Update: S3 seems to work on CDSE
This leaves the following TODOs for the integration:

  • mount buckets on terrascope (optional, should the collections be required there)
  • test full collection on CDSE (WIP)
  • rebuild all the collections with S3 links and upload to https://stac-openeo.vgt.vito.be/api.html#/ (production STAC API, currently empty)
  • adjust stac-builder to build largest collection without OOM (WIP)

@VictorVerhaert

VictorVerhaert commented Apr 4, 2024

The following collections are now available on https://stac.openeo.vito.be/ and should work on CDSE-staging:

  • copernicus_r_3035_x_m_hrvpp-vpp_p_2017-now_v01
  • copernicus_r_utm-wgs84_10_m_hrvpp-vpp_p_2017-now_v01
  • copernicus_r_utm-wgs84_10_m_hrvpp-st_p_2017-now_v01
  • copernicus_r_3035_x_m_hrvpp-st_p_2017-now_v01

I am still trying to build copernicus_r_utm-wgs84_10_m_hrvpp-vi_p_2017-now_v01, but I am making good progress by speeding up the pipeline with thread and process pools, combined with memory cleanup.

@JeroenVerstraelen
Contributor

@bossie Define collections that are STAC-based in the layer catalog (layercatalog), so that load_collection calls load_stac.

@VictorVerhaert

Important note:
The way the STAC API works, only items whose datetime property (often equal to the start_datetime) lies within the temporal_extent of load_stac are loaded.
For yearly assets, this means that the 1st of January must be included in the temporal_extent for the item to be loaded.
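The selection rule described above can be sketched as a plain predicate (an illustration of the filtering semantics, not the actual backend code): an item is picked up only when its single datetime falls inside the requested extent, regardless of the period the asset actually covers.

```python
from datetime import date

def item_selected(item_datetime: date, extent_start: date, extent_end: date) -> bool:
    """An item is loaded only if its datetime lies within the requested temporal extent."""
    return extent_start <= item_datetime < extent_end

# A yearly asset whose item datetime is January 1st:
yearly_item = date(2018, 1, 1)
# An extent starting on (or before) Jan 1 selects the item...
included = item_selected(yearly_item, date(2018, 1, 1), date(2019, 1, 1))
# ...but an extent starting in February misses it, even though the asset covers all of 2018.
excluded = item_selected(yearly_item, date(2018, 2, 1), date(2019, 1, 1))
```

So a request for, say, March through December of 2018 returns nothing for yearly products unless the extent is widened to include January 1st.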

@bossie
Collaborator

bossie commented Apr 11, 2024

Note: the assets are in the HRVPP bucket on S3 endpoint http://data.cloudferro.com (not externally accessible).

@bossie
Collaborator

bossie commented Apr 26, 2024

@VictorVerhaert copernicus_r_utm-wgs84_10_m_hrvpp-vpp_p_2017-now_v01 can be reingested as filtering by property (in this case: "season") should work without adverse side-effects (in this case: empty results).

I did notice that proj:epsg and proj:bbox are both in 4326, whereas the actual assets are in UTM, so this might be something you want to look into, e.g. https://stac.openeo.vito.be/search?limit=20&bbox=5.0%2C51.2%2C5.01%2C51.21&datetime=2017-07-01T00%3A00%3A00Z%2F2018-07-30T23%3A59%3A59.999000Z&collections=copernicus_r_utm-wgs84_10_m_hrvpp-vpp_p_2017-now_v01&fields=%2Bproperties

For some reason loading these collections seems to take a really long time, much longer than I remember. 🤔 Maybe something similar to #250?

@bossie
Collaborator

bossie commented Apr 26, 2024

TODO: incorporate property filters defined in creo_layercatalog.json to support collections per season.

@VictorVerhaert

copernicus_r_utm-wgs84_10_m_hrvpp-vpp_p_2017-now_v01 has been reuploaded with the season property. @bossie

@bossie
Collaborator

bossie commented Apr 29, 2024

Confirmed: works (for "s1" and "s2"):

data_cube = (connection
             .load_collection(collection_id, bands=["SPROD", "TPROD", "QFLAG"], properties={"season": lambda s: s == "s1"})
             .filter_temporal(["2017-07-01", "2018-07-31"])
             .filter_bbox([5.00, 51.20, 5.01, 51.21])
             .save_result("GTiff"))

jdries added a commit to Open-EO/openeo-geotrellis-kubernetes that referenced this issue Apr 30, 2024
@jdries
Contributor Author

jdries commented Apr 30, 2024

committed collection config for seasonal collections

bossie added a commit to Open-EO/openeo-geopyspark-driver-testdata that referenced this issue Apr 30, 2024
bossie added a commit that referenced this issue Apr 30, 2024
bossie added a commit that referenced this issue Apr 30, 2024
@bossie
Collaborator

bossie commented Apr 30, 2024

Still needs work w.r.t. the bands order defined in creo_layercatalog.json.

Adapt related test:

bands=["SPROD", "TPROD"]) # TODO: remove other bands from layercatalog.json, then drop this bands argument

@bossie bossie reopened this Apr 30, 2024
bossie added a commit that referenced this issue May 2, 2024
bossie added a commit to Open-EO/openeo-geopyspark-driver-testdata that referenced this issue May 2, 2024
bossie added a commit that referenced this issue May 2, 2024
@bossie bossie linked a pull request May 2, 2024 that will close this issue