Validation complains about missing products on CDSE #566

Closed
jdries opened this issue Nov 6, 2023 · 9 comments · Fixed by #585

@jdries
Contributor

jdries commented Nov 6, 2023

When using SENTINEL2_L2A on CDSE, I now get warnings about missing products:
[MissingProduct] Tile 'S2B_MSIL2A_20220604T104619_N0400_R051_T31UES_20220604T124954' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2B_MSIL2A_20220604T104619_N0400_R051_T31UFS_20220604T124954' in collection 'SENTINEL2_L2A' is not available.

This is a bit special because CDSE is the reference archive. I'm not even sure how we could implement a proper missing-products check on CDSE. Can we add a config option to disable this check there?

@soxofaan
Member

soxofaan commented Nov 6, 2023

to reproduce (requires openeo client >= 0.24.0):

import openeo
con = openeo.connect("openeo.dataspace.copernicus.eu")
con.authenticate_oidc()
cube = con.load_collection(
    "SENTINEL2_L2A",
    temporal_extent=["2022-06-01", "2022-06-10"],
    spatial_extent={"west": 3, "south": 51, "east": 3.01, "north": 51.01},
    bands=["B02"]
)
cube.download("tmp.nc")

This will show the following warning:

Preflight process graph validation raised: [MissingProduct] Tile 'S2A_MSIL2A_20220602T105631_N0400_R094_T31UDS_20220609T120117' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2A_MSIL2A_20220602T105631_N0400_R094_T31UES_20220609T120117' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2B_MSIL2A_20220604T104619_N0400_R051_T31UES_20220604T124954' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2B_MSIL2A_20220604T104619_N0400_R051_T31UDS_20220604T124954' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2B_MSIL2A_20220607T105619_N0400_R094_T31UES_20220607T125419' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2B_MSIL2A_20220607T105619_N0400_R094_T31UDS_20220607T125419' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2A_MSIL2A_20220609T104631_N0400_R051_T31UDS_20220609T171618' in collection 'SENTINEL2_L2A' is not available. [MissingProduct] Tile 'S2A_MSIL2A_20220609T104631_N0400_R051_T31UES_20220609T171618' in collection 'SENTINEL2_L2A' is not available.

To get the validation report directly, without having to wait for download() to complete, do this instead:

print(con.validate_process_graph(cube))
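
For triage it can help to pull the tile IDs out of such a report. The following is a minimal sketch that works on the plain message text as quoted above; the helper name `extract_missing_tiles` and the regex are assumptions for illustration, not part of the openeo client.

```python
import re
from typing import List, Tuple

# Matches messages of the form quoted in this issue:
#   [MissingProduct] Tile '<product id>' in collection '<collection id>' is not available.
MISSING_PRODUCT_RE = re.compile(
    r"\[MissingProduct\] Tile '(?P<tile>[^']+)' in collection '(?P<collection>[^']+)'"
)


def extract_missing_tiles(report_text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (collection, product_id) pairs found in the text."""
    return [
        (m.group("collection"), m.group("tile"))
        for m in MISSING_PRODUCT_RE.finditer(report_text)
    ]
```

It could be fed the warning text or `str(...)` of whatever the validation call returns, assuming the messages keep the format shown here.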

@soxofaan
Member

soxofaan commented Nov 9, 2023

FYI, as discussed: I pushed a quick workaround to avoid spamming users with buggy validation reports. Extensive collection-based validation is now disabled on production instances (commit c63728c).

@EmileSonneveld
Contributor

The creo catalog itself can report that it has 'ARCHIVED' products. As I understand it, the current check logs these when they are encountered. However, something goes wrong, and 'ONLINE' products are logged instead.

@jdries
Contributor Author

jdries commented Nov 10, 2023

I disabled the missing product check on CDSE for now. We'll need to investigate the above-mentioned issue where 'ONLINE' products are logged as missing.

It is also very much an open question whether this procedure really finds all products for which no L2A product exists. It seems that we only find the ones that have been archived, but not the ones that were never in the catalog in the first place.

@soxofaan
Member

FYI:

if method == "creo":
    creo_catalog = CreoCatalogClient(**check_data["creo_catalog"])
    missing = [p.getProductId() for p in creo_catalog.query_offline(**query_kwargs)]

def query_offline(self, start_date, end_date, ulx=-180, uly=90, brx=180, bry=-90, cldPrcnt=100.) -> List[CreoCatalogEntry]:
    return [
        p for p in self.query(start_date=start_date, end_date=end_date, ulx=ulx, uly=uly, brx=brx, bry=bry, cldPrcnt=cldPrcnt)
        if p.getStatus() == CatalogStatus.ORDERABLE
    ]

Our current implementation parses "AVAILABLE" vs "ORDERABLE" as follows

def _parse_product_ids(response) -> List[CreoCatalogEntry]:
    result = []
    for hit in response['features']:
        # https://creodias.eu/eo-data-finder-api-manual:
        # 31 means that product is orderable and waiting for download to our cache,
        # 32 means that product is ordered and processing is in progress,
        # 34 means that product is downloaded in cache,
        # 37 means that product is processed by our platform,
        # 0 means that already processed product is waiting in our platform
        if hit['properties']['status'] in {0, 34, 37}:
            result.append(
                CreoCatalogEntry(hit['properties']['productIdentifier'].replace('.SAFE', ''), CatalogStatus.AVAILABLE))
        else:
            result.append(
                CreoCatalogEntry(hit['properties']['productIdentifier'].replace('.SAFE', ''), CatalogStatus.ORDERABLE))
    return result

Note that this terminology differs from the "ARCHIVED" and "ONLINE" you are talking about.
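
To make the mapping concrete, here is a self-contained sketch of the same classification rule applied to a made-up response. The response structure is only what the snippet above implies, the product IDs in the usage note are invented, and `classify_hits` is a hypothetical name rather than the project's function.

```python
from enum import Enum
from typing import List, Tuple


class CatalogStatus(Enum):
    NOT_FOUND = 1
    AVAILABLE = 2
    ORDERABLE = 3


# Status codes treated as "available" in the snippet above;
# any other code (e.g. 31, 32) is classified as ORDERABLE.
AVAILABLE_STATUS_CODES = {0, 34, 37}


def classify_hits(response: dict) -> List[Tuple[str, CatalogStatus]]:
    """Hypothetical helper: pair each product ID with its derived CatalogStatus."""
    result = []
    for hit in response["features"]:
        props = hit["properties"]
        product_id = props["productIdentifier"].replace(".SAFE", "")
        if props["status"] in AVAILABLE_STATUS_CODES:
            result.append((product_id, CatalogStatus.AVAILABLE))
        else:
            result.append((product_id, CatalogStatus.ORDERABLE))
    return result
```

For example, a feature with `"status": 34` and identifier `"PRODUCT_A.SAFE"` would come out as `("PRODUCT_A", CatalogStatus.AVAILABLE)`, while one with `"status": 31` would be `ORDERABLE` and therefore be reported by `query_offline`.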

@EmileSonneveld EmileSonneveld linked a pull request Nov 13, 2023 that will close this issue
@EmileSonneveld
Contributor

So the offline products were already removed when deduplicating products in the Scala code.
I changed it to take the difference between offline and online products per creodias space/time key.

@soxofaan Does that sound ok? Otherwise, I can also just remove the check there

@soxofaan
Member

I don't completely understand what you say here, and I don't know these catalogue details well enough to be honest.

I had a look at PR #584 as well, and I'm confused by missing = offline - online. Isn't it just missing = offline? I mean, how can a tile be both online and offline?

@EmileSonneveld
Contributor

Tiles can have multiple versions, typically with a different processingBaseline. It looks like a whole bunch of tiles with an old processingBaseline have been archived and now appear offline. They still show up in the catalogue because users can manually request them from the archive.

With this fix, I assume that if a tile (a location and date) has both an offline and an online version, we pick the online one and we are OK.
Another option would be to do a single catalogue request and keep track of the "status" property. If there is still an offline tile after deduplication, we give the warning. (That would need changes in the Scala code.)
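
Both options can be sketched in a few lines of Python. This is an illustration only, not the code from the PR: it assumes the space/time key is the (tile, sensing datetime) pair taken from the Sentinel-2 product ID, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def space_time_key(product_id: str) -> Tuple[str, str]:
    """Assumed key: (tile, sensing datetime), e.g. ("T31UES", "20220604T104619") for
    S2B_MSIL2A_20220604T104619_N0400_R051_T31UES_20220604T124954."""
    parts = product_id.split("_")
    return parts[5], parts[2]


def missing_by_difference(offline: Iterable[str], online: Iterable[str]) -> List[str]:
    """Option 1: an offline product is only missing if no online product shares its key."""
    online_keys = {space_time_key(p) for p in online}
    return [p for p in offline if space_time_key(p) not in online_keys]


def missing_after_dedup(products: Iterable[Tuple[str, bool]]) -> List[str]:
    """Option 2: one list of (product_id, is_online) pairs from a single request.
    Keep one product per key, preferring an online one; warn about keys left offline."""
    best: Dict[Tuple[str, str], Tuple[str, bool]] = {}
    for product_id, is_online in products:
        key = space_time_key(product_id)
        current = best.get(key)
        if current is None or (is_online and not current[1]):
            best[key] = (product_id, is_online)
    return [pid for pid, is_online in best.values() if not is_online]
```

With either function, an archived old-baseline product is ignored as long as an online product exists for the same tile and sensing time, which matches the behaviour described above.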

@soxofaan
Member

OK, thanks for that explanation. I guess it's worth putting a comment about this in the implementation.

What I also find confusing (again, I'm not that familiar with the creodias catalogue API) is that we predefine these status values "available", "orderable" and "not found" here:

class CatalogStatus(Enum):
    NOT_FOUND = 1
    AVAILABLE = 2
    ORDERABLE = 3

while you talk about "offline" and "online". Is there some documented listing we can follow? The code has this reference, but the link is dead:
# https://creodias.eu/eo-data-finder-api-manual:
# 31 means that product is orderable and waiting for download to our cache,
# 32 means that product is ordered and processing is in progress,
# 34 means that product is downloaded in cache,
# 37 means that product is processed by our platform,
# 0 means that already processed product is waiting in our platform
