Point extractions sentinel 2 #58

Closed
wants to merge 20 commits into from

VincentVerelst
Contributor

@VincentVerelst VincentVerelst commented Jun 12, 2024

Some comments from @kvantricht to be followed up here:

  • "date" has dtype object (typical for strings in dataframes); isn't it possible to save it directly in datetime format? Especially if we want to filter temporally afterwards, we'd need it like that.
  • What's the use of the feature_index column?
  • The bands are currently in float64, while the scaling in gfmap was designed so they can be saved as UINT16; if we don't have the option to tell openEO to save them like that, we should make the conversion in a post-job action to avoid unnecessary storage. In that case we have to make sure the nodata value of 65535 is used correctly.
  • "geometry" is in WKB format (I think). Can we easily interact with it like this? E.g. I cannot load this parquet with geopandas as it does not understand the geometry. Actually it should understand WKB for geoparquet, but only with the correct metadata, I think.
  • Out of curiosity, I'm wondering why samples with extract==False are also extracted. I guess for this exercise you're extracting everything? I don't think it's a necessary column in the output, but OK.
  • Where do irrigation_label, croptype_label and landcover_label come from now? Directly from the input file? This wasn't rasterized and exported as a STAC-compatible collection, so I'm wondering how we get this info in here. Unfortunately still with the old mappings for crop type (cc Christina Butsko). The irrigation label is not needed in our parquet files anyway.
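
On the WKB question above, a minimal sketch of what a WKB-encoded point contains, assuming standard WKB without an SRID prefix (so not EWKB). The helper names are hypothetical and only the standard library is used; in practice `shapely.wkb.loads` or `geopandas.GeoSeries.from_wkb` would typically decode such a column.

```python
import struct


def encode_wkb_point(x, y):
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # 1 byte: byte order (1 = little-endian), 4 bytes: geometry type (1 = Point),
    # then x and y as 8-byte doubles.
    return struct.pack("<BIdd", 1, 1, x, y)


def decode_wkb_point(wkb):
    """Decode a 2D WKB point into an (x, y) tuple (hypothetical helper)."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError(f"expected a WKB Point (type 1), got type {geom_type}")
    return x, y


point_bytes = encode_wkb_point(30.5, 4.25)
print(len(point_bytes), decode_wkb_point(point_bytes))
```

The bytes themselves are easy to decode; what geopandas additionally needs in a GeoParquet file is the file-level metadata that marks the column as a geometry column.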

@VincentVerelst VincentVerelst removed the request for review from GriffinBabe June 13, 2024 12:29
@VincentVerelst
Contributor Author

Updated the script. A new output file can be found at /data/users/Public/vincent.verelst/extraction_test/35NPF/2018_SSD_WFP-field-survey_POLY_110_1674.parquet

As a general remark: the previous extraction was run on an old reference dataset, just as a test. The current extraction has been run on a new reference dataset.

@kvantricht, to address your comments in the order listed above:

  • Changed the dtype of the 'date' column to 'datetime' in the post_job_actions
  • The feature_index is assigned automatically by openEO. Each geometry to be extracted gets its own index to keep track of it. In the example above, there were 21 Points to extract, so 21 different feature_indices
  • The output should indeed be uint16. I've contacted the openEO developers to check why this is not the case. As a temporary fix, I've converted manually to uint16 in the post_job_actions
  • Could you try reading the new result with geopandas? It works for me
  • Extract flags are given by 0 and 1 (instead of False, True) in the newest format of reference datasets. In the newest example, only features with extract==1 are extracted
  • All other columns in the geoparquet file are taken directly from the original reference dataset. These point extractions are run directly on the original reference datasets, not on the S2 patch extractions.
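
The date and uint16 conversions described above could look roughly like the sketch below. This is not the code from this PR: the function name, the band column name and the rule that NaN or out-of-range values become nodata are assumptions made for illustration; only the nodata value 65535 and the `date` column come from the discussion.

```python
from typing import Sequence

import numpy as np
import pandas as pd

NODATA_UINT16 = 65535  # nodata value mentioned in the review comments


def post_job_cleanup(df: pd.DataFrame, band_columns: Sequence[str]) -> pd.DataFrame:
    """Hypothetical post-job step: parse dates and store bands as uint16."""
    out = df.copy()
    # Parse string dates into datetime64 so temporal filtering works directly.
    out["date"] = pd.to_datetime(out["date"])
    for col in band_columns:
        values = out[col].to_numpy(dtype="float64")
        # Assumption: NaN and values outside the uint16 range become nodata.
        invalid = ~np.isfinite(values) | (values < 0) | (values > NODATA_UINT16)
        out[col] = np.where(invalid, NODATA_UINT16, np.round(values)).astype("uint16")
    return out


example = pd.DataFrame(
    {
        "date": ["2018-01-05", "2018-01-10"],
        "S2-L2A-B04": [1234.0, np.nan],  # hypothetical band column name
    }
)
cleaned = post_job_cleanup(example, ["S2-L2A-B04"])
print(cleaned.dtypes)
```

Replacing invalid values before the cast matters: casting NaN straight to an unsigned integer type does not give a well-defined result, so the nodata substitution has to happen on the float array first.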

@kvantricht
Contributor

@VincentVerelst thanks for the changes!

the fact that we're having weird and/or duplicate attributes like [xx, sampleID, sample_id, irrigation_status, IRRIGATION, CROP, ID, ARMYWORM, CT, IMPACT, None, ...] is all due to the input file? If yes, @cbutsko, we need to check this and come up with a list of attributes we want to subset on each time, knowing that these make it into the output parquet files.

reading with geopandas works for me too, thanks.

so looking good for me!
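
The attribute subsetting suggested in this comment could be sketched as below. The allow-list is purely hypothetical, since the actual list of attributes was still to be agreed on; the column names are borrowed from the discussion above for illustration only.

```python
import pandas as pd

# Hypothetical allow-list; the real set of attributes was still to be decided.
ATTRIBUTES_TO_KEEP = ["sample_id", "date", "geometry", "croptype_label", "landcover_label"]


def subset_attributes(df, keep=ATTRIBUTES_TO_KEEP):
    """Keep only allow-listed columns that are present, in allow-list order."""
    present = [col for col in keep if col in df.columns]
    return df.loc[:, present]


example_attrs = pd.DataFrame(
    {
        "sample_id": ["a", "b"],
        "IRRIGATION": [0, 1],  # stray attribute inherited from the input file
        "date": ["2018-01-05", "2018-01-10"],
        "croptype_label": [1100, 1200],  # made-up label values
    }
)
subset = subset_attributes(example_attrs)
print(list(subset.columns))
```

Filtering against an explicit allow-list, rather than dropping known-bad columns, means new stray attributes in a future reference dataset do not silently end up in the output parquet files.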

backend_context = BackendContext(backend)

# TODO: Adjust this to the desired bands to download
bands_to_download = [
Contributor

Will need to expand to S1 and meteo too; this has to reflect the inputs required by the models.

scripts/extractions/point_extractions/point_extractions.py (outdated)
@VincentVerelst
Contributor Author

@kvantricht all the attributes are indeed also present in the input file. The reference data used for this example can be found at /data/users/Public/vincent.verelst/extraction_test/08_2018_SSD_WFP-field-survey_POLY_110.geoparquet.

@VincentVerelst VincentVerelst marked this pull request as ready for review June 20, 2024 14:15
@VincentVerelst VincentVerelst marked this pull request as draft June 20, 2024 14:20
@kvantricht kvantricht closed this Jul 9, 2024
@kvantricht kvantricht deleted the vv_point_extractions branch July 9, 2024 13:17

3 participants