people-EA: vector to raster #423

Closed · jdries opened this issue May 16, 2023 · 5 comments · Fixed by Open-EO/openeo-python-driver#218 or #493

jdries (Contributor) commented May 16, 2023

Given a vector data cube with attributes:

  • convert it to raster
  • use an attribute property to burn in the pixel value

Open-EO/openeo-processes#442
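As a purely conceptual illustration of "burning in" an attribute value (the backend implementation will be Geotrellis-based, see the discussion below), this is what e.g. rasterio's rasterize does; the input file name and the band_value attribute are hypothetical:

# Conceptual sketch only, not the openEO implementation: burn a (hypothetical)
# "band_value" attribute of each feature into a raster with rasterio.
import geopandas as gpd
from rasterio import features
from rasterio.transform import from_origin

gdf = gpd.read_file("parcels.geojson")          # hypothetical input file
transform = from_origin(west=5.0, north=51.5, xsize=0.0001, ysize=0.0001)
raster = features.rasterize(
    shapes=((geom, value) for geom, value in zip(gdf.geometry, gdf["band_value"])),
    out_shape=(1024, 1024),
    transform=transform,
    fill=0,                                     # value for pixels not covered by any geometry
    dtype="float32",
)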

jdries (Contributor) commented May 22, 2023

A Geotrellis 'rasterize' method already exists on RDD[Feature].
The question is how to obtain such an RDD from the GeoDataFrame we have in Python.

We could serialize to GeoParquet and read it back from Spark? That would also work for large files.
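A rough sketch of that GeoParquet round trip, assuming the GeoDataFrame is written with GeoPandas and the geometry column is decoded from WKB on the Spark side (file paths and column names are illustrative, not the final implementation):

# Sketch of the GeoParquet round trip (illustrative).
import geopandas as gpd
from shapely import wkb
from pyspark.sql import SparkSession

gdf = gpd.read_file("parcels.geojson")
gdf.to_parquet("/tmp/features.parquet")          # GeoParquet: geometry stored as WKB

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/tmp/features.parquet")

# Decode WKB back into shapely geometries; each record could then be turned
# into a Geotrellis Feature on the JVM side.
features_rdd = df.rdd.map(lambda row: (wkb.loads(bytes(row["geometry"])), row.asDict()))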

JeroenVerstraelen (Contributor) commented:

The main implementation is done. I am running into an issue during testing though.

Issue: CRS in layercatalog.json layers can be UTM, which means the EPSG code needs to be calculated from the extent. This is an extra complexity in vector_to_raster that I'd like to avoid.

Possible solutions:

  • Pass target_datacube.pyramid.levels[0].metadata (TileLayerMetadata) to VectorCubeMethods.scala::VectorToRaster
    • But I can't be sure that the target_datacube will always have a TileLayerMetadata (e.g. the create_data_cube() process)
  • Add the complexity anyway:
    • Simply convert target_raster_cube.metadata.spatial_dimensions[0].crs to a valid EPSG code
    • I am going with this solution

Example CRS in target_raster_cube.metadata.spatial_dimensions[0].crs:

{
    "$schema": "https://proj.org/schemas/v0.2/projjson.schema.json",
    "type": "GeodeticCRS",
    "name": "AUTO 42001 (Universal Transverse Mercator)",
    "datum": {
        "type": "GeodeticReferenceFrame",
        "name": "World Geodetic System 1984",
        "ellipsoid": {
            "name": "WGS 84",
            "semi_major_axis": 6378137,
            "inverse_flattening": 298.257223563
        }
    },
    "coordinate_system": {
        "subtype": "ellipsoidal",
        "axis": [{
                     "name": "Geodetic latitude",
                     "abbreviation": "Lat",
                     "direction": "north",
                     "unit": "degree"
                 }, {
                     "name": "Geodetic longitude",
                     "abbreviation": "Lon",
                     "direction": "east",
                     "unit": "degree"
                 }]
    },
    "area": "World",
    "bbox": {
        "south_latitude": -90,
        "west_longitude": -180,
        "north_latitude": 90,
        "east_longitude": 180
    },
    "id": {
        "authority": "OGC",
        "version": "1.3",
        "code": "Auto42001"
    }
}

GeoPySparkLayerCatalog.native_crs(metadata) converts this to "UTM".
utm.auto_utm_epsg_for_geometry(box(west, south, east, north), srs) derives an EPSG code using the bbox from the provided geometry.
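For reference, deriving the UTM EPSG code from a lat/lon bounding box boils down to picking the UTM zone of the bbox centre, which is roughly what auto_utm_epsg_for_geometry does; a simplified sketch (ignoring the special zones around Norway and Svalbard):

# Simplified sketch of deriving a UTM EPSG code from a lat/lon extent.
from shapely.geometry import box

def utm_epsg_for_bbox(west, south, east, north):
    centroid = box(west, south, east, north).centroid
    zone = int((centroid.x + 180.0) // 6.0) + 1          # UTM zone 1..60
    return (32600 if centroid.y >= 0 else 32700) + zone  # northern vs. southern hemisphere

utm_epsg_for_bbox(4.0, 50.0, 6.0, 52.0)  # -> 32631 (UTM zone 31N)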

JeroenVerstraelen (Contributor) commented:

Another issue is how much information should be extracted from target_datacube. Currently I only want to extract the crs and target_resolution from DriverDataCube.CollectionMetadata and calculate the extent and layout definition from the geometries provided in the input vector cube (see the sketch below). I feel this keeps the process simple and easy to understand.

There was also a discussion about taking the extent and layout definition from the target_datacube instead, and only rasterizing the geometries within that extent.
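A minimal sketch of the first option, deriving a pixel-aligned extent from the vector cube's own geometry bounds given only a target resolution (function and variable names are illustrative, not the actual driver code):

# Sketch: derive a pixel-aligned extent from the vector cube's geometry bounds,
# given only a target resolution.
import math
import geopandas as gpd

def snapped_extent(gdf: gpd.GeoDataFrame, resolution: float):
    west, south, east, north = gdf.total_bounds
    west, south = (math.floor(v / resolution) * resolution for v in (west, south))
    east, north = (math.ceil(v / resolution) * resolution for v in (east, north))
    cols = round((east - west) / resolution)
    rows = round((north - south) / resolution)
    return (west, south, east, north), (cols, rows)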

JeroenVerstraelen added a commit to Open-EO/openeo-python-driver that referenced this issue Aug 28, 2023
JeroenVerstraelen added a commit to Open-EO/openeo-python-client that referenced this issue Aug 28, 2023
JeroenVerstraelen (Contributor) commented Aug 31, 2023

After discussion we decided to:

  • Use target_datacube.pyramid.levels[0].metadata to extract the crs and resolution (see the sketch below). The reasoning is that we can't trust the Python metadata to be completely up to date.
  • Use the extent from the target_data_cube rather than the bounds of the vector cube; this avoids alignment issues when merging with other raster cubes.
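A minimal sketch of the crs/resolution extraction mentioned above, assuming geopyspark-style layer metadata on pyramid level 0 (attribute names may differ in the actual GeopysparkDataCube):

# Sketch, assuming geopyspark-style metadata on pyramid level 0 (names may differ).
layer = target_datacube.pyramid.levels[0]
metadata = layer.layer_metadata                  # TileLayerMetadata-like object

crs = metadata.crs                               # e.g. a proj string or EPSG code
layout = metadata.layout_definition
resolution_x = (layout.extent.xmax - layout.extent.xmin) / (
    layout.tileLayout.layoutCols * layout.tileLayout.tileCols
)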

Features left to implement:

  • Use extent from target_data_cube instead of vector cube bounds
  • Raise exception when vector cube contains more than one property

JeroenVerstraelen (Contributor) commented Sep 4, 2023

All features have been implemented. The example notebook is also complete and has been sent to Marcel.

There are two TODOs for future improvements:

  • Implement raster_to_vector for multiple bands.
  • geotrellis.spark.rasterize.RasterizeRDD.fromKeyedFeature causes executors to go out of memory for large extents.
    • This appears to be because it creates an EmptyTile with (layout.tileCols, layout.tileRows) dimensions for each feature.
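Rough back-of-the-envelope arithmetic for that memory issue (tile size, cell type and feature count are illustrative):

# Illustrative arithmetic: one empty tile allocated per feature.
tile_cols, tile_rows = 256, 256          # layout.tileCols, layout.tileRows
bytes_per_cell = 4                       # e.g. a 32-bit cell type
per_tile = tile_cols * tile_rows * bytes_per_cell   # 262 144 bytes ≈ 256 KiB
features = 100_000
print(features * per_tile / 2**30)       # ≈ 24 GiB before any actual pixels are burned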
