Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

correctness of area_in_square_meters #144

Closed
soxofaan opened this issue Oct 12, 2022 · 5 comments · Fixed by Open-EO/openeo-geopyspark-driver#372
Closed

correctness of area_in_square_meters #144

soxofaan opened this issue Oct 12, 2022 · 5 comments · Fixed by Open-EO/openeo-geopyspark-driver#372
Assignees

Comments

@soxofaan
Copy link
Member

While working on the failing integration tests due to #141 I was covering my changes with tests and have a question about current area_in_square_meters implementation:

def area_in_square_meters(geometry: BaseGeometry, crs: Union[str, pyproj.CRS]):
if isinstance(crs, str):
crs = "+init=" + crs # TODO: this is deprecated
geometry_lat_lon = geometry_to_crs(geometry,crs,pyproj.crs.CRS.from_epsg(4326))
geometry_area = shapely.ops.transform(
partial(
pyproj.transform,
pyproj.Proj(crs),
pyproj.Proj(
proj='aea',
lat_1=geometry_lat_lon.bounds[1],
lat_2=geometry_lat_lon.bounds[3])
),
geometry)
return geometry_area.area

I added a test in dad6bc9 where I use a GeoJSON with a square with side 2.3km (based on length of Watersportbaan Gent): area is around (2.3 km)^2 = 5.29 km^2, but current implementation of area_in_square_meters gives a result around 8 km^2, which is quite a difference:

def test_get_bounding_box_area(self):
path = str(get_path("geojson/FeatureCollection06.json"))
vc = DriverVectorCube(gpd.read_file(path))
area = vc.get_bounding_box_area()
# TODO: area of FeatureCollection06 (square with side of approx 2.3km, based on length of Watersportbaan Gent)
# is roughly 2.3 km * 2.3 km = 5.29 km2,
# but current implementation gives result that is quite a bit larger than that
numpy.testing.assert_allclose(area, 8e6, rtol=0.1)

@Pratichhya
Copy link

I was testing a service for 1 hectare area whose area changes in a different manner(decreases in area, i.e. 2875sq.m). Similar was the case when tested for 100 hectares.
job id: 'j-b8ec79678ef34937bef08c3ef1404187'

@Pratichhya
Copy link

I retried testing service for a specific area, yet it seems different than expected. E.g.: when tested for a 1ha polygon, it's area was changed to 13089.722 sq_m instead of 10000 sq_m (job id: j-8d087ab0f16a47c18bc4198efb52290c), similarly for a polygon on 10ha area was 550590.284 sq_m (job id: j-6a0ee095fc71415a9fb5b44856c17293).

@Pratichhya Pratichhya reopened this Mar 22, 2023
@JeroenVerstraelen
Copy link
Contributor

Hello, @Pratichhya. Can you share the code and polygons you used? I'll check it as soon as I can.

@JeroenVerstraelen
Copy link
Contributor

@Pratichhya, I have tested the new area_in_square_meters function with your 1ha polygon, and it returned a value of 10,155.584 m². This result is identical to what we get on QGIS.

The issue stems from the way we handle aggregate_spatial in the dry_run. In that section, we use the bounding box (bbox) of the provided polygon as our filter:
https://github.com/Open-EO/openeo-python-driver/blob/master/openeo_driver/dry_run.py#L503
This bbox is later utilized to compute the area, which is then recorded in the job_metadata.json file:
https://github.com/Open-EO/openeo-geopyspark-driver/blob/master/openeogeotrellis/deploy/batch_job.py#L97

The difference is quite evident in QGIS:
Screenshot from 2023-03-23 12-25-22

The bbox has an area of approximately 20,000 m². However, considering the extra vector_buffer in your Anomaly_Detection UDP, this likely results in the 13,089.722 m² value that you observe for your batch job.

@soxofaan @jdries, is it intentional that we use the bbox of the provided polygons for our area calculation in job_metadata.json?

@jdries
Copy link
Contributor

jdries commented Mar 23, 2023

For weak_spatial_extent we probably want the bounding box, but in batch job metadata, polygons should be used to calculate area if available because openEO processing is normally smart enough to work on the polygons only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants