Clarification of benchmarks #4

martinfleis · 2022-02-22T14:04:17Z

Hi,

I'll make a PR changing some of the geopandas benchmarks to more performant versions, but before that I'd like to ask for some clarifications. I understand that the benchmarks are artificial, but before I start coding I want to make sure I understand what the main goal is.

distance
- you are trying to get an NxN matrix of pairwise distances between all points (both ways?), right?
sample
- I truly don't understand what this is trying to do :D. Are you trying to get n random points that are within the polygon? Sort-of Monte Carlo simulation?

I think I understand the rest.


martinfleis · 2022-02-22T20:40:15Z

I think I got it. See #5

kadyb · 2022-02-22T22:07:47Z

Personally, I wanted to focus on comparing the functions available in packages from a user's perspective, rather than writing the most efficient alternatives. I also think we should compare similar functions in terms of features ({sf} as a reference?). I know it's possible to write efficient code using e.g. {Rcpp}, {GEOS} and {data.table}, but I think that's beyond the reach of the vast majority of users.

distance
you are trying to get an NxN matrix of pairwise distances between all points (both ways?), right?

Exactly!
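
For readers following along, the "NxN, both ways" layout confirmed above can be sketched in plain Python. This is only an illustration of the output shape, not the benchmark code from the repository or from #5; the function name `pairwise_distances` and the sample coordinates are made up for the example, and planar (Euclidean) distance is assumed.

```python
import math


def pairwise_distances(points):
    """Return an N x N list of lists of Euclidean distances between points.

    `points` is a sequence of (x, y) tuples. The matrix is symmetric with a
    zero diagonal, so each pair is computed once and stored "both ways".
    """
    n = len(points)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            matrix[i][j] = d  # distance from i to j
            matrix[j][i] = d  # same value for j to i
    return matrix


# Hypothetical sample data, chosen so the distances are easy to check by hand.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = pairwise_distances(points)
```

Because the full matrix holds N * N values, memory grows quadratically with the number of points, which is worth keeping in mind when choosing benchmark sizes.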

sample
I truly don't understand what this is trying to do :D. Are you trying to get n random points that are within the polygon? Sort-of Monte Carlo simulation?

Not quite a Monte Carlo simulation. I think sampling points in polygons is a standard practice in GIS :P Later, the coordinates can be retrieved from these geometries, or they can be used to extract values from the raster. Please check out sf::st_sample() as a reference. Ideally, you would implement this as a function in {geopandas}.
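
One common way to draw n random points inside a polygon is rejection sampling: draw uniform points in the bounding box and keep those that fall inside. The sketch below shows that idea in plain Python. It is not necessarily how sf::st_sample() or the code in #5 works; the helper names `point_in_polygon` and `sample_points`, the triangle and the seed are all assumptions made for the example, and the polygon is a single ring without holes.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside `ring`.

    `ring` is a list of (x, y) vertices of a simple polygon without holes.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_points(ring, n, seed=None):
    """Draw n uniform random points inside `ring` by rejection sampling."""
    rng = random.Random(seed)
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    points = []
    # Note: this loops forever for a zero-area polygon.
    while len(points) < n:
        x = rng.uniform(min_x, max_x)
        y = rng.uniform(min_y, max_y)
        if point_in_polygon(x, y, ring):
            points.append((x, y))
    return points


# Hypothetical polygon: a right triangle covering half of its bounding box.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
samples = sample_points(triangle, 100, seed=42)
```

The acceptance rate equals the polygon area divided by the bounding-box area, so long, thin or diagonal polygons need many more draws than compact ones.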

martinfleis · 2022-02-22T22:27:50Z

Personally, I wanted to focus on comparing the functions available in packages from a user's perspective, rather than writing the most efficient alternatives.

Yup, I've used only functions that are available. As you can see from the discussion on intersects, there could be even faster options.

compare similar functions in terms of features ({sf} as a reference?)

As far as I know, the intersects in sf uses a spatial index under the hood, that is why I opted to use it as well. But I understand if you ignore that solution :).

Ideally, you would implement this as a function in {geopandas}.

We don't have anything like this right now, but the code I used in #5, replacing your custom loop, is likely quite close to what it would look like if we had it (I'll open an issue to add it in the future).

kadyb · 2022-02-22T23:05:39Z

As far as I know, the intersects in sf uses a spatial index under the hood, that is why I opted to use it as well. But I understand if you ignore that solution :).

My mistake, in that case {geopandas} should also use spatial indexes. Not sure if {terra} works the same way, but I believe it does. Edit: {terra} doesn't use spatial indexes.
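
To show why the presence of a spatial index matters for a fair comparison, here is a small sketch contrasting a brute-force intersects over bounding boxes with an indexed version. It is not the implementation used by {sf}, {geopandas} or {terra}: a uniform grid stands in for the tree-based indexes that libraries typically use, only bounding boxes are tested rather than exact geometries, and every name and value below is hypothetical.

```python
from collections import defaultdict


def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def brute_force_pairs(left, right):
    """Test every left box against every right box: len(left) * len(right) checks."""
    return {
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if boxes_intersect(a, b)
    }


def cells(box, size):
    """Yield the grid cells of side `size` that a box overlaps."""
    minx, miny, maxx, maxy = box
    for cx in range(int(minx // size), int(maxx // size) + 1):
        for cy in range(int(miny // size), int(maxy // size) + 1):
            yield cx, cy


def indexed_pairs(left, right, size=1.0):
    """Same result as brute_force_pairs, but only nearby boxes are compared."""
    # Build the index once: map each grid cell to the right boxes touching it.
    grid = defaultdict(list)
    for j, b in enumerate(right):
        for cell in cells(b, size):
            grid[cell].append(j)
    pairs = set()
    for i, a in enumerate(left):
        # Candidates are the right boxes sharing at least one cell with `a`.
        candidates = {j for cell in cells(a, size) for j in grid.get(cell, ())}
        for j in candidates:
            if boxes_intersect(a, right[j]):
                pairs.add((i, j))
    return pairs


# Hypothetical boxes given as (minx, miny, maxx, maxy).
left = [(0, 0, 1, 1), (5, 5, 6, 6)]
right = [(0.5, 0.5, 2, 2), (10, 10, 11, 11), (5.5, 4, 7, 5.2)]
result = indexed_pairs(left, right)
```

Both functions return the same pairs; the indexed one simply skips comparisons between boxes that are far apart, which is where most of the speed-up in an indexed intersects comes from.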

By "compare similar functions in terms of features", I meant that the functions in {terra} and {sf} have more options (arguments), so I suspect there will be overhead (but probably negligible) due to conditions/transformations.
