Checkout pyresample #14

Open
willirath opened this issue Aug 17, 2020 · 6 comments

Comments

@willirath
Contributor

As noted by @koldunovn: We should have a look at https://pyresample.readthedocs.io/en/latest/

@benbovy
Member

benbovy commented Aug 20, 2020

pyresample uses pykdtree under the hood, which, judging by its benchmarks, is faster than scipy.spatial.cKDTree. It supports parallel queries (using OpenMP) but seems to support only Euclidean distances.
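A Euclidean-only KD-tree can still answer nearest-neighbour queries on lat/lon data if the points are first mapped onto the unit sphere, because the straight-line (chord) distance grows monotonically with the great-circle distance. The sketch below illustrates that idea with scipy.spatial.cKDTree; the helper name `lonlat_to_unit_xyz` and the sample coordinates are made up for illustration, and this is not necessarily how pyresample or xoak handle it.

```python
import numpy as np
from scipy.spatial import cKDTree


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Map longitude/latitude in degrees to Cartesian points on the unit sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    cos_lat = np.cos(lat)
    return np.column_stack(
        (cos_lat * np.cos(lon), cos_lat * np.sin(lon), np.sin(lat))
    )


# Illustrative source points: one just across the antimeridian, one 9.9 degrees away.
src_lon = np.array([-179.8, 170.0])
src_lat = np.array([0.0, 0.0])
query_lon = np.array([179.9])
query_lat = np.array([0.0])

# Naive approach: Euclidean distance directly on (lon, lat) degrees.
naive_tree = cKDTree(np.column_stack((src_lon, src_lat)))
_, naive_idx = naive_tree.query(np.column_stack((query_lon, query_lat)))

# Sphere-aware approach: Euclidean distance between unit-sphere points.
sphere_tree = cKDTree(lonlat_to_unit_xyz(src_lon, src_lat))
chord, sphere_idx = sphere_tree.query(lonlat_to_unit_xyz(query_lon, query_lat))

# Convert the chord length back to a great-circle angle in degrees.
angle_deg = np.degrees(2.0 * np.arcsin(chord / 2.0))

print("naive nearest index:", naive_idx[0])
print("sphere nearest index:", sphere_idx[0], "at", angle_deg[0], "degrees")
```

The naive tree picks the point at longitude 170 because it ignores the wrap-around at the antimeridian, while the unit-sphere tree picks the point at -179.8, which is only about 0.3 degrees away.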

@benbovy
Member

benbovy commented Aug 20, 2020

> is faster than scipy.spatial.cKDTree

Maybe not anymore? #12 (comment)

@djhoese

djhoese commented May 1, 2021

Hi @willirath and @benbovy. I'm one of the core developers of pyresample (along with @mraspaud and @pnuu). I also tried to start the geoxarray library, whose purpose I hope will overlap with xoak, rioxarray (CC @snowman2), and pyresample (depending on and/or supplementing these libraries). Shortly after starting the project I realized that holding on to coordinate information in xarray objects/accessors was not as "clean" as I had expected, and I basically gave up for a while. But I'd like to revive the project, if not in implementation then at least by working out the concepts that I originally hoped to cover.

@raybellwaves has pointed xoak out to me a couple of times now and I'm very impressed, but my understanding of indexers (xarray or pandas) is really lacking. I'm hoping you can clear some things up for me about xoak and the future you see for it. I've been trying to follow pydata/xarray#4979 and pydata/xarray#5102 and I'm wondering how this work will affect xoak. Would it make things like the xoak accessor's .sel unnecessary? Is xarray interested in collecting a series of indexers like those in xoak? What about the indexer registry and custom indexers? If I want to make an indexer based on pyresample/pykdtree that xarray can use, would you recommend I look at adding it to xoak? Or add it to pyresample and have pyresample register it with xoak? Or just keep it in pyresample (or geoxarray) and provide interfaces for users to get to it? How do xoak's indexers perform with thousands of points (e.g., 10000 x 6400, a large swath from the VIIRS satellite instrument)?

What I'd really like to get to eventually with geoxarray (or the geospatial xarray community) is indexers that allow for CRS-aware selection/indexing, consistent spatial resampling interfaces, and consistent CRS/projection/coordinate metadata handling and conversion. That's why I'm currently trying to figure out how I can take what I know from Satpy/Pyresample and rioxarray and merge it with the new information I'm gathering from xoak.

Side note: Regarding pykdtree performance, I'd like to see how these benchmarks were run. @mraspaud recently ran some tests and found performance similar to the original findings in pykdtree's README. Depending on how pykdtree was installed, there is a chance that the OpenMP library wasn't included and therefore wasn't used. Also, we've been using pykdtree underneath some dask map_blocks/delayed calls in the Satpy library for resampling and have been slowly moving that functionality to pyresample.
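One way to make such a comparison reproducible is to time both libraries on identical random data in the same process. The following is only a rough sketch: the array sizes, the `best_of` helper, and the choice of best-of-three timing are arbitrary assumptions, and pykdtree is treated as optional so the script still runs without it. To my knowledge, pykdtree's thread count is governed by the standard OpenMP environment variable OMP_NUM_THREADS, which needs to be set before the process starts; if changing it has no effect on the pykdtree timing, that may hint at a build without OpenMP.

```python
import time

import numpy as np
from scipy.spatial import cKDTree

try:  # pykdtree is optional in this sketch
    from pykdtree.kdtree import KDTree as PyKDTree
except ImportError:
    PyKDTree = None


def best_of(func, repeats=3):
    """Return the shortest wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Arbitrary problem size, chosen only so the script finishes quickly.
rng = np.random.default_rng(0)
data = rng.random((100_000, 3))
queries = rng.random((20_000, 3))

results = {}

scipy_tree = cKDTree(data)
results["scipy, workers=1"] = best_of(
    lambda: scipy_tree.query(queries, k=1, workers=1)
)
results["scipy, workers=-1"] = best_of(
    lambda: scipy_tree.query(queries, k=1, workers=-1)
)

if PyKDTree is not None:
    py_tree = PyKDTree(data)
    results["pykdtree"] = best_of(lambda: py_tree.query(queries, k=1))

for label, seconds in results.items():
    print(f"{label:>20}: {seconds:.4f} s")
```

Only query time is measured here; tree construction would need a separate timing, and the results will vary with hardware, library versions, and how each package was built.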

@benbovy
Member

benbovy commented May 3, 2021

Hi @djhoese!

Hope to clarify things a bit about Xarray flexible indexes and Xoak here.

We started Xoak as a preliminary project before starting to work on flexible indexes in Xarray. At some point, some of Xoak's specific features will likely reuse the Xarray API:

  • We plan to make Xoak indexes compatible with Xarray, i.e., eventually they will inherit from xarray.Index, added in pydata/xarray#5102 ("Flexible indexes: add Index base class and xindexes properties").

  • Xoak's index registering system will eventually be dropped in favor of Xarray's index registering system, e.g., based on entrypoints like the newly refactored Xarray I/O backend system.

  • The Xoak accessor's .sel might become unnecessary at some point (it will depend on whether Xarray's .sel will support Xoak's features like providing chunked indexers). However, Xoak will still provide an accessor so that we can query indexes for operations beyond simple data selection, e.g., returning the distances to the nearest neighbors, k-nearest neighbor selection, etc.
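For readers unfamiliar with the entry-point mechanism mentioned in the second bullet, the standard library can list whatever plugins installed packages have declared under a given group name. The sketch below is a generic illustration, not Xarray's implementation: "xarray.backends" is the group Xarray uses for I/O backends, whereas "xarray.indexes" is a purely hypothetical name used to show what an index registry could look like, and `list_entry_points` is a made-up helper.

```python
from importlib.metadata import entry_points


def list_entry_points(group):
    """Return {name: "module:object"} for all plugins declared under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a plain dict keyed by group name
        selected = eps.get(group, ())
    return {ep.name: ep.value for ep in selected}


# Group used by Xarray's I/O backend plugins (may be empty on a given system).
print(list_entry_points("xarray.backends"))

# Hypothetical group name for index plugins; not an actual Xarray API.
print(list_entry_points("xarray.indexes"))
```

The appeal of this design is that a package such as pyresample could advertise an index class in its own packaging metadata, and a host library could discover it without either side importing the other up front.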

> What I'd really like to get to eventually with geoxarray (or the geospatial xarray community) is indexers that allow for CRS-aware selection/indexing, consistent spatial resampling interfaces, and consistent CRS/projection/coordinate metadata handling and conversion.

That's definitely the goal with Xarray's flexible indexes refactoring. Once this is ready, you should be able to create your own index class (in geoxarray or any package reused in the geospatial xarray community) inheriting from xarray.Index, which would handle the CRS/projection/coordinate (meta)data and provide its own implementation of selection, indexing, alignment (maybe also interpolation, groupby, etc?) for Xarray objects.

I think fully CRS-aware indexes are out of scope for Xoak, which is more focused on "basic" indexes for irregular data (including some indexes for lat/lon coordinates), but maybe such a CRS-aware index could itself be built on top of one of the indexes that Xoak provides?

> How do xoak's indexers perform with thousands of points (e.g., 10000 x 6400, a large swath from the VIIRS satellite instrument)?

That's a good question; we haven't tested Xoak with many kinds of datasets yet. As one data point, using the s2_point index for unchunked lat/lon data, it's possible to index a few tens of millions of points (and query a few hundred thousand points) within seconds.
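For a swath stored as 2-D coordinate arrays, a common pattern is to flatten the coordinates into a list of points, build one tree, query it in batches, and map the flat results back to (row, column) positions. The sketch below shows that pattern with scipy.spatial.cKDTree on a small synthetic grid standing in for a 10000 x 6400 swath. The grid, the `nearest_cells` helper, and the batch size are all illustrative assumptions, the coordinates are assumed to live in a space where Euclidean distance is meaningful, and this is not how Xoak implements its indexes.

```python
import numpy as np
from scipy.spatial import cKDTree

# Small synthetic curvilinear grid (sheared, so x and y both vary in 2-D).
rows, cols = 100, 64
jj, ii = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
x = ii + 0.1 * jj
y = jj + 0.1 * ii

# One tree over all grid cells, flattened in row-major order.
tree = cKDTree(np.column_stack((x.ravel(), y.ravel())))


def nearest_cells(tree, shape, qx, qy, batch_size=1000):
    """Find the nearest grid cell for each query point, querying in batches.

    Returns (row, col, distance) arrays, one entry per query point.
    """
    points = np.column_stack((np.ravel(qx), np.ravel(qy)))
    flat_idx = np.empty(len(points), dtype=np.intp)
    dist = np.empty(len(points), dtype=float)
    for start in range(0, len(points), batch_size):
        stop = start + batch_size
        dist[start:stop], flat_idx[start:stop] = tree.query(points[start:stop])
    # Convert flat positions back to 2-D (row, col) indices.
    row, col = np.unravel_index(flat_idx, shape)
    return row, col, dist


# Query points slightly offset from three known grid cells.
want_rows = np.array([5, 50, 99])
want_cols = np.array([7, 0, 63])
qx = x[want_rows, want_cols] + 0.01
qy = y[want_rows, want_cols] - 0.01

row, col, dist = nearest_cells(tree, x.shape, qx, qy, batch_size=2)
print(row, col, dist)
```

Batching bounds the size of the temporary arrays per query call, which is loosely the same motivation as the chunked indexers discussed above.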

@djhoese

djhoese commented May 6, 2021

> Xoak's index registering system will eventually be dropped in favor of Xarray's index registering system, e.g., based on entrypoints like the newly refactored Xarray I/O backend system.

Does that mean that this entrypoint system doesn't exist yet for index registration?

@benbovy
Member

benbovy commented May 11, 2021

> Does that mean that this entrypoint system doesn't exist yet for index registration?

Indeed, Xarray has no API or system yet for registering, or even using, custom indexes other than pandas.Index (pydata/xarray#5102 has just been merged, but it's only an internal refactoring for now).
