API design for pointwise indexing #475
Comments
So, the good news is that once we figure out the API for pointwise indexing, I think the nearest-neighbor part could be as simple as supplying The challenge is that we want to go from a DataArray that looks like this:
To one that looks like that:
Somehow, we need to figure out the name for the new dimension ( My thought would be to have methods If you don't already have 1D xray objects, I suppose we could also allow |
Seems like if your method is going to be named One thing to keep in mind is that for many of us the "nearest-neighbor" part isn't really |
Yes, this is a reasonable choice for the case of 1d indexers.
This is also a good idea, though I would probably call the parameter
Indeed. As a start, we should be able to do nearest neighbor lookups with a tolerance soon -- I have a pandas PR that should add some of that basic functionality (pandas-dev/pandas#10411). In the long term, it would be useful to have some sort of representation of grid cells in the index itself, possibly something similar to |
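To make "nearest neighbor lookup with a tolerance" concrete, here is a minimal plain-Python sketch for a single sorted 1D coordinate. It is not the pandas implementation referenced above; `nearest_index` is a hypothetical helper, and returning -1 for "no match within tolerance" is an assumption made for illustration.

```python
from bisect import bisect_left


def nearest_index(sorted_coords, target, tolerance=None):
    """Return the position of the value in sorted_coords closest to target,
    or -1 if the closest value is farther away than tolerance."""
    pos = bisect_left(sorted_coords, target)
    # The nearest value is one of the two neighbours of the insertion point.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(sorted_coords)]
    if not candidates:
        return -1
    best = min(candidates, key=lambda i: abs(sorted_coords[i] - target))
    if tolerance is not None and abs(sorted_coords[best] - target) > tolerance:
        return -1
    return best


lat = [40.0, 42.5, 45.0, 47.5]
# 44.6 and 41.0 have a neighbour within 1 degree; 60.0 does not.
hits = [nearest_index(lat, t, tolerance=1.0) for t in (44.6, 41.0, 60.0)]
```

The resulting integer positions could then feed a positional, pointwise selection, with the -1 entries masked out first.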
I like:
DataArray.isel_points(x=[1, 2, 3], y=[0, 1, 2], dim='points')
I also like the nearest-neighbor / resample API of:
DataArray.sel_points(lon=[-123.25, -140.0, 72.5], lat=[45.0, 72.25, 65.75], dim='points', method='nearest')
How do we want to do the nearest-neighbor selection? The simplest case would be to follow the cKDTree example from #214. However, when you're using lat/lon coordinates, it is usually best to map them from spherical to Cartesian coordinates (see here for a simple example using cKDTree). Is that a road we want to go down here? Further along that subject, but not directly related: has anyone used pyresample? |
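As a rough illustration of why the spherical-to-Cartesian mapping matters, the sketch below converts lon/lat to points on the unit sphere and finds the nearest grid point by brute force. The brute-force search is a deliberate stand-in for a KD-tree (a cKDTree built on the same x, y, z points would be queried instead); the function names are hypothetical and NumPy is the only dependency assumed.

```python
import numpy as np


def lonlat_to_xyz(lon_deg, lat_deg):
    """Map longitude/latitude in degrees onto the unit sphere."""
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    return np.stack(
        [np.cos(lat) * np.cos(lon), np.cos(lat) * np.sin(lon), np.sin(lat)],
        axis=-1,
    )


def nearest_points(grid_lon, grid_lat, query_lon, query_lat):
    """Index of the nearest grid point for each query point (brute force)."""
    grid = lonlat_to_xyz(grid_lon, grid_lat)      # shape (n_grid, 3)
    query = lonlat_to_xyz(query_lon, query_lat)   # shape (n_query, 3)
    # Squared chord distance between every query/grid pair.
    d2 = ((query[:, None, :] - grid[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


# Grid points on either side of the date line.
grid_lon = np.array([178.0, -179.5, 0.0])
grid_lat = np.array([10.0, 10.0, 10.0])

idx = nearest_points(grid_lon, grid_lat, [179.9], [10.0])
# Comparing raw longitudes instead picks the wrong neighbour across the date line.
naive = int(np.abs(grid_lon - 179.9).argmin())
```

Here the Cartesian search selects the point at -179.5 (0.6 degrees away), while the comparison on raw longitudes selects the point at 178.0.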
Unidata also has a blog post benchmarking cKDTree and other methods and concludes "Your Mileage May Vary". I'd probably just go with a KDTree, but it's something to be aware of. |
There is a great kdtree-based geospatial resampling package you might want to consider building on: |
Maybe this is off topic, but are there plans to support more general spatial resampling / regridding? Like if I have two DataArrays a and b with different spatial coords, it would be great to be able to do
c = a.regrid_like(b)
This is a pretty common practice in climate science, since different datasets are provided on different grids with different resolutions. |
I agree that regridding and resample would be very nice, and pyresample looks like a decent option. I have no immediate plans to implement these features but contributions would be very welcome. For n-dimensional indexing, kdtree seems sensible, especially if we can cache it on the coordinates. We probably want an explicit API for methods that add new coordinates -- something like |
As a first step, I'll volunteer (unless someone else is more keen on doing this work) to put together a pull request for After that, we'll want to add the Later on, I'm also interested in regridding and resampling between grids - let's open another issue for that. Maybe we use |
@jhamman it would be great if you could put together a PR for As for |
Good point on the dask array business. From the dask docs:
So, from browsing the closed dask issues, it seems like dask has similar support for multi-dimension slicing and indexing as xray. This throws a bit of a wrench in my plan for how I was going to implement I'll have to put a bit more thought into this. Any suggestions on how to index the dask array without looping through individual points would be great. |
For now, I actually think selecting individual points and then concatenating the resulting arrays together would be a reasonable start. Yes, it's kind of slow, but once you have a first draft put together that way with the right API we can optimize later. |
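The select-then-concatenate approach suggested above could look roughly like this. It is a NumPy stand-in rather than xray code, and `isel_points_loop` is a hypothetical name; the point is only to show the shape of a slow-but-correct first draft.

```python
import numpy as np


def isel_points_loop(data, x, y):
    """Pointwise selection over the last two axes of data, one point at a time."""
    if len(x) != len(y):
        raise ValueError("x and y indexers must have the same length")
    # Each selection drops the two indexed axes; stacking the results
    # adds a new leading 'points' axis.
    return np.stack([data[..., i, j] for i, j in zip(x, y)], axis=0)


data = np.arange(24).reshape(2, 3, 4)   # dims: (time, x, y)
points = isel_points_loop(data, x=[0, 1, 2], y=[0, 1, 3])
# points has dims (points, time) and shape (3, 2)
```

The same result comes from the vectorized expression `data[:, x, y].T`, which is the kind of optimization that could replace the loop later without changing the API.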
Now that the |
I would start with the easiest case -- lookups of 1d orthogonal arrays, e.g., For 2D lookups, we need a KDTree. Here are some API ideas, just tossing things around...
|
I started playing around with making an array wrapper for KDTree this evening: I think it has most of the necessary indexing machinery and you can put it in an xray.Dataset like an array. You could easily imagine hooking in a |
Very nice. This is the sort of API I was hoping for. It will be a while before I can come back around on this. In the meantime, if someone else wants to take the |
PR #507 implements my suggested 1d version of |
A few recent developments relevant to this issue:
So I'm now thinking an API more like this:
For building a tree with lat/lon remapped to spherical coordinates, we should write a method that converts lat and lon arrays into a tuple of x, y, z arrays (e.g., using |
Without following the discussion in detail, what is the status here? In particular, I would like to do pointwise selection on multiple 1D coordinates using multidimensional indexer arrays. I can do this with the current
Given this conceptually easy but somewhat tedious procedure, couldn't that be something that could quite easily be implemented into the current |
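For readers following along, the flatten / select / reshape procedure can be sketched with plain NumPy arrays standing in for the xarray objects. This is an illustration of the idea only, not the code from the comment above.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)      # dims: (x, y)

# Multidimensional indexer arrays sharing the shape (2, 2).
x_idx = np.array([[0, 1], [2, 2]])
y_idx = np.array([[3, 2], [1, 0]])

# Step 1: flatten both indexers to 1D.
flat_x, flat_y = x_idx.ravel(), y_idx.ravel()
# Step 2: pointwise selection along a single 'points' axis.
flat_result = data[flat_x, flat_y]
# Step 3: restore the shape of the indexers.
result = flat_result.reshape(x_idx.shape)
```

With bare NumPy arrays the three steps collapse to `data[x_idx, y_idx]`; the tedium described above comes from carrying dimension names and coordinates through each step.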
@burnpanck I don't think you need to do the flattening/multi-index bit. I believe At this point we're really just talking about design refinements (I'll rename the topic). |
Really? I get a |
@burnpanck Nevermind, you are correct! I misread your comment. This cannot be done currently. You certainly could try to put this into |
So, what has become the consensus for performing regridding/resampling? I see a lot of suggestions, but I have no sense of what is mature enough to use in production-level code. I also haven't seen anything in the documentation about this topic, even if it just refers people to another project. |
Short answer. We don't have a tool that is production ready. Longer answer: This issue introduces the concept of point-wise indexing using nearest neighbor lookups on ND coordinates. @shoyer has an example implementation here but it hasn't moved forward in quite a while. |
Yeah, we need to move something forward, because the main benefit of xarray is the ability to manage datasets from multiple sources in a consistent way. And data from different sources will almost always be in different projections. My current problem that I need to solve right now is that I am ingesting model data that is in a LCC projection and ingesting radar data that is in a simple regular lat/lon grid. Both dataset objects have latitude and longitude coordinate arrays, I just need to get both datasets to have the same lat/lon grid. I guess I could continue using my old scipy-based solution (using map_coordinates() or RectBivariateSpline), but at the very least, it would make sense to have some documentation demonstrating how one might go about this very common problem, even if it is showing how to use the scipy-based tools with xarray objects. If that is of interest, I can see what I can write up after I am done with my immediate task. |
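As a hedged sketch of one direction of this problem (sampling a field on a regular lat/lon grid at the 2D lat/lon positions of another grid), bilinear interpolation can be written with NumPy alone. This is not the scipy-based solution mentioned above; `sample_regular_grid` is a hypothetical helper, it assumes increasing 1D source coordinates with at least two points each, and targets outside the source grid are clamped to the edge.

```python
import numpy as np


def sample_regular_grid(values, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly sample values (n_lat, n_lon) at target points of any shape."""
    dst_lat = np.asarray(dst_lat, dtype=float)
    dst_lon = np.asarray(dst_lon, dtype=float)
    # Fractional index of each target point along each source axis.
    fi = np.interp(dst_lat, src_lat, np.arange(len(src_lat)))
    fj = np.interp(dst_lon, src_lon, np.arange(len(src_lon)))
    # Lower-left corner of the enclosing cell, kept inside the grid.
    i0 = np.clip(np.floor(fi).astype(int), 0, len(src_lat) - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, len(src_lon) - 2)
    wi, wj = fi - i0, fj - j0
    # Pointwise indexing picks the four corners for every target at once.
    return (
        (1 - wi) * (1 - wj) * values[i0, j0]
        + (1 - wi) * wj * values[i0, j0 + 1]
        + wi * (1 - wj) * values[i0 + 1, j0]
        + wi * wj * values[i0 + 1, j0 + 1]
    )


src_lat = np.array([0.0, 10.0, 20.0])
src_lon = np.array([0.0, 10.0, 20.0, 30.0])
field = 2 * src_lat[:, None] + 3 * src_lon[None, :]

# 2D target coordinates, as a curvilinear (e.g. projected) grid would provide.
dst_lat = np.array([[5.0, 15.0], [2.5, 20.0]])
dst_lon = np.array([[5.0, 25.0], [30.0, 0.0]])
regridded = sample_regular_grid(field, src_lat, src_lon, dst_lat, dst_lon)
```

Going the other way, from a curvilinear source grid onto a regular one, needs a scattered-data method such as the KD-tree lookups discussed earlier in this thread.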
Yes, a documentation example would be greatly appreciated. We have been
making progress in this direction (especially with the new vectorised
indexing support) but it has been slow going to do it right.
|
ping @stefanomattia who seems to be interested in the KDTreeIndex concepts described in this issue. |
Subscribers to this thread will probably be interested in @JiaweiZhuang's recent progress on xESMF. That package is now a viable solution for 2D regridding of xarray datasets. |
Thanks @jhamman, I'd love to contribute! I'm not that confident in my Python skills, but maybe with a little guidance? Let me know if or how I could help. |
@stefanomattia - I'd be happy to provide guidance and even to contribute to some of the development. Based on your blog post, I think you may be well on your way. |
@jhamman @stefanomattia can you share a link to this blog post? :) |
That post must look a bit amateurish, I reckon, but if you guys think it could be a starting point for a KD-tree search implementation in xarray, I would be thrilled to contribute! There is no learning without trying, after all. I could start from #475 (comment). @jhamman maybe you could send me an email with a few requirements? |
Note that it will probably be easier to implement such KDTreeIndex after having refactored indexes and multi-indexes in xarray (see #1603). I think this refactoring would represent a good amount of work, though, so maybe we can do it after if you don't want to wait too long for the KD-Tree feature? |
Further to the comment I made in the related issue #486, I've now taken a simplified version of the collocation approach in CIS and created a stand-alone package which works with xarray objects: https://github.com/cistools/collocate. This works essentially the same as the nice example shown in the above blog, with some key differences:
I'll try and put together a notebook building on the above blogpost so that the similarities and differences are a bit clearer. I'm not familiar enough with xarray indexing to be able to say how well this would fit inside xarray, but hopefully it will be useful before we're able to crack KD-MultiIndexes! |
In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here or remove the |
@JimmyGao0204 I moved your comment to a new issue: #4090 |
There hasn't been much activity here for quite some time. Meanwhile, there has been the development of the xoak package that supports point-wise indexing of Xarray objects with various indexes (either generic like With the forthcoming Xarray release, it will be possible to create and assign custom indexes to DataArray / Dataset objects. The plan for |
Can we close this issue and redirect the reader to https://github.com/xarray-contrib/xoak or #7041? Or is there still a need to extend Xarray's API for supporting pointwise indexing, i.e., something that cannot be done with |
There have been a number of threads discussing possible improvements/extensions to xray indexing. The current indexing behavior for isel is orthogonal indexing - in other words, each coordinate is treated independently (see #214 and #411 for more discussion).
So the question: what is the best way to incorporate diagonal or pointwise indexing in xray? I see two main goals / applications: numpy style integer array indexing
Input from @WeatherGod, @wholmgren, and @shoyer would be great.
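To pin down the terminology, the difference between orthogonal and pointwise indexing can be shown with a bare NumPy array standing in for an xray object. This is a small illustrative sketch, not xray API.

```python
import numpy as np

# A small 2D array standing in for a DataArray with dims (x, y).
data = np.arange(12).reshape(3, 4)

x_idx = np.array([0, 1, 2])
y_idx = np.array([0, 1, 3])

# Orthogonal (outer) indexing: each axis is indexed independently,
# so the result has shape (len(x_idx), len(y_idx)).
orthogonal = data[np.ix_(x_idx, y_idx)]

# Pointwise (NumPy-style) indexing: the indexers are paired up,
# so the result has one value per (x, y) pair.
pointwise = data[x_idx, y_idx]
```

The pointwise result is the diagonal of the orthogonal one, which is why the two names "diagonal" and "pointwise" are used interchangeably here.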