-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Does interp() work on curvilinear grids (2D coordinates) ? #2281
Comments
One way I can think of to make But this is absolutely too convoluted... Updated: see Gridded with Scipy for a similar idea. |
Thanks, @JiaweiZhuang Not yet. I am happy to see curvilinear interpolation in xarray if we could find a good general API for N-dimensional array. For curvilinear interpolation, we may have some arbitrariness, dr_out = dr.interp(xc=lon) the resultant dimension is not well determined. Maybe we need some limitation for the arguments. |
I think we could make |
I guess it is not an API design problem yet... The algorithm is not here since
My concern with Utilizing PS: I have some broader concerns regarding interp vs xESMF: JiaweiZhuang/xESMF#24 |
I'd like to figure out interfaces that make it possible for external, grid aware libraries to extend indexing and interpolation features in xarray. In particular, it would be nice to be able to associate a "grid index" used for caching computation that gets passed on in all xarray operations. |
Great thread by @JiaweiZhuang! I just posted a question on stackoverflow about this exact problem. After hours navigating through the It is surprising that a package targeting n-dimensional gridded datasets (particularly those from the geo/climate sciences) does not handle such a common task with spatial gridded data. The problem on hand is this: I have two 3d arrays with different dimensions defined by 2d coordinates, all I want is to regrid the first cube onto the second. Is there a way to perform this operation with This is what I've tried (which @JiaweiZhuang explained why it doesn't work):
Here |
I am not aware of a ND mesh interpolation algorithm. However, my package xarray_extras [1] offers highly optimized 1D interpolation on a ND hypercube, on any numerical coord (not just time). You may try applying it 3 times on each dimension in sequence and see if you get what you want - although performance won't be optimal. [1] https://xarray-extras.readthedocs.io/en/latest/ Alternatively, if you do find the exact algorithm you want, but it's for numpy, then applying it to xarray is simple - just get DataArray.values -> apply function -> create new DataArray from the output. |
@crusaderky I don't think we need a "proper" 3d interpolation in most cases (i.e. predicting each 3d grid node considering all dimensions simultaneously). If you see my example above, The main limitation here, however, is being able to interpolate over the spatial coordinates when these are defined as 2d arrays. I'll check your package... thanks! |
@fspaolo I never tried using my algorithm to perform 2D interpolation, but this should work:
Add k=1 to downgrade from cubic to linear interpolation and get a speed boost. You can play around with dask to increase performance by using all your CPUs (or more with dask distributed), although you have to remember that an original dim can't be broken on multiple chunks when you apply splrep to it:
where TCHUNK and SCHUNK are integers you'll have to play with. The rule of thumb is that you want to have your chunks 5~100 MBs each. If you end up finding out that chunking along an interpolation dimension is important for you, it is possible to implement it with dask ghosting techniques, just painfully complicated. |
@crusaderky It doesn't work. First, it tells me that if The only interpolation that works (both with Why is it so hard to perform an interpolation with spatial coordinates defined with 2D variables?! I would think this is a pretty common operation on climate datasets... |
@JiaweiZhuang your approach doesn't work either. After installing you package and the dependencies... and following the documentation, I got
An MPI error?! What bothers me are statements like this "For usability and simplicity"... |
@fspaolo 2d mesh interpolation and 1d interpolation with extra "free" dimensions are fundamentally different algorithms. Look up the scipy documentation on the various interpolation functions available. I don't understand what you are trying to pass for x_new and y_new and it definitely doesn't sound right. Right now you have a 3d DataArray with dimensions (x, y, t) and 3 coords, each of which is a 1d numpy array (e.g. |
@crusaderky what I'm trying to do is what the title of this opened thread says "Does interp() work on curvilinear grids (2D coordinates)?" Working with (x, y, t) all 1D numpy arrays is an easy problem! But that's not the problem I'm dealing with. Let me state exactly what my problem is. I have two data cubes,
All I want is to regrid Also note the inverted time dimension between So how to perform this operation... or am I missing something? |
Sorry, i don't think there's an easy way to do this directly in xarray right now.
Thinking a little more about this, I wonder if this the performance could actually be OK as long as the spatial grid is not too big, i.e., if we reuse the same grid many times for different variables/times. In particular, SciPy's griddata either makes use of a |
@fspaolo Could you post a minimal reproducible example on xESMF's issue tracker? Just to keep this issue clean. The error looks like an ESMF installation problem that can happen on legacy OS, and it can be easily fixed by Docker or other containers.
Just a side comment: This is a common but highly non-trivial task... Even small edges cases like periodic longitudes and polar singularities can cause interesting troubles. Otherwise I would just code up an algorithm in Xarray from scratch instead of relying on a heavy Fortran library. But things will get improved over time... |
@fspaolo sorry, I should have taken more time re-reading the initial post. No, xarray_extras.interpolate does not do the kind of interpolation you want. Have you looked into scipy? https://docs.scipy.org/doc/scipy/reference/interpolate.html#multivariate-interpolation xarray is just a wrapper, and if scipy does what you need, it's trivial to unwrap your DataArray into a bunch of numpy arrays, feed them into scipy, and then re-wrap the output numpy arrays into a DataArray. |
I did not test it but this looks like what you want:
I read above that you have concerns about performance as the above does not understand the geometry of the input data - did you run performance tests on it already? [EDIT] you will probably need to break down your problem on 1-point slices along dimension t before you apply the above. |
@shoyer and @crusaderky That's right, that is how I was actually dealing with this problem prior trying xarray ... by flattening the grid coordinates and performing either gridding (with scipy's
This is important information. For the record, here is so far what I found to be best performant:
Performance is not that bad... for ~150 time steps and ~1500 nodes in x and y it takes less than 10 min. I think this can be sped up by computing the interpolation weights between grids in the first iteration and cache them (I think xESMF does this). |
@crusaderky I also did the above using
So |
The naive implementation of splines involves inverting an N x N matrix where N is the total number of grid points. So it definitely is not a very scalable technique. |
@fspaolo where does that huge number come from? I thought you said you have 1500 nodes in total. Did you select a single point on the t dimension before you applied bisplrep? Also, (pardon the ignorance, I never dealt with geographical data) what kind of information does having your lat and lon being bidimensional convey? Does it imply |
2665872 is roughly 1600^2.
I think this is true sometimes but not always. The details depend on the geographic projection, but generally a good mesh has some notion of locality -- nearby locations in real space (i.e., on the globe) should also nearby in projected space. Anyways, as I've said above, I think it would be totally appropriate to build routines resembling scipy's griddata into |
Not in total, I meant ~1500 on each x/y dimension (1500 x 1500). To be precise:
Yes.
It does in my case, but |
Yes, if we cache the Delaunay triangulation we could probably do the entire
thing in about the time it currently takes to do one time step.
…On Thu, May 30, 2019 at 10:50 AM Fernando Paolo ***@***.***> wrote:
@shoyer <https://github.com/shoyer> and @crusaderky
<https://github.com/crusaderky> That's right, that is how I was actually
dealing with this problem prior trying xarray ... by flattening the grid
coordinates and performing either *gridding* (with scipy's griddata) or
*interpolation* (with scipy's map_coordinate) ... instead of performing
proper *regridding* (from cube to cube without having to flatten
anything).
As a rule of thumb, any fancy algorithm should first exist for numpy-only
data and then potentially it can be wrapped by the xarray library.
This is important information.
For the record, here is so far what I found to be best performant:
import xarray as xr
from scipy.interpolate import griddata
# Here x/y are dummy 1D coords that wont be used.
da1 = xr.DataArray(cube1, [('t', t_cube1) , ('y', range(cube1.shape[1])), ('x', range(cube1.shape[2]))])
# Regrid t_cube1 onto t_cube2 first since time will always map 1 to 1 between cubes.
# This operation is very fast.
print('regridding in time ...')
cube1 = da1.interp(t=t_cube2).values
# Regrid each 2D field (X_cube1/Y_cube1 onto X_cube2/Y_cube2) one at a time
print('regridding in space ...')
cube3 = np.full_like(cube2, np.nan)
for k in range(t_cube1.shape[0]):
print('regridding:', k)
cube3[:,:,k] = griddata((X_cube1.ravel(), Y_cube1.ravel()),
cube1[k,:,:].ravel(),
(X_cube2, Y_cube2),
fill_value=np.nan,
method='linear')
Performance is not that bad... for ~150 time steps and ~1500 nodes in x
and y it takes less than 10-15 min.
I think this can be sped up by computing and saving the interpolation
weights between grids in the first iteration and cache them (I think xESMF
does this).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#2281?email_source=notifications&email_token=AAJJFVQNLZ3SUTY2WIMI3E3PYAHWVA5CNFSM4FJQZDP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWTATPA#issuecomment-497420732>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAJJFVV5OXUPV25D64WJEZTPYAHWVANCNFSM4FJQZDPQ>
.
|
Thanks for this interesting discussion. I'm currently at the point where I'm moving interpolation functions to xarray based workflow. While trying to wrap my head around this I found that this involves not only interpolation but also indexing (see #1603, #2195, #2986). Sorry if this might exceed the original intention of the issue. But it is my real use case (curvelinear grids to cartesian). citing @shoyer's comments for convenience
Our interpolators are build upon scipy's cKDTree. They are created once for some source and target grid configuration and then just called with the wanted data. The interpolator is cached in the dataset accessor for multiple use. But this makes only sense, if there are multiple variables within this dataset. I'm thinking about how to reuse the cached interpolator for other datasets with the same source and target configuration. Same would be true for tree-based indexers, if they become available in xarray. My current approach would be to create an xarray dataset I'm sure I can get something working within my workflow using accessors but it would be better fitted in xarray itself imho. |
@kmuehlbauer Thanks for the ping. I don't have time to read this whole thread, but based on your comment I have a few things I'd like to point out. First, the pykdtree package is a good alternative to the scipy kdtree implementation. It has been shown to be much faster and uses openmp for parallel processing. Second, the pyresample library is my main way of resampled geolocated data. We use it in Satpy for resampling, but right now we haven't finalized the interfaces so things are kind of spread between satpy and pyresample as far as easy xarray handling. Pyresample uses SwathDefinition and AreaDefinition objects to define the geolocation of the data. In Satpy the same KDTree is used for every in-memory gridding, but we also allow a I'm hoping to sit down and get some geoxarray stuff implemented during SciPy next week, but usually get distracted by all the talks so no promises. I'd like geoxarray to provide a low level interface for getting and converting CRS and geolocation information on xarray objects and leave resampling and other tasks to libraries like pyresample and rioxarray. |
Have there been any updates on the handling of multi-dimensional co-ordinates in xarray, in particular interpolation/indexing using such co-ordinates, as discussed here? For geographical data with curvilinear grids (using latitudes and longitudes) this issue can become a real headache to deal with. |
I find xarray so useful in many ways, for which I'm grateful. But there are some current limitations that force me to hesitate before recommending it to colleagues. One is this issue - lack of support (or rather, I suspect simply no "clearly stated support"?) for curvilinear coordinate systems, which are pretty much ubiquitous in the work I do. The other issue which causes me to pause before recommending xarray wholeheartedly is the complexity (and frequent slowness and errors, still - all previously reported) in dealing with GRIB2 file formats that include multiple vertical coordinate systems (e.g., products from the NCEP Unified Post Processing System - UPP). But that's an issue for another thread... Any movement on wrapping scipy griddata (or some suitably more sophisticated scipy tool) within xarray's interface? |
I am evaluating
interp()
against xESMF. Here's how xESMF would regrid the built-in'rasm'
data to a regular lat-lon grid.Seems like
interp()
can convert rectilinear grids (1D coordinates) to curvilinear grids (2D coordinates), according to the last example. How about the reverse? Can it convert 2D coordinate to 1D, or to another 2D coordinate?That's the test data:
That's a simple destination grid:
I would expect a syntax like:
But I got
ValueError: dimensions ['xc', 'yc'] do not exist
, becauseinterp()
only accepts dimensions (which must be 1D), not coordinates.dr.interp(x=lon, y=lat)
runs but the result is not correct. This is expected becausex
does not mean longitude in the original data.@crusaderky @fujiisoup
The text was updated successfully, but these errors were encountered: