-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Experimentally support CRSIndex #588
Comments
Thanks for making the prototype. That looks pretty nice. Do you think it would be better to wait before officially adding it to rioxarray so some wrinkles can be ironed out first? Or do you think that adding it to rioxarray will help the development process for indexes in xarray? Thoughts:
|
I think this will be a very useful experiment.
IMO it would be good to try this and see what fails in the test suite. That would be a more exhaustive test than my suggestion to pick a single "workflow" example./ |
Sounds like a good plan to me. Anyone who would like to try implementing the CRSIndex and adding it to |
@snowman2 we've finally been making some headway trying this out. Currently testing things on my fork in this PR scottyhq#1, and borrowing from @benbovy's linked xvec implementation and @dcherian's prototype notebook. The example notebook in the PR highlights some use-cases and goals (https://github.com/scottyhq/rioxarray/blob/crsindex/docs/examples/crs_index.ipynb). |
Nice 👍 My first thought is to make |
Could this screw up a downstream package though? Or maybe we want that to shake out all the bugs :P
It may be prudent to be strict now and loosen things later. If fixing the error is simply |
It works as a context manager and so it can be applied locally as well. Though, the current strategy would work as well. And as you mentioned, it would be good to have a way to do a thorough bug search. |
Since it is optionally enabled (and off by default), being strict is fine. |
FYI @scottyhq and I are thinking of hacking on this during the SciPy sprints. let us know if anyone interested in this stuff will be around |
I'm thinking about different use cases that would all benefit from an Xarray CRS-aware index but where the kind of index would greatly differ from one case to another:
Instead of duplicating the CRS-related logic in all those indexes, I'm wondering if alternatively we could provide somewhere a CRS "meta" index so that we can combine it with other indexes like Some open questions:
|
It is not that easy, actually. We need to get the actual CRS information somewhere in order to build the CRSIndex. It is more manageable if each package (xvec, rioxarray, etc.) takes care of that rather than having a common package trying to support all different ways / standards to get that information. Also it is possible that the wrapped index needs to access the CRS for some computation (for example, would it be useful to have something like |
Forgive anything that is obvious to you and isn't to me about xarray Index objects. I've never created one and have never used a custom one. I like your layout of special types of indexes and wrapping (decorating?) them with CRS information. I assume the main reason an index like these would need to know about the CRS is to allow or disallow or adapt to two xarray objects being merged? Like one of the examples in https://github.com/scottyhq/rioxarray/blob/crsindex/docs/examples/crs_index.ipynb where it shows doing You mention odc-geo's Also you mention cyclic dependencies. Are you saying that if CRSIndex when into geoxarray that would lead to a cyclic dependency with rioxarray? |
I find the If we should do subclassing, then maybe the other way? That would allow us to have a common subset of functionality dealing with CRS and custom index handling on top of that. The core class may not even be an index but some form of a |
No worries at all! This is still very experimental and there's not much documentation or reading material about how to create and use them. TBH I also have no clear proposal for a "reusable" CRSIndex yet and I might be missing obvious CRS or other domain-specific things.
Yes I guess that's one of the main reasons. It could be used also for validating inputs given to Dataset (DataArray)
That is the question, actually. I don't know if/how a GeoBox could be used by an index. Support data selection by passing world coordinates and convert it as pixel coordinates (inverse affine transformation)? Support more advanced alignment behavior than allow / disallow? About index vs. accessor, I think it relates to the more general question of whether a GeoBox is unique to a Dataset (DataArray) or is more closely related to a subset of coordinates (and to a less extent how hard / expensive it is to rebuild from scratch).
Sorry the |
What type of validation of inputs did you have in mind for
If an Index is used by multiple xarray objects, is it the same instance of that index or does it get copied? I see a GeoBox or any other type of polygon as being a representation of the data coordinates, but separate from the Index. You could have an Index that uses a GeoBox internally I suppose. You could also have shapely polygons produced in various CRSes which is why I felt like you need access to the full DataArray coordinates and possibly attrs. Seems more accessor-y than index-y, but yeah maybe not familiar enough to comment beyond how this "feels" to me. |
CRS validation, assuming that the input labels passed to
Yes, although it actually often serves as a rough index itself.
I'll try to explain why this feels index-y to me (sorry in-advance if I'm repeating things for which you are already familiar). Beyond what we can expect from an "index", a custom Xarray Index is an arbitrary object that is closely tied to one or more coordinates and that offers some additional flexibility and benefit:
If a representation (object) is specific to a whole DataArray (Dataset) and is cheap to re-build, then an accessor may be a good solution. If the representation is more specific to a subset of coordinates, then a custom Xarray index for those coordinates seems the right approach IMO. This doesn't prevent accessing that object from an accessor, though. I'd rather see a GeoBox encapsulated in a geo Index and accessed via an accessor than the other way around... Accessors are good for extending the DataArray (Dataset) API but not so good for extending its data model. |
I like this point. I see from the xarray docs that we could have a meta index (or maybe there are other ways too) so we can have access to both x and y coordinates in a single CRS-based index. I could see this being important for having internal polygons or an index that represents the coordinates in a different CRS (or allows for inputs from other CRSes). In your original recent comment from last week you said:
I'm not sure we ever exactly addressed this. Your later comment said:
What various ways of representing the CRS were you thinking? I think rioxarray and geoxarray and other libraries should all (hopefully) be depending on pyproj's |
Sorry that was confusing!
Agreed! Xvec uses pyproj's CRS too. Let me try to illustrate my suggestion with an example. CRSIndex encapsulates another arbitrary index and takes care of checking CRSes during alignment: from typing import Any, Generic, Hashable, TypeVar
import xarray as xr
from pyproj import CRS
from xarray.indexes import Index
T = TypeVar("T", bound=Index)
class CRSIndex(Index, Generic[T]):
def __init__(self, index: T, crs: Any):
self._index = index
self._crs = CRS.from_user_input(crs)
def equal(self, other: Index):
if not isinstance(other, CRSIndex):
return False
if self._crs != other._crs:
return False
return self._index.equals(other._index)
def join(self, other: "CRSIndex[T]", how: str = "inner") -> "CRSIndex[T]":
if self._crs != other._crs:
raise ValueError
index = self._index.join(other._index, how=how)
return type(self)(index)
def reindex_like(
self, other: "CRSIndex[T]", method=None, tolerance=None
) -> dict[Hashable, Any]:
if self._crs != other._crs:
raise ValueError
return self._index.reindex_like(
other._index, method=method, tolerance=tolerance
) We can couple that with an accessor to provide a convenient way to set a CRS index (this is what xvec already does for GeometryIndex): @xr.register_dataarray_accessor("crs")
class CRSAccessor:
def __init__(self, xarray_obj: xr.Dataset | xr.DataArray):
self._obj = xarray_obj
def set_crs(self, coord_name: Hashable, crs: CRS):
crs_index = CRSIndex(self._obj.xindexes[coord_name], crs)
new_coords = xr.Coordinates(
{coord_name: self._obj[coord_name].variable}, {coord_name: crs_index}
)
return self._obj.assign_coords(new_coords) Let's create two dataarrays with an import xvec
da1 = xr.DataArray(
[1, 2],
coords={"geom": [shapely.Point(1, 2), shapely.Point(3, 4)]},
dims="geom",
)
da1 = da1.xvec.set_geom_indexes("geom")
da2 = xr.DataArray(
[3],
coords={"geom": [shapely.Point(1, 2)]},
dims="geom",
)
da2 = da2.xvec.set_geom_indexes("geom")
xr.align(da1, da2)
# (<xarray.DataArray (geom: 1)>
# array([1])
# Coordinates:
# * geom (geom) object POINT (1 2)
# Indexes:
# geom GeometryIndex (crs=None),
# <xarray.DataArray (geom: 1)>
# array([3])
# Coordinates:
# * geom (geom) object POINT (1 2)
# Indexes:
# geom GeometryIndex (crs=None)) Now lets wrap those indexes so they become CRS-aware: da1_crs = da1.crs.set_crs("geom", 26915)
da2_crs = da2.crs.set_crs("geom", 3857)
xr.align(da1_crs, da2_crs)
# ValueError (CRS mismatch) This is just an example, we could have used a RasterIndex, a KDTreeIndex, etc. instead of a GeometryIndex. Now, the CRSIndex above doesn't do many things apart from checking if the CRSes match, so maybe it is not a big deal implementing the same logic for RasterIndex, GeometryIndex, etc.? CRSIndex stills provides a nice way to enable CRS-aware alignment via a uniform API (and maybe other features could be added). Alternatively to explicitly providing the CRS via |
I could see some functionality being dependent on if you're doing something in 2D or 3D in your projection space. I'm not sure that prevents both sets of functionality from living in one CRSIndex object, but it is one of the first things that came to mind.
Wouldn't that only require a |
I am still not convinced about this idea. It gets tricky as soon as you want to do something more complicated. What about re-projecting if setting the CRS happens on the It also feels a bit opaque to see that everything is a CRSIndex rather than specific custom solutions. |
Yeah this idea has still to be more "battle tested" against possible uses cases. I think it is useful to have some way of adding CRS-aware behavior on top of a generic, non-geospatial index. For example, the xoak library will provide (once refactored) indexes that are useful for geospatial and other applications but I think it is outside of the scope of that library to provide any CRS support (and to depend on pyproj). We could create somewhere else ad-hoc wrappers for each case, e.g.,
Fair point. We would need to agree on a slightly larger protocol than just a
In such case, it is probably OK to either raise an error or re-project the coordinate data and then rebuild from scratch the wrapped index?
I don't think that it is a big deal here. Users do (should) not interact directly with xarray Index objects. We can also make it more transparent through the CRSIndex (inline) repr, similarly to how we represent xarray coordinate or data variables wrapping (possibly nested) duck arrays. I'd be more concerned by how to discover and pass options to the wrapped index (e.g., pydata/xarray#8002). |
It'd be nice to experiment some with this prototype CRSIndex (though there are still bugs and work to do)
We could experiment with optionally adding such an index through
rioxarray.open_dataset(..., use_crsindex=True)
and then get some experience with it before refactoring it out to a ecosystem package likegeoxarray
Reprojection seems like a high value application where assigning a new CRSIndex would be quite impactful.
Is there an existing example that might benefit a lot from propagating CRS information automatically?
cc @djhoese
Original Xarray Issue - "Add CRS/projection information to xarray objects"
The text was updated successfully, but these errors were encountered: