Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restored old xarray coord order inference behavior #2579

Merged
merged 1 commit into from
Apr 20, 2018

Conversation

philippjfr
Copy link
Member

@philippjfr philippjfr commented Apr 19, 2018

In 1.10.0 the code that infers the coordinate order from an XArray Dataset changed, unintentionally changing the behavior. This PR restores the old behavior.

@jlstevens
Copy link
Contributor

Looks good to me. Happy to merge once the tests are green.

@philippjfr
Copy link
Member Author

The behavior of Dataset.coords seems to have changed in xarray>=0.10.0 and earlier versions do not seem to be predictable. Have to think a bit more about what to do about that. There was a reason that I had made the change originally.

@jlstevens
Copy link
Contributor

Maybe have behavior conditional on the xarray version? Or is that too confusing?

@philippjfr
Copy link
Member Author

I think I was wrong about that, I've responded in the original issue and am going to close this PR for the time being. Unless someone convinces me otherwise I'm going to consider the new behavior more correct and leave it going forward.

@philippjfr
Copy link
Member Author

@marcbernot convinced me this is the correct thing to do. So I would like to see this merged for 1.10.1

@jlstevens
Copy link
Contributor

Tests are green. Merging.

@jlstevens jlstevens merged commit 239c496 into master Apr 20, 2018
@philippjfr philippjfr deleted the xarray_coord_ordering branch July 4, 2018 11:15
@jlstevens
Copy link
Contributor

jlstevens commented Aug 28, 2018

@philippjfr This issue has just bitten me a second time. I know that this is quite an involved topic for me to wade into, but one suggestion by @jsignell is to use the order of .dims on the DataArrays, allowing things like .transpose to work.

This makes more intuitive sense to me so I'm just wondering why we aren't using that approach? Is it just for backwards compatibility, or something else?

@jsignell
Copy link
Member

Here's a visual example:

screen shot 2018-08-28 at 4 42 17 pm

@philippjfr
Copy link
Member Author

philippjfr commented Aug 29, 2018

It's worth reading the linked issue, we had actually very briefly switched to dims ordering (by accident) in 1.10 and reverted the change in this PR. There are two main reasons for that, a) on a xr.Dataset the dims order can vary between the xr.DataArray objects it contains so dims do not provide a global ordering in that case, and b) since xarray is an memory representation of NetCDF data the coordinate order does reflect actual information and is generally not arbitrary. And holoviews as always tried to preserve the semantics declared by the data.

When loading a NetCDF dataset the coordinates are usually ordered something like lon x lat (x height) x time by (COARDS) convention. This happens to be particularly useful since it means that to explore a cube of data like this you can just do:

# holoviews
hv.Dataset(xr_obj).to(hv.Image)

# hvplot
xr_obj.hvplot.image()

and it will automatically deduce that you want a stack of images you can explore across height and/or time.

The problem is that tiffs loaded using xr.open_rasterio sometimes but not always invert the coordinates, which causes these headaches.

That all being said it's not quite clear to me yet what we should do, the xarrays data model does often give coordinates a semantic meaning but if that is not generally followed that becomes less useful. My first instinct would be to try to figure out why rasterio loads some tiffs with a non-convention conforming coordinate order before deciding that holoviews should conform to the dim ordering instead.

@jlstevens
Copy link
Contributor

My first instinct would be to try to figure out why rasterio loads some tiffs with a non-convention conforming coordinate order before deciding that holoviews should conform to the dim ordering instead.

That does seem reasonable.

... the xarrays data model does often give coordinates a semantic meaning but if that is not generally followed that becomes less useful

Agreed. The important thing is that the coordinate ordering is meaningful in the common case and unfortunately loading tiff data with rasterio is one of those common cases.

My only other comment is that if you are visualizing a single xr.DataArray (not a xr.Dataset) then surely there will only be a single dim order. I'm also don't really know what to do or whether it is worth carving out a special case.

@jsignell
Copy link
Member

I think that how rasterio is loads the tiffs is definitely strange and worth looking into. But I think that the discussion of what holoviews should do is slightly different. There is no requirement that xarray objects even have coordinates.

>>> data = np.random.rand(4, 3)
>>> da = xr.DataArray(data)
>>> da
<xarray.DataArray (dim_0: 4, dim_1: 3)>
array([[0.169981, 0.950831, 0.825206],
       [0.242455, 0.056433, 0.346963],
       [0.052712, 0.359673, 0.710705],
       [0.139174, 0.244938, 0.284301]])
Dimensions without coordinates: dim_0, dim_1

>>> da.dims
('dim_0', 'dim_1')

>>> da.coords
Coordinates:
    *empty*

This works:

>>> hv.Image(da.data)

But this throws an error:

>>> hv.Image(da)
....
ValueError: kdims: list length must be between 2 and 2 (inclusive)

I think fundamentally we should be using dims but it is true that there is no global ordering for Datasets with dims. Each data_var in a Dataset is a DataArray which has its own dim ordering.

@jsignell
Copy link
Member

I'll make a PR.

@philippjfr
Copy link
Member Author

We will at least have to come up with a slightly more nuanced policy than simply always using the dims, perhaps only when passing in DataArrays? When we accidentally switched to using dim ordering in 1.10, multiple users complained that it was breaking their workflows.

@jbednar
Copy link
Member

jbednar commented Aug 30, 2018

I would have thought that the current situation is that people have to end up specifying their kdims explicitly for it to behave in any reasonable way, and if they've done that then this change won't affect them. Is that true?

@jbednar
Copy link
Member

jbednar commented Aug 30, 2018

In any case, maybe you could poll those users about this change?

@jlstevens
Copy link
Contributor

We will at least have to come up with a slightly more nuanced policy than simply always using the dims, perhaps only when passing in DataArrays?

@philippjfr That is what I was suggesting above as one possible option. Of course, I would rather avoid special cases so if it is appropriate to always use dims, we should do that.

@jsignell I didn't know that DataArray's didn't have coords, thanks for pointing that out. If dims must be there (but coords don't) then it would seem appropriate for me for holoviews to use dims in order to support the more general case. A PR would certainly help us think more clearly about this issue!

I would have thought that the current situation is that people have to end up specifying their kdims explicitly for it to behave in any reasonable way, and if they've done that then this change won't affect them.

That is also what I was thinking/hoping!

@philippjfr
Copy link
Member Author

I would have thought that the current situation is that people have to end up specifying their kdims explicitly for it to behave in any reasonable way, and if they've done that then this change won't affect them. Is that true?

No, it's worth reading my summary above. In most cases coords are in the appropriate order and at least for multi-dimensional datasets are often more informative than the dims since they have semantic meaning and follow certain conventions while dims simply describe the orientation of the array in memory.

I would have thought that the current situation is that people have to end up specifying their kdims explicitly for it to behave in any reasonable way

As I mention above the problem arises because for certain tiffs the xarray rasterio returns coordinates not following the convention which means you have to specify the key dimensions to orient the array correctly. In almost all other cases I have come across coordinates behave correctly and for multi-dimensional datasets in particular they result in preferable behavior which is why users complained when we switched to dims ordering.

@philippjfr
Copy link
Member Author

As for objects without dims I suspect we might currently have much more general issues handling those, but at least in the DataArray case we definitely should fix that. I'm not entirely sure what it would mean for there to be an xr.Dataset without coords or if that's even possible since the coordinates specify how the DataArrays contained in the Dataset line up.

@jlstevens
Copy link
Contributor

I guess my exposure is only to the tiff case which doesn't work well: I'll take your word that the default of using coords works well in most other cases.

That said (and I may be misremembering!) doesn't the geoviews Image.load_image helper classmethod now have geotiff support? And does this (or can it) compensate for the surprising way rasterio loads things?

@philippjfr
Copy link
Member Author

philippjfr commented Aug 30, 2018

In any case, maybe you could poll those users about this change?

The response was pretty emphatic at the time, it's also a major backward incompatible change. I really don't think we can just switch the behavior entirely, I'd be much happier with a more narrow and targeted fix.

That said (and I may be misremembering!) doesn't the geoviews Image.load_image helper classmethod now have geotiff support?

I implemented it as a separate method called gv.Image.load_tiff since the signature was quite different. I'd have to look into whether it's possible to correct the issue in there. Could you send me a couple of tiffs by email to test on?

@jsignell
Copy link
Member

Having looked at the xarray piece of holoviews, I think there is an easy fix for the no coords case. We can assign coordinates based on dim names when no coords are specified:

data.assign_coords(**{k: range(v) for k, v in data.dims.items()})

this will add an int index for each dim same as what happens if you roundtrip an xarray object to pandas and back:

>>> data.to_dataframe().to_xarray()
<xarray.Dataset>
Dimensions:  (dim_0: 7, dim_1: 9)
Coordinates:
  * dim_0    (dim_0) int64 0 1 2 3 4 5 6
  * dim_1    (dim_1) int64 0 1 2 3 4 5 6 7 8
Data variables:
    z        (dim_0, dim_1) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ...

@philippjfr
Copy link
Member Author

Great, thanks @jsignell, that makes sense to me, just make sure to make a shallow copy before assigning the coords (if assign_coords doesn't already do that).

@jsignell
Copy link
Member

As for ordering of kdims based on dims, I don't think there is a good global way because the ordering of dims just isn't global.

@jsignell
Copy link
Member

just make sure to make a shallow copy before assigning the coords

Right the original object isn't mutated

@jlstevens
Copy link
Contributor

Could you send me a couple of tiffs by email to test on?

Done for the tiff shown in the screenshot above...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants