-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New infer_intervals keyword for pcolormesh #1079
Conversation
I would call this just |
Is (Sorry if I am only making noise here and this does not make sense. In that case just ignore my comment.) |
Yes, it is a special case for pcolormesh though
Currently the entire xarray workflow is built upon the implicit assumption that the coordinates are always at the grid-point center ( @shoyer correct me if I'm wrong). I'm not sure if the additional complexity added by cell boundaries is on the xarray devs priority list... |
A special case that I love. In the old Matlab day we had horrible hacks to get pcolor to work properly 😄
That is fine. Maybe a warning in the docs would be nice. So people can revert to a "manual" plotting to get the boundaries right. |
pcolormesh and imshow are indeed very consistent in xarray: import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr
da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['lat', 'lon'],
coords = {'lat': np.linspace(0, 30, 4),
'lon': np.linspace(-20, 20, 5)})
f = plt.figure(figsize=(5, 8))
ax = plt.subplot(3, 1, 1, projection=ccrs.PlateCarree())
da.plot.imshow(ax=ax, transform=ccrs.PlateCarree())
lon, lat = np.meshgrid(da.lon.values, da.lat.values)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('imshow')
ax = plt.subplot(3, 1, 2, projection=ccrs.PlateCarree())
da.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree())
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh')
ax = plt.subplot(3, 1, 3, projection=ccrs.PlateCarree())
da.plot.pcolormesh(ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh - no intervals') In most cases setting |
We definitely ignore cell boundaries -- they don't (yet) have any place in the xarray data model. It would be nice to have a built in "interval" data type, along with an |
Even though I would love to have that functionality I do not believe it is high priority. Maybe you could only add a note in the docs mentioning that the cell boundaries might exist? |
""" | ||
Pseudocolor plot of 2d DataArray | ||
|
||
Wraps matplotlib.pyplot.pcolormesh | ||
""" | ||
|
||
if not hasattr(ax, 'projection'): | ||
if infer_intervals: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about this change. I'm not sure I like infer_intervals
to default to True
. This is going to break a bunch of our plotting code. My preference, and I'm open to other ideas at this point, would be to default to infer_intervals=None
with something like this logic:
if (infer_intervals is None) and (hasattr(ax, 'projection'))
infer_intervals = False
warnings.warn("infer_intervals=None is deprecated, please specify either True or False",
DeprecationWarning)
if infer_intervals:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure to understand... infer_intervals=True
was the de-facto default until V0.7.2, and from the example above it is clear that setting it to False
is almost never what you want to do. Can you provide an example where this is a good thing?
I agree your concerns about breaking old code though. How about keeping the current behavior for one more version, but issuing a warning that in the next version (probably 0.9.1 or 0.10), it is going to be set to True per default?
For my purposes this is going to imply that I'm going to use imshow()
and not pcolormesh
anymore for the time being.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting infer_intervals=False
is exactly what you want if you have irregularly spaced coordinates (see example here: #781 (comment)).
I think something like the logic I showed above may be a good compromise. I'm happy to transition to always specifying infer_intervals=False
we should be more transparent about the change (hence the warning along with a None for a default value. Would you be happier with this logic?
if (infer_intervals is None):
if (hasattr(ax, 'projection'))
infer_intervals = False
warnings.warn("infer_intervals=None is deprecated, please specify either True or False",
DeprecationWarning)
else:
infer_intervals = True
if infer_intervals:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, with irregular coordinates none of the solutions is satisfying. infer_intervals=True
crops one column, while infer_intervals=False
crop one column and one line of data:
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr
lon, lat = np.linspace(-20, 20, 5), np.linspace(0, 30, 4)
lon, lat = np.meshgrid(lon, lat)
da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],
coords = {'lat': (('y', 'x'), lat),
'lon': (('y', 'x'), lon)})
f = plt.figure()
ax = plt.subplot(2, 1, 1, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=True)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh')
ax = plt.subplot(2, 1, 2, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('pcolormesh - no intervals')
plt.show()
I'll try to have a closer look later next week
After giving this a few more thoughts:
So what do we do with this? @jhamman 's suggestion is to set import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import xarray as xr
lon, lat = np.linspace(-20, 20, 5), np.linspace(0, 30, 4)
lon, lat = np.meshgrid(lon, lat)
lon += lat/10
lat += lon/10
da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],
coords = {'lat': (('y', 'x'), lat),
'lon': (('y', 'x'), lon)})
ax = plt.subplot(1, 2, 1, projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, transform=ccrs.PlateCarree(), infer_intervals=False)
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('2d irreg case - pcolor')
ax = plt.subplot(1, 2, 2, projection=ccrs.PlateCarree())
da.plot.contourf('lon', 'lat', ax=ax, transform=ccrs.PlateCarree())
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.set_title('2d irreg case - contourf') Note that one line and one column of data disappeared (so that the colorbar range doesn't correspond to what we see) and the coordinates are not understood as being in the cell center. The advantage of this choice are twofold:
Let's call this option "Option A". [edited] I've been playing around with an "Option B", which includes making [\edited] This plot is closer to what I think the grid actually represents, but I am biased towards my usage of xarray (gridded geoscientific models). The disadvantages of option B are:
I currently tend to go for option A, since option B is a bit awkward and users who have bounded coordinated available somewhere (as suggested by @ocefpaf ) will have to implement their own plotting routine anyway. |
I've come up with a cleaner implementation which doesn't require scipy and produces a very accurate plot: I'm going to clean all this later today and update my PR, I think that we can come to something nice. @jhamman : I wonder if the problem you are having in your example is related to flaoting point issues at the pole. Would you mind sending me the file and the code that produces the plot so that I can have a look at it? |
You can get data on the same grid (different variable, same coords) from the sample data |
@jhamman @shoyer this is ready for another review (the test failure seems unrelated). @ocefpaf I added a note to the docs @jhamman OK now I get why it clearly doesn't make sense to infer anything in your dataset (plot below). Just out of curiosity: does RASM have a map projection on which it has regular coordinates, too? |
Actually, you wouldn't like to infer intervals on these coordinates at all, regardless if you are on a map or not. Should I remove the |
Finally I think it's ok to keep it as it is now, since the data above is likely to always be plotted on a map anyways. |
This looks like a really nice PR. There is some intersection with my "multidimensional coordinates" tutorial: At the least, we should standardize the terminology in the docs. Are these "irregular coordinates" or "multidimensional coordinates"? I don't particularly care, but we should be consistent. |
@rabernat thanks for the link! I'll update the docs with a link to your example and with "multidimensional" instead of "irregular". |
@rabernat : couldn't your tutorial also use the ipython directive and run at build time instead of using the parsed literals? I see that the other examples are also static, is there a reason for this? |
My laziness / ignorance of the documentation build process. |
OK. Let's see what the gurus say about this, maybe there is a reason to keep it like this |
Yes, could certainly run this at doc build time instead of using a notebook. The advantage of the notebook is that it's easier to edit/assemble, and also easier to download and adjust. In an ideal world, I think we would check in a notebook with cleared output and then run it and convert to HTML at doc build time. Looks like that's exactly what nbsphinx does if anyone wants to take a go at setting that up. Could be a much nicer way to write the docs than RST. |
Note that in that case, multidimensional-coords and monthly-means are already formatted in rst. |
Right, we used a static ipynb -> rst converter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this fix. I know my suggestions lead to a bunch of complicated logic but I think this is the best solution for now. Thanks for putting in the hard work to get here.
last = coord[-1] + deltas[-1] | ||
return np.r_[[first], coord[:-1] + deltas, [last]] | ||
if axis == 0: | ||
deltas = 0.5 * (coord[1:, ...] - coord[:-1, ...]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could write an axis agnostic version of this with np.diff
, np.take
and np.concatenate
, e.g.,
deltas = 0.5 * np.diff(coord, axis)
first = np.take(coord, [0], axis)
last = np.take(coord, [-1], axis)
trim_last = tuple(slice(None, -1) if n == axis else slice(None)
for n in range(coord.ndim))
return np.concatenate([first, coord[trim_last] + deltas, last], axis)
Maybe a slight improvement? (Up to you on whether to use it)
first = coord[0] - deltas[0] | ||
last = coord[-1] + deltas[-1] | ||
return np.r_[[first], coord[:-1] + deltas, [last]] | ||
if axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an assert
or error to verify that axis
is one of the expected values.
else: | ||
# we have to infer the intervals on both axes | ||
x = _infer_interval_breaks(x, axis=1) | ||
x = _infer_interval_breaks(x, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clever!
I had to modify your code a little bit for this to work, but this is now accepting any axis. Quite a useful function actually! |
last = np.take(coord, [-1], axis=axis) + np.take(deltas, [-1], axis=axis) | ||
trim_last = tuple(slice(None, -1) if n == axis else slice(None) | ||
for n in range(coord.ndim)) | ||
return np.concatenate([first, coord[trim_last] + deltas, last], axis=axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nifty! Loved the use of axis
here!
@@ -112,6 +112,17 @@ def test__infer_interval_breaks(self): | |||
self.assertArrayEqual(pd.date_range('20000101', periods=4) - np.timedelta64(12, 'h'), | |||
_infer_interval_breaks(pd.date_range('20000101', periods=3))) | |||
|
|||
# make a bounded 2D array that we will center and re-infer | |||
xref, yref = np.meshgrid(np.arange(6), np.arange(5)) | |||
cx = (xref[1:, 1:] + xref[:-1, :-1]) / 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Python 2, I think this will still have integer dtype. Can you divide by 2.0
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a from __future__ import division
instead
@@ -1,3 +1,5 @@ | |||
from __future__ import division |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add this to the top of literally every single Python file in xarray?
That will keep this easier to reason about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:D sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes OK. shouldn't I do this in a separate PR? I'll do it tomorrow though, it's getting late in Innsbruck...
Yes, thanks! On Thu, Nov 10, 2016 at 2:43 PM, Fabien Maussion [email protected]
|
Addresses #781
@jhamman what do you think? I'm not sure if
infer_interval_breaks
is the best name for the keyword but I didn't come up with anything better than that. (maybeinfer_coord_intervals
?)