How to handle calculations for variables not defined in longitude #194
Before even getting to the regional calculations, this non-zonally-defined data requires another fix: performing the zonal average within the Var's function causes sfc_area and land_mask to get dropped. A quick-fix that worked for me was to add the following within the `# Re-introduce any coords that got dropped` block:

```python
if LON_STR not in full_ts:
    for coord in (SFC_AREA_STR, LAND_MASK_STR):
        if coord not in full_ts:
            coord_arr = data[0][coord]
            full_ts = full_ts.assign_coords(
                **{coord: coord_arr.mean(dim=LON_STR)}
            )
```

Essentially, this checks if the longitude dimension exists, and if not (assuming that sfc_area and land_mask have thus been dropped but did exist in the original data), it averages surface area and the land mask in longitude and re-appends them.

I think we could do better here than this quick fix, though. So let's think more carefully about how best to approach it and how to make it more general. One thing that immediately comes to mind is cases like EBMs or axisymmetric GCMs (both of which I'm working with right now outside of aospy), where the input data is not defined in longitude either, so this quick-fix would fail.
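A slightly more general form of that guard, sketched here purely as an illustration (it reuses the same LON_STR / SFC_AREA_STR / LAND_MASK_STR constants and the `data` list from the snippet above, and isn't tested), would only re-append a coord that actually exists in the source data, so at least it wouldn't crash on EBM-style input with no longitude at all:

```python
# Illustration only: re-attach zonal means of grid coords dropped by a
# zonal average, but skip any coord the source data never had (e.g. EBM
# or axisymmetric-GCM input that is not defined in longitude).
if LON_STR not in full_ts:
    for coord in (SFC_AREA_STR, LAND_MASK_STR):
        source = data[0]
        if coord in full_ts or coord not in source:
            continue
        coord_arr = source[coord]
        if LON_STR in coord_arr.dims:
            coord_arr = coord_arr.mean(dim=LON_STR)
        full_ts = full_ts.assign_coords(**{coord: coord_arr})
```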
One idea: use the
@spencerahill sorry for taking so long to get back to you on this. To date I've done all my analysis on zonal mean data outside of aospy (also after the time mean). It requires a lot of comparison between simulations, for which aospy as currently designed is also not directly useful (it's great for generating the data for this analysis, though). These are my quick thoughts on the discussion you started above.

Region objects: For determining which Region objects would be applicable for particular Var objects, I think your idea to use the existing

Coordinate averaging: It feels a little messy to try to predict exactly what the user would want to do with grid attributes under the hood (e.g. automatically re-appending averaged versions of them over dropped dimensions). In particular, how would we know whether the dimension was reduced out (causing coordinates to be dropped) or whether those dimensions never existed (as in longitude for the EBMs and axisymmetric GCMs)? Perhaps we could do something more explicit (i.e. give users the tools to solve this problem themselves within their own object libraries)?

My ideas aren't particularly well-formed on this, but perhaps an alternative solution could be to use xarray's accessor capability to create some helper functions that act on DataArrays for this purpose. For instance one could define a `coord_preserving_mean` method:

```python
import xarray as xr


@xr.register_dataarray_accessor('aospy')
class AospyAccessor(object):
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def coord_preserving_mean(self, *args, **kwargs):
        name = self._obj.name
        ds = self._obj.reset_coords()
        names = set(self._obj.coords) - set(self._obj.dims)
        ds = ds.mean(*args, **kwargs)
        return ds.set_coords(names)[name]
```

For example, this would work like:
```
In [1]: import numpy as np

In [2]: import xarray as xr

In [3]: a = xr.DataArray(np.ones((5, 6)), coords=[np.arange(5), np.arange(6)],
   ...:                  dims=['a', 'b'], name='test')

In [4]: a['test_coord'] = xr.DataArray(np.ones((5, 6)), coords=a.coords)

In [5]: a.aospy.coord_preserving_mean('b')
Out[5]:
<xarray.DataArray 'test' (a: 5)>
array([ 1.,  1.,  1.,  1.,  1.])
Coordinates:
  * a           (a) int64 0 1 2 3 4
    test_coord  (a) float64 1.0 1.0 1.0 1.0 1.0

In [6]: a.mean('b')
Out[6]:
<xarray.DataArray 'test' (a: 5)>
array([ 1.,  1.,  1.,  1.,  1.])
Coordinates:
  * a        (a) int64 0 1 2 3 4
```

My gut instinct is to push against adding more logic to the main Calc pipeline, but I could also see how this alternative solution might be pushing additional complexity onto the users. What are your thoughts?
@spencerkclark thanks a lot for this... apologies, I also might take a while to respond substantively.
Side issue, but what specifically do you mean here? Room for improvement on the aospy side?
I agree. Very cool example of a custom accessor. I must admit I didn't even really understand them until this!
This concerns me too. Let's see what emerges from pydata/xarray#1497 and then we can revisit this.
Possibly yes -- I have some code that I use to systematically load data produced by aospy. E.g. something like:
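Something along these lines, with hypothetical run names, variable, and file paths standing in for the real ones:

```python
import pandas as pd
import xarray as xr

# Hypothetical run names and output paths; only the pattern matters here.
runs = ['control', 'warming_2K', 'warming_4K']
paths = ['output/{}/olr.timeseries.nc'.format(run) for run in runs]

# Open each run and stack the results along a new 'run' dimension, so a
# single Dataset indexed by run name comes out the other end.
ds = xr.concat([xr.open_dataset(path) for path in paths],
               dim=pd.Index(runs, name='run'))
```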
If you have some parameters in simulations that you tweak in a systematic way (my current use-case), it can alternatively be even more helpful to have those as coordinates (rather than run names). E.g. something like:
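Again only a sketch, with a made-up swept parameter (a damping timescale tau) and variable name standing in for the real ones; with the parameter values on the coordinate, comparing simulations reduces to coordinate selections:

```python
import pandas as pd
import xarray as xr

# Hypothetical sweep over a damping timescale tau (days).
tau_values = [1.0, 5.0, 10.0]
paths = ['output/tau_{:g}/precip.timeseries.nc'.format(tau)
         for tau in tau_values]

ds = xr.concat([xr.open_dataset(path) for path in paths],
               dim=pd.Index(tau_values, name='tau'))

# e.g. the precipitation response between the tau=10 and tau=1 simulations
delta_precip = ds['precip'].sel(tau=10.0) - ds['precip'].sel(tau=1.0)
```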
Then if I wanted to look at differences in

The code I use now to construct these essentially just recreates the expected file names of the aospy output files.
@spencerkclark my experiment tool does exactly what you describe in that post. It constructs everything using dask arrays so that calculations are deferred. Earlier in the year I was playing around with ways to extend the NetCDF data model to handle the case where one of the "param" or "run" dimensions was a dataset with different lat/lon dimensions (e.g. a different model participating in CMIP5), and I've scoped out solutions which lean on hierarchical Groups inside a NetCDF file to do so, but it's pretty roughshod.
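For anyone following along, the deferred-calculation piece on its own is just xarray's dask-backed opening; a minimal sketch (hypothetical file paths and variable name, and not how experiment actually implements it):

```python
import xarray as xr

# chunks=... makes the underlying arrays dask arrays, so nothing is read
# or computed until .compute() / .load() is called.
ds = xr.open_mfdataset('output/control/atmos_monthly_*.nc',
                       chunks={'time': 120})
t_surf_clim = ds['t_surf'].mean('time')  # still lazy at this point
result = t_surf_clim.compute()           # triggers the actual work
```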
@darothen indeed your experiment tool caught my eye a few weeks ago and I've been meaning to try it out. Maybe later this week!
Thanks @spencerkclark for fleshing that out, and @darothen for sharing
Now I understand...in fact I also have hacked together similar functionality for use on one of my projects where my simulations are a sweep over two independent parameters (although it's entirely outside of aospy). Clearly it's a common use case :)
Is this related to pydata/xarray#1482? I'd be interested to hear more about this... CMIP-style comparisons are obviously a huge use case and partly what motivated aospy in the first place. aospy shines in looping over models with different grids to repeat calculations across them, but it doesn't have a data structure like you're describing (or even data loading methods) that makes subsequent interaction with those results especially easy.

Do you have an issue open in your repo on this topic where we could discuss further? On the one hand, I could see a place for something like this within aospy. On the other hand, I like the idea of it as a more general-purpose tool, which seems to be what you've started in on. Also, TBH I am struggling to find the time lately even to fix our existing bugs, let alone add new features!

Last thought: such a tool seems like a great candidate for an official pangeo-data project...

In terms of the actual subject of this Issue: I still need to think about it! 😛
I'm also very interested in this, and would like to continue the discussion in the appropriate place. @spencerahill see also: pydata/xarray#1092
Sorry to derail your thread! Started a new issue over at darothen/experiment#18 if you want to brainstorm on the Groups idea.
Excellent! Thanks Dan.
Flagging this issue -- I would also be interested in things like zonal means.
I am computing the meridional mass streamfunction for a simulation. This quantity takes the (lon, lat, vert, time) meridional wind and pressure and outputs a (lat, vert, time) streamfunction. Essentially all of the logic in `region.py` assumes data that is defined in both latitude and longitude, and so the calculations are crashing. (The first place is within `_add_to_mask`, but it's throughout the module and the `Region` class.)

So we need to think through this. It seems to me that regional averages could only be meaningfully computed for those regions that span all longitudes, e.g. the Tropics (30S-30N) or Northern Hemisphere (0-90N), but not e.g. the Sahel (10-20N, 20W-40E). So we need to compare the dimensions of the data with those of the Region.
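One possible shape for that comparison, just as a sketch (the `spans_all_lons` flag is hypothetical, not part of the current Region API, and a LAT_STR constant is assumed alongside the LON_STR used above):

```python
# Sketch of a compatibility check between a DataArray and a Region.
def region_is_applicable(data, region):
    """Can `region` be meaningfully averaged over `data`?"""
    if LON_STR in data.dims:
        # Full lon-lat data: any region works.
        return True
    # Zonal-mean (or zonally undefined) data: only regions spanning all
    # longitudes, e.g. the Tropics or a hemisphere, make sense.
    return LAT_STR in data.dims and region.spans_all_lons
```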
This is the first time I've come across this use-case (at least in aospy's modern incarnation).
@spencerkclark, I know you've analyzed zonal mean circulations some over the past year or so. Have you come across this at all?