calculating cumsums on a groupby object #3141

Closed
tommylees112 opened this issue Jul 18, 2019 · 6 comments · Fixed by #6525

Comments

tommylees112 commented Jul 18, 2019

How do I go about calculating cumsums on a groupby object?

I have a Dataset that looks like the following:

import numpy as np
import pandas as pd
import xarray as xr

lat = np.linspace(-5.175003, 5.9749985, 224)
lon = np.linspace(33.524994, 42.274994, 176)
# Month-end frequency (460 steps); newer pandas versions spell this 'ME'
time = pd.date_range(start='1981-01-31', end='2019-04-30', freq='M')
data = np.random.randn(len(time), len(lat), len(lon))
dims = ['time', 'lat', 'lon']
coords = {'time': time, 'lat': lat, 'lon': lon}

ds = xr.Dataset({'precip': (dims, data)}, coords=coords)

Out[]:
<xarray.Dataset>
Dimensions:  (lat: 224, lon: 176, time: 460)
Coordinates:
  * time     (time) datetime64[ns] 1981-01-31 1981-02-28 ... 2019-04-30
  * lat      (lat) float64 -5.175 -5.125 -5.075 -5.025 ... 5.875 5.925 5.975
  * lon      (lon) float64 33.52 33.57 33.62 33.67 ... 42.12 42.17 42.22 42.27
Data variables:
    precip   (time, lat, lon) float64 0.006328 0.2969 1.564 ... 0.6675 2.32

I need to group by year and calculate the cumsum within each year. That way I will have a value for each month (timestep) and each pixel (lat-lon pair).
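To pin down the intended result for a single pixel: a running total along time that restarts every January. The sketch below shows these semantics with pandas' `Series.groupby(...).cumsum()` on a hypothetical two-year monthly series of ones; it only illustrates the expected output and is not an xarray solution.

```python
import numpy as np
import pandas as pd

# Hypothetical single-pixel series: 24 monthly values, all equal to 1
time = pd.date_range("2000-01-01", periods=24, freq="MS")
series = pd.Series(np.ones(len(time)), index=time)

# Group by calendar year and take the running sum within each group;
# the original time index is preserved, so there is one value per month.
running = series.groupby(series.index.year).cumsum()
```

Each year therefore counts 1, 2, ..., 12 and then starts again from 1.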

But the cumsum operation doesn't work on a groupby object:

ds.groupby('time.year').cumsum(dim='time')

Out[]:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-dceee5f5647c> in <module>
      9 display(ds_)
     10 
---> 11 ds_.groupby('time.year').cumsum(dim='time')

AttributeError: 'DatasetGroupBy' object has no attribute 'cumsum'

Is there a workaround?

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.0 | packaged by conda-forge | (default, Nov 12 2018, 12:34:36) 
[Clang 4.0.1 (tags/RELEASE_401/final)]
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2

xarray: 0.12.2
pandas: 0.24.2
numpy: 1.16.4
scipy: 1.3.0
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudonetCDF: None
rasterio: 1.0.17
cfgrib: 0.9.7
iris: None
bottleneck: 1.2.1
dask: 1.2.2
distributed: 1.28.1
matplotlib: 3.1.0
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.0.1
pip: 19.1
conda: None
pytest: 4.5.0
IPython: 7.1.1
sphinx: 2.0.1
@dcherian
Contributor

I wonder if this is as easy as adding ops.inject_cum_methods(Dataset.groupby) at the end of core/groupby.py?
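The idea behind this suggestion is to attach cumulative methods to a class programmatically instead of writing each one by hand. Judging by the traceback above, the target would be the groupby class (`DatasetGroupBy`) rather than the `Dataset.groupby` method. The toy below only illustrates the pattern: `ToyGroupBy`, `inject_cum_methods_sketch`, and the dict-of-arrays representation are hypothetical and are not xarray's internal `ops` code.

```python
import numpy as np


def _make_cum_method(name, func):
    # Factory function so each generated method binds its own `func`
    # (avoids the late-binding pitfall of defining closures in a loop).
    def method(self, axis=0):
        # Apply the cumulative NumPy function to each group independently
        return {key: func(values, axis=axis) for key, values in self.groups.items()}

    method.__name__ = name
    return method


def inject_cum_methods_sketch(cls):
    # Hypothetical stand-in for an "inject cumulative methods" helper
    for name, func in [("cumsum", np.cumsum), ("cumprod", np.cumprod)]:
        setattr(cls, name, _make_cum_method(name, func))


class ToyGroupBy:
    # Hypothetical groupby object: maps each group label to its values
    def __init__(self, groups):
        self.groups = groups


inject_cum_methods_sketch(ToyGroupBy)

gb = ToyGroupBy({2000: np.array([1.0, 2.0, 3.0]), 2001: np.array([4.0, 5.0])})
out = gb.cumsum()
```

A single injection call then gives every instance `cumsum` and `cumprod`, which is why the fix was expected to be small.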

@shoyer
Member

shoyer commented Jul 18, 2019

It looks like ds.groupby('time.year').apply(lambda x: x.cumsum(dim='time')) mostly works for now.
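A runnable version of this workaround on a small synthetic dataset (two full years of ones on a hypothetical 2x3 grid, so the expected values are easy to check) might look like the sketch below. It uses `.map`, the name later xarray releases gave to `GroupBy.apply`; on older releases such as 0.12 use `.apply` as in the comment above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small illustrative dataset: 24 monthly steps on a 2x3 lat/lon grid
time = pd.date_range("2000-01-01", periods=24, freq="MS")
lat = np.array([0.0, 1.0])
lon = np.array([10.0, 11.0, 12.0])
data = np.ones((len(time), len(lat), len(lon)))
ds = xr.Dataset(
    {"precip": (["time", "lat", "lon"], data)},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Split by calendar year, take the running sum along "time" within each
# year, and let xarray concatenate the pieces back along "time".
yearly_cumsum = ds.groupby("time.year").map(lambda x: x.cumsum(dim="time"))
```

Because the function returns an object with the same dimensions as each group, the result keeps one value per month and per pixel, restarting at each new year.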

But yes, it would be great to add this.

@dcherian
Contributor

@tommylees112 Are you up for sending in a PR? It's an easy fix.

@tommylees112
Author

Would love to! Sorry, I've been away this weekend. Do I just clone the repo, write the code, and send in a PR from a new branch?

(first PR on a public repo!)

@nbren12
Contributor

nbren12 commented Jul 22, 2019

Xarray has a pretty extensive contributor's guide that you might find helpful. In short, the way to contribute changes is to create your own fork of xarray, commit/push some changes, and finally submit a pull request (PR).

@dcherian dcherian pinned this issue Sep 7, 2019
VladSkripniuk added a commit to VladSkripniuk/xarray that referenced this issue Oct 19, 2019
@max-sixty max-sixty unpinned this issue Apr 21, 2021
@stale

stale bot commented Jun 23, 2021

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity.

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically.

@stale stale bot added the stale label Jun 23, 2021
@mathause mathause removed the stale label Jun 24, 2021
dcherian pushed a commit to dcherian/xarray that referenced this issue Apr 27, 2022
dcherian added a commit that referenced this issue Jul 20, 2022
* Add cumsum to DatasetGroupBy

Fixes #3141

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

* More fix.

* Add whats-new

* [skip-ci] Add to api.rst

* Update xarray/tests/test_groupby.py

Co-authored-by: Illviljan <[email protected]>

* Update xarray/core/groupby.py

* Update xarray/core/groupby.py

* Update xarray/tests/test_groupby.py

Co-authored-by: Vlad Skripniuk <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Maximilian Roos <[email protected]>
Co-authored-by: Illviljan <[email protected]>
Co-authored-by: Anderson Banihirwe <[email protected]>
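With the fix from #6525 in place, the call from the original report can be made directly on the groupby object. The sketch below assumes an xarray release that includes this change and compares the direct call against the earlier `map`-based workaround on a small hypothetical series.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year monthly series with values 0, 1, ..., 23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
ds = xr.Dataset(
    {"precip": (["time"], np.arange(24, dtype=float))},
    coords={"time": time},
)

# Direct call; assumes an xarray version containing DatasetGroupBy.cumsum
direct = ds.groupby("time.year").cumsum(dim="time")

# Earlier workaround, for comparison
workaround = ds.groupby("time.year").map(lambda x: x.cumsum(dim="time"))
```

Both should yield the same per-year running totals, for example 66.0 at December 2000 (0 + 1 + ... + 11) and 12.0 at January 2001, where the sum restarts.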
5 participants