Implement dropping dimensions in gridded groupby #1154

philippjfr · 2017-02-26T13:57:20Z

When using the .to method to convert a gridded dataset into lower-dimensional views, it is possible to drop dimensions, e.g. to visualize the distribution of values in some space. Think, for example, of a 3D cube of temperatures (lat * lon * altitude): I want to view the distribution of values for each altitude, dropping the lat and lon dimensions. Previously we supported this only in some limited cases, but there is a simple solution to the problem: when a groupby on a gridded dataset drops certain dimensions, we simply convert to a columnar format. This only affects the .to method, which is meant specifically to convert high-dimensional data into a lower-dimensional view, so I think it's consistent with the overall semantics.
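As a rough illustration of the idea described above (not the HoloViews implementation), the following pure-Python sketch groups a dense lat * lon * altitude grid by altitude and flattens the remaining dimensions into rows. The function name `groupby_to_columns`, the nested-list grid layout, and the sample values are all assumptions made for this example. The lat and lon coordinates are kept as ordinary columns, so a distribution view can read only the temperature column without any data being discarded.

```python
from itertools import product


def groupby_to_columns(cube, coords, group_dim, value_name="temperature"):
    """Group a dense nested-list grid by one dimension and flatten the
    remaining dimensions into a list of row dicts (a columnar view).

    Hypothetical helper for illustration only.
    """
    dims = list(coords)                      # e.g. ["lat", "lon", "altitude"]
    group_axis = dims.index(group_dim)
    groups = {key: [] for key in coords[group_dim]}

    # Visit every grid cell once; `index` holds one position per dimension.
    for index in product(*(range(len(coords[d])) for d in dims)):
        value = cube
        for i in index:
            value = value[i]
        # Keep the non-grouped coordinates as plain columns of the row.
        row = {d: coords[d][i] for d, i in zip(dims, index) if d != group_dim}
        row[value_name] = value
        groups[coords[group_dim][index[group_axis]]].append(row)
    return groups


# cube[i][j][k] is the temperature at lats[i], lons[j], alts[k] (made-up values)
coords = {"lat": [0.0, 10.0], "lon": [0.0, 20.0], "altitude": [100, 200]}
cube = [[[15.0, 14.0], [16.0, 15.0]],
        [[17.0, 16.0], [18.0, 17.0]]]

tables = groupby_to_columns(cube, coords, "altitude")
# A distribution at altitude 100 only needs the value column.
temps_at_100 = [row["temperature"] for row in tables[100]]
```

Each altitude ends up with a flat table of four rows, one per (lat, lon) cell, which is the shape a columnar element can interpret directly.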

jbednar · 2017-02-26T14:50:37Z

Sounds like the right thing to do, to me.

jlstevens · 2017-02-27T12:48:47Z

This makes sense for conversion to elements that work best with columnar data anyway - no need to throw away data unless really necessary (and in most cases, it shouldn't be necessary).

My main question is about the potential performance implications of this approach?

jlstevens · 2017-02-27T12:50:46Z

The title of this PR should be updated as I think it is misleading - if I understand correctly, this PR is preserving data instead of dropping it!

philippjfr · 2017-02-27T12:51:23Z

> My main question is about the potential performance implications of this approach?

Unless your data is huge it's pretty fast: it only expands the dimensions that are left, so there's no huge overhead there, and even then you can use the dynamic groupby. In the end it's definitely better than applying the groupby and then throwing an exception because the Elements can't interpret the data.
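To make the point about the dynamic groupby concrete, here is a minimal sketch of the general lazy-evaluation idea, assuming a hypothetical `LazyGroups` class: the flattened table for a group is built only when that group is requested, so a large grid is never expanded all at once. The caching shown here is an assumption of the sketch and says nothing about how HoloViews handles dynamic groupby internally.

```python
class LazyGroups:
    """Build the flattened values for a group only when it is requested.

    Hypothetical class for illustration only.
    """

    def __init__(self, keys, build):
        self.keys = list(keys)
        self._build = build      # callable: group key -> flattened values
        self._cache = {}
        self.calls = 0           # counts how often `build` actually ran

    def __getitem__(self, key):
        if key not in self.keys:
            raise KeyError(key)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self._build(key)
        return self._cache[key]


# cube[i][j][k] is the temperature at lat i, lon j, altitude alts[k] (made-up values)
alts = [100, 200]
cube = [[[15.0, 14.0], [16.0, 15.0]],
        [[17.0, 16.0], [18.0, 17.0]]]


def temperatures_at(alt):
    # Expand only the remaining (lat, lon) cells for the requested altitude.
    k = alts.index(alt)
    return [cell[k] for plane in cube for cell in plane]


groups = LazyGroups(alts, temperatures_at)
first = groups[200]   # computed on first access
again = groups[200]   # served from the cache
```

Only the altitude that was actually requested gets expanded; the table for altitude 100 is never built in this usage.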

jlstevens · 2017-02-27T13:00:33Z

Ok, I now see that these operations weren't working at all before (raising exceptions).

Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g. if the user decides they are happy to throw away data to save space). Might be worth filing as a feature request...
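A minimal sketch of what such a utility might look like, assuming a simplified model in which a table is a dict mapping column names to lists and containers are dicts of tables or of other containers. The name `strip_columns` and this data model are hypothetical and do not correspond to an existing HoloViews function.

```python
def strip_columns(node, keep):
    """Return a copy of a nested structure that retains only `keep` columns.

    Hypothetical helper for illustration only. A leaf table is a dict whose
    values are all lists; any other dict is treated as a container.
    """
    if not isinstance(node, dict):
        raise TypeError("expected a dict of tables or a table")
    if all(isinstance(values, list) for values in node.values()):
        # Leaf table: copy only the requested columns.
        return {name: list(values) for name, values in node.items()
                if name in keep}
    # Container: recurse into every child, keeping the keys unchanged.
    return {key: strip_columns(child, keep) for key, child in node.items()}


# Made-up nested structure: run -> altitude -> columnar table
nested = {
    "run1": {
        100: {"lat": [0.0, 10.0], "lon": [0.0, 0.0], "temperature": [15.0, 17.0]},
        200: {"lat": [0.0, 10.0], "lon": [0.0, 0.0], "temperature": [14.0, 16.0]},
    }
}

stripped = strip_columns(nested, keep={"temperature"})
```

The function returns a new structure and leaves the input untouched, so throwing data away stays an explicit, opt-in step.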

philippjfr · 2017-02-27T13:35:43Z

> Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g. if the user decides they are happy to throw away data to save space).

I think something like that might be a good idea. We should probably also look into how reindex behaves on the different interfaces: in some cases it explicitly drops columns (e.g. array), while in other cases it simply ignores the dimension. Being more consistent about this behavior, or providing better control over it, might be enough to address this problem.
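The two behaviors can be contrasted with a small sketch. It assumes a toy table made of declared key dimensions, value dimensions, and a dict of columns; `reindex_drop` and `reindex_ignore` are hypothetical names and do not reflect how any HoloViews interface implements reindex.

```python
def reindex_drop(table, kdims):
    """Declare new key dimensions and physically remove other key columns."""
    wanted = set(kdims) | set(table["vdims"])
    data = {name: list(values) for name, values in table["data"].items()
            if name in wanted}
    return {"kdims": list(kdims), "vdims": list(table["vdims"]), "data": data}


def reindex_ignore(table, kdims):
    """Declare new key dimensions but leave the stored columns untouched."""
    return {"kdims": list(kdims), "vdims": list(table["vdims"]),
            "data": table["data"]}


# Made-up table with two key dimensions and one value dimension
table = {
    "kdims": ["lat", "lon"],
    "vdims": ["temperature"],
    "data": {"lat": [0.0, 10.0], "lon": [0.0, 20.0], "temperature": [15.0, 18.0]},
}

dropped = reindex_drop(table, ["lat"])
ignored = reindex_ignore(table, ["lat"])
```

After `reindex_drop` the lon column is gone for good, whereas after `reindex_ignore` it is still stored but no longer declared as a dimension, which is the inconsistency the comment above refers to.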

jlstevens · 2017-02-27T13:51:40Z

Ok, looks good! Merging.

github-actions · 2024-10-26T01:45:23Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

philippjfr added 5 commits February 26, 2017 12:18

Pop mdims in DataConversion avoiding second warning

d28aa26

Small fix for DataConversions to distribution types

b86eb7f

DataConversion filters kdims when supplying groupby

6a28b88

Do not reindex gridded Datasets during groupby

faf95d3

Implement partial dimension gridded groupby

1ad63b4

philippjfr added the "type: feature" (A major new feature) label Feb 26, 2017

Implemented partial dimension groupby in dynamic mode

203a09b

philippjfr requested a review from jlstevens February 26, 2017 16:30

jlstevens merged commit 336ccf6 into master Feb 27, 2017

philippjfr deleted the groupby_fixes branch February 27, 2017 23:29

philippjfr added this to the v1.7.0 milestone Feb 28, 2017

github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024
