Implement dropping dimensions in gridded groupby #1154

Merged: 6 commits merged into master from groupby_fixes on Feb 27, 2017.

@philippjfr (Member) commented Feb 26, 2017

When using the .to method to convert a gridded dataset into lower-dimensional views, it is possible to drop dimensions, e.g. to visualize the distribution of values in some space. Think, for example, of a 3D cube of temperatures (lat * lon * altitude): I want to view the distribution of values for each altitude, dropping the lat and lon dimensions. Previously we supported this only in some limited cases, but there's a simple solution: when a groupby on a gridded dataset drops certain dimensions, we simply convert the data to a columnar format. This only affects the .to method, which is meant specifically for converting high-dimensional data into lower-dimensional views, so I think it's consistent with the overall semantics.
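A minimal sketch of the idea, using plain NumPy rather than HoloViews itself: a gridded (lat, lon, altitude) cube is flattened into one column per dimension plus a value column, and grouping by altitude then leaves lat and lon as ordinary columns instead of discarding them. The function names `gridded_to_columnar` and `groupby_dropping` are hypothetical illustrations, not the actual code merged in this PR.

```python
import numpy as np


def gridded_to_columnar(cube, coords, dims):
    """Hypothetical helper: flatten a gridded array into columns.

    Produces one column per dimension (the expanded grid coordinates)
    plus a 'value' column, all of length cube.size.
    """
    # indexing="ij" makes each coordinate array match cube's axis order,
    # so ravel() on both sides lines up element by element.
    mesh = np.meshgrid(*(coords[d] for d in dims), indexing="ij")
    columns = {d: m.ravel() for d, m in zip(dims, mesh)}
    columns["value"] = cube.ravel()
    return columns


def groupby_dropping(columns, keep):
    """Hypothetical helper: group by one dimension on columnar data.

    The remaining dimensions are kept as columns in each group, so no
    samples are thrown away.
    """
    groups = {}
    for key in np.unique(columns[keep]):
        mask = columns[keep] == key
        groups[key] = {name: col[mask] for name, col in columns.items() if name != keep}
    return groups


# Toy temperature cube: 2 latitudes x 3 longitudes x 2 altitudes.
lat = np.array([10.0, 20.0])
lon = np.array([0.0, 5.0, 10.0])
alt = np.array([0.0, 1000.0])
cube = np.arange(12, dtype=float).reshape(2, 3, 2)

cols = gridded_to_columnar(cube, {"lat": lat, "lon": lon, "altitude": alt},
                           ["lat", "lon", "altitude"])
groups = groupby_dropping(cols, "altitude")
```

Each altitude group holds all lat/lon samples for that level as flat columns, which is the shape a distribution-style element can consume directly.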

@philippjfr added the "type: feature" label Feb 26, 2017
@jbednar (Member) commented Feb 26, 2017

Sounds like the right thing to do, to me.

@jlstevens (Contributor)

This makes sense for conversion to elements that work best with columnar data anyway: there's no need to throw away data unless really necessary (and in most cases, it shouldn't be necessary).

My main question is about the potential performance implications of this approach.

@jlstevens (Contributor)

I think the title of this PR is misleading and should be updated: if I understand correctly, this PR preserves data instead of dropping it!

@philippjfr (Member, Author)

Quoting @jlstevens: "My main question is about the potential performance implications of this approach."

Unless your data is huge, it's pretty fast: it only expands the dimensions that are left, so there's no large overhead, and even for huge data you can still use the dynamic groupby. In the end it's definitely better than applying the groupby and then raising an exception because the Elements can't interpret the data.

@jlstevens (Contributor)

OK, I now see that these operations weren't working at all before (they raised exceptions).

Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g. if the user decides they are happy to throw away data to save space). That might be worth filing as a feature request.

@philippjfr (Member, Author)

Quoting @jlstevens: "Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g. if the user decides they are happy to throw away data to save space)."

I think something like that might be a good idea. We should probably also look into how reindex behaves on the different interfaces: in some cases it explicitly drops columns (e.g. the array interface), while in others it simply ignores the dimension. Being more consistent about this behavior, or providing better control over it, might be enough to address this problem.

@jlstevens (Contributor)

Ok, looks good! Merging.

@jlstevens merged commit 336ccf6 into master Feb 27, 2017
@philippjfr deleted the groupby_fixes branch February 27, 2017 23:29
@philippjfr added this to the v1.7.0 milestone Feb 28, 2017

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024