-
-
Notifications
You must be signed in to change notification settings - Fork 404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement dropping dimensions in gridded groupby #1154
Conversation
Sounds like the right thing to do, to me. |
This makes sense for conversion to elements that work best with columnar data anyway - no need to throw away data unless really necessary (and in most cases, it shouldn't be necessary). My main question is about the potential performance implications of this approach? |
The title of this PR should be updated as I think it is misleading - if I understand correctly, this PR is preserving data instead of dropping it! |
Unless your data is huge it's pretty fast, it only expands the dimensions that are left so there's no huge overhead there and even then you can use the dynamic groupby. In the end it's definitely better than applying the groupby and then throwing an exception because the Elements can't interpret the data. |
Ok, I now see that these operations weren't working at all before (raising exceptions). Note that it is good to preserve columns, even if they aren't used in the rendering/display. We might want to consider a utility that traverses a nested structure to strip out unused columns (e.g if the user decides they are happy to throw away data to save space). Might be worth filing as a feature request... |
I think something like that might be a good idea, we should probably also look into how reindex behaves on the different interfaces. In some cases it explicitly drops columns (e.g. array) when in other cases it simply ignores the dimension, being more consistent about this behavior or providing better control over it might be enough to address this problem. |
Ok, looks good! Merging. |
This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
When using the
.to
method to convert a gridded dataset into lower dimensional views it is possible to drop dimensions, e.g. to visualize the distribution of values in some space. Think for example of a 3D cube of temperatures (lat * lon * altitude) . I want to view the distribution of values for each altitude, dropping the lat and lon dimensions. Previously we supported this in some limited cases but there's a simple solution to the problem, when applying a groupby on a gridded dataset which drops certain dimensions we simply convert to a columnar format. This will only affect the.to
method which is meant specifically to convert high-dimensional data into a lower dimensional view so I think it's consistent with the overall semantics.