Using categorical coloring for separate aggregates #513
Comments
This issue is basically about working with wide rather than tall or tidy data. One suggestion I'd have is accepting lists for the Glyph x and y column references. Then you could express your dropoff/pickup example as:
canvas.points(df, ['pickup_x', 'dropoff_x'], ['pickup_y', 'dropoff_y'])
and express the kind of problem described in #512 as:
canvas.line(df, 'x', ['col1', 'col2', 'col3', ...])
This would make it quite easy to work with wide data that has a lot of columns you want to aggregate on. I'm sure internally this could also be made efficient.
Right, the example above was about wide data, and I agree that your proposed syntax would make such an example more convenient to do. In general, though, we should be able to use weighted-color-average plots for any suitable data, including separate aggregate arrays from arbitrary sources (e.g. different dataframes altogether) or from the same source but with arbitrary custom xarray operations on each one before rendering. So we'd still need a standalone example or utility showing how to combine the separate aggregates into the data structure expected by shade().
Not that such an example should be difficult; I expect it's a one-liner; I just don't have time today to create it, hence the issue. :-)
If we have a dataframe with points in it that each have a category assigned in some other column, we can generate a single image from it where each pixel's color is an average of the category colors, weighted by the counts for each category.
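To make the weighting concrete, here is a minimal pure-Python sketch of the color mixing for a single pixel. The function name `weighted_color` and the RGB values are made up for illustration; this is not datashader's actual implementation, which, as far as I understand, also uses the total count to set the pixel's alpha.

```python
def weighted_color(counts, color_key):
    """Average the category colors for one pixel, weighted by per-category counts.

    counts:    dict mapping category name -> count in this pixel
    color_key: dict mapping category name -> (r, g, b) tuple
    Both names are hypothetical, chosen for this sketch only.
    """
    total = sum(counts.values())
    if total == 0:
        return None  # empty pixel: nothing to average
    return tuple(
        sum(counts[cat] * color_key[cat][channel] for cat in counts) / total
        for channel in range(3)
    )


# Illustrative colors: red for pickups, blue for dropoffs.
color_key = {"pickup": (255, 0, 0), "dropoff": (0, 0, 255)}

# A pixel with 3 pickups and 1 dropoff ends up mostly red.
pixel = weighted_color({"pickup": 3, "dropoff": 1}, color_key)
empty = weighted_color({"pickup": 0, "dropoff": 0}, color_key)
```

Here `pixel` is three quarters of the way from blue to red, since pickups outnumber dropoffs 3 to 1 in that pixel.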
However, weighted-color-average plots are also useful in cases where no category field is available. Right now, if you wanted to use the category coloring to show NYC taxi pickups vs. dropoffs, you would have to create a new dataframe twice as long as the old one, with each row representing only a pickup or a dropoff (instead of a pickup/dropoff pair as it is now), and synthesize a new column indicating whether each point was a pickup or a dropoff. It would be helpful to provide at least an example, if not a utility, of how to avoid having to doctor the original dataset in this way, because we should be able to make the same calculation simply from the separate aggregates by packing them into the appropriate xarray data structure expected by shade() when given categorical data.
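As a rough sketch of the packing step, assuming shade() treats a 3D DataArray whose last dimension is a labeled category axis as categorical data: the helper name `stack_aggregates`, the dimension name `cat`, and the toy count arrays below are all my own choices for illustration, and I have not checked this against shade() itself.

```python
import numpy as np
import xarray as xr


def stack_aggregates(aggs):
    """Stack same-shaped 2D count aggregates along a trailing 'cat' dimension.

    aggs: dict mapping a category label -> 2D DataArray with dims ('y', 'x').
    Hypothetical helper, not part of datashader.
    """
    names = list(aggs)
    stacked = xr.concat([aggs[name] for name in names], dim="cat")
    stacked = stacked.assign_coords(cat=names)  # label each layer
    return stacked.transpose("y", "x", "cat")   # category axis last


# Toy stand-ins for two separately computed aggregates on the same grid.
coords = {"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]}
pickups = xr.DataArray(np.array([[3, 0, 1], [0, 2, 0]]), coords=coords, dims=("y", "x"))
dropoffs = xr.DataArray(np.array([[1, 0, 1], [4, 0, 0]]), coords=coords, dims=("y", "x"))

combined = stack_aggregates({"pickup": pickups, "dropoff": dropoffs})

# With datashader installed, the intent would be something like
#   tf.shade(combined, color_key={"pickup": "red", "dropoff": "blue"})
# (assumption: untested here).
```

In practice the two inputs would come from two canvas.points(...) calls on the same Canvas, so that they share the same x and y coordinates.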