Using categorical coloring for separate aggregates #513

Open
jbednar opened this issue Oct 26, 2017 · 3 comments


jbednar commented Oct 26, 2017

If we have a dataframe with points in it that each have a category assigned in some other column, we can generate a single image from it where each pixel's color is an average of the category colors, weighted by the counts for each category:

[Image: example categorical-color plot]
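As a rough illustration of the color mixing described above, here is a minimal NumPy sketch. It is not datashader's actual implementation (which, among other things, also maps total counts to opacity); the function name `weighted_color_average` and the array layout are assumptions made for this example.

```python
import numpy as np

def weighted_color_average(counts, colors):
    """Mix category colors per pixel, weighted by per-category counts.

    counts: array of shape (H, W, C), one count per pixel and category.
    colors: array of shape (C, 3), one RGB color per category.
    Returns an (H, W, 3) float array; pixels with no points come out as 0.
    """
    counts = np.asarray(counts, dtype=float)
    colors = np.asarray(colors, dtype=float)
    total = counts.sum(axis=-1, keepdims=True)  # (H, W, 1)
    # Normalize counts to weights, leaving empty pixels at zero weight.
    weights = np.divide(counts, total, out=np.zeros_like(counts), where=total > 0)
    # (H, W, C) @ (C, 3) -> (H, W, 3)
    return weights @ colors

# One row of two pixels, two categories (red and blue).
counts = [[[3, 1], [0, 0]]]
colors = [[255, 0, 0], [0, 0, 255]]
print(weighted_color_average(counts, colors))
```

The first pixel holds three "red" points and one "blue" point, so it comes out 75% red and 25% blue; the second pixel is empty and stays at zero.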

However, weighted-color-average plots are also useful when no category field is available. Right now, to use category coloring to show NYC taxi pickups vs. dropoffs, you would have to create a new dataframe twice as long as the original, with each row representing a single pickup or dropoff (instead of a pickup/dropoff pair, as it is now), and synthesize a new column indicating whether each point is a pickup or a dropoff. It would be helpful to provide at least an example, if not a utility, showing how to avoid doctoring the original dataset in this way. We should be able to make the same calculation directly from the separate aggregates by packing them into the xarray data structure that shade() expects when given categorical data.
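For reference, the "doctoring" workaround could look roughly like the pandas sketch below. The sample values, the helper name `to_tall`, and the `kind` column name are hypothetical; the `pickup_x`/`pickup_y`/`dropoff_x`/`dropoff_y` column names are assumed to follow the taxi example.

```python
import pandas as pd

# Hypothetical wide data: one row per trip (a pickup/dropoff pair).
trips = pd.DataFrame({
    "pickup_x": [0.0, 1.0], "pickup_y": [0.0, 1.0],
    "dropoff_x": [2.0, 3.0], "dropoff_y": [2.0, 3.0],
})

def to_tall(df):
    """Return a frame twice as long, with one row per pickup or dropoff."""
    parts = []
    for kind in ("pickup", "dropoff"):
        part = (
            df[[f"{kind}_x", f"{kind}_y"]]
            .rename(columns={f"{kind}_x": "x", f"{kind}_y": "y"})
            .assign(kind=kind)  # synthesized category column
        )
        parts.append(part)
    tall = pd.concat(parts, ignore_index=True)
    # A categorical dtype is what a per-category aggregation would key on.
    tall["kind"] = tall["kind"].astype("category")
    return tall

tall = to_tall(trips)
print(tall)
```

This copies every coordinate once, which is the cost the rest of this issue is trying to avoid.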


philippjfr commented Oct 26, 2017

This issue is basically about working with wide rather than tall or tidy data. One suggestion I'd have is accepting lists for the Glyph x and y column references. Then you could express your dropoff/pickup example as:

canvas.points(df, ['pickup_x', 'dropoff_x'], ['pickup_y', 'dropoff_y'])

and express the kind of problem described in #512 as:

canvas.line(df, 'x', ['col1', 'col2', 'col3', ...])

This would make it quite easy to work with wide data with a lot of columns you want to aggregate on. I'm sure internally this could also be made efficient.


jbednar commented Oct 26, 2017

Right, the example above was about wide data, and I agree that your proposed syntax would make such an example more convenient to do.

In general, though, we should be able to use weighted-color-average plots for any suitable data, including separate aggregate arrays from arbitrary sources (e.g. different dataframes altogether) or from the same source but with arbitrary custom xarray operations on each one before rendering. So we'd still need a standalone example or utility showing how to combine the separate aggregates into the data structure expected by shade().
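One possible shape for such an example is sketched below with plain xarray. It assumes that the categorical form expected by shade() is a 3D DataArray whose last dimension indexes the categories, and that all aggregates share the same (y, x) grid; the helper name `stack_aggregates`, the dimension name `cat`, and the sample counts are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

def stack_aggregates(aggs, dim="cat"):
    """Combine 2D aggregates on a common grid into one (y, x, cat) array.

    aggs: dict mapping a category label to a 2D DataArray.
    """
    arrays = list(aggs.values())
    labels = pd.Index(list(aggs), name=dim)
    # concat puts the new dimension first, so move it to the end.
    stacked = xr.concat(arrays, dim=labels)
    return stacked.transpose(*arrays[0].dims, dim)

# Stand-ins for two separately computed count aggregates; in practice
# these might come from two Canvas.points(...) calls on the same canvas.
coords = {"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]}
pickups = xr.DataArray(np.array([[1, 0, 2], [0, 3, 0]]), coords=coords, dims=("y", "x"))
dropoffs = xr.DataArray(np.array([[0, 4, 0], [1, 0, 0]]), coords=coords, dims=("y", "x"))

combined = stack_aggregates({"pickup": pickups, "dropoff": dropoffs})
print(combined.dims, combined.shape)
```

If the assumption about the layout holds, `combined` could then be handed to shade() together with a color key keyed by the same labels; whether it is accepted unmodified may depend on the datashader version.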


jbednar commented Oct 26, 2017

Not that such an example should be difficult; I expect it's a one-liner; I just don't have time today to create it, hence the issue. :-)
