Parallel Categories (parcats) trace type for multi dimensional categorical data #2963

Merged: 51 commits, Oct 1, 2018

@jonmmease
Contributor

jonmmease commented Sep 1, 2018

Continuation of jonmmease#1, now against master in the plotly.js repo.

I believe I addressed all of the outstanding code comments brought up in the old PR. Below is the original post, and my last few comments on the implementation.


@alexcjohnson @etpinard @monfera @chriddyp @jackparmer

Introduction

This PR is a proposal and an implementation of a new trace type for the interactive exploration of multi-dimensional categorical data sets. My working name for the trace is "Parallel Categories" or parcats for short.

The concept of this trace has been discussed previously in the following plotly.js issues:

I also briefly showed a prototype of this diagram to @chriddyp over screenshare several months ago.

Related work

The closest prior art to the Parallel Categories Diagram is the Parallel Sets Diagram by Robert Kosara and Caroline Ziemkiewicz.

Parallel Sets implementations / descriptions

Here is a collection of existing implementations and descriptions of the Parallel Sets Diagram:

Parallel Sets Java Program

https://eagereyes.org/parallel-sets

This is a stand-alone Java program by Kosara that implements a Parallel Sets Diagram.

[Image: parsets-titanic-dimensions-thumb]

Parallel Sets from the DataViz catalog

https://datavizcatalogue.com/methods/parallel_sets.html

[Image: parallel_sets]

D3 implementation of Parallel Sets

https://www.jasondavies.com/parallel-sets/

[Image: screenshot]

What's different about the Parallel Categories Diagram?

The primary difference between this Parallel Categories Diagram (parcats from here on) and the Parallel Sets Diagram (parsets from here on) is that the parcats diagram supports a more flexible path coloring scheme.

In all of the examples of parsets diagrams that I have found, the colors of the paths correspond to states in the left-most (or top-most) dimension. In contrast, for the parcats diagram, color may correspond to a column in the dataset that may or may not be present as a dimension in the diagram.

This, admittedly modest, extension has several advantages. Path colors may be set using a numeric array and a color map just like many other plotly.js trace types (scatter, parcoords, etc.). This makes it possible to use the parcats diagram combined with other traces in brushing/crossfiltering configurations.

Dragging and Brushing example

Here is an example of visualizing a 5-dimensional data set with two continuous dimensions and three categorical dimensions. This is accomplished by displaying the two continuous dimensions in a 2D scatter plot and the three categorical dimensions using the parcats diagram.

I created this example using a branch of plotly.py version 3 built against this branch of plotly.js.

First I show the drag interactions supported by the diagram. Categories (the rectangles) and dimension labels (dimensions are the columns of rectangles) can be dragged to reorder categories and dimensions. Upon release, the diagram animates to a relaxed state with equal spacing between dimensions and categories.

Selection events in the scatter plot are used to update the colors of both the selected points in the scatter plot and the corresponding paths in the parcats diagram. Similarly, click events on categories and paths in the parcats diagram are used to update the colors in both diagrams.
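
The brushing idea can be pictured with a small helper that turns a set of selected row indices into a numeric color array. This is only a minimal sketch, not the code behind the demo: `selectionToColors` is a hypothetical name, and it assumes both traces are colored from one shared array with one entry per row of the data set.

```javascript
// Hypothetical helper: build a 0/1 color array from selected row indices.
// Selected rows get 1, all other rows get 0.
function selectionToColors(numRows, selectedIndices) {
  const colors = new Array(numRows).fill(0);
  for (const i of selectedIndices) {
    // Ignore indices that fall outside the data set.
    if (i >= 0 && i < numRows) {
      colors[i] = 1;
    }
  }
  return colors;
}

// Example: rows 1 and 3 of a 5-row data set were selected.
const brushedColors = selectionToColors(5, [1, 3]);
console.log(brushedColors); // [ 0, 1, 0, 1, 0 ]
```

The same array could then be used as the scatter trace's `marker.color` and as the parcats path color, so that both views show the same selection.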

[Image: parcats_brushing]

As far as I'm aware, this is the only visualization of multi-dimensional categorical data that supports this kind of two-way data brushing. And, combined with plotly.py version 3, it is certainly the only visualization of this type that would be easily accessible to Python users.

Color bundling

There are two modes for how the colors of paths are arranged.

In the example above, color is not considered when sorting the paths. This is desirable in a brushing scenario so that the paths remain stable as the colors change during interactions. This behavior is specified by setting the bundlecolors property to false.

Setting the bundlecolors property to true causes paths with like colors to be bundled together as they pass through each category. This results in a cleaner looking diagram and is preferable in cases where the positions of paths do not need to remain stable as colors change.

For example:
[Image: parcats_bundlecolors]
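
One way to picture the two modes is as two different sort orders for the paths that pass through a single category. The sketch below is illustrative only and is not the plotly.js implementation; `orderPaths`, `next`, and `color` are hypothetical names, with `next` standing for the category a path enters in the following dimension.

```javascript
// Illustrative only: order the paths that pass through one category.
// Each path records the category it enters next and a numeric color value.
function orderPaths(paths, bundleColors) {
  const byNext = (a, b) => (a.next < b.next ? -1 : a.next > b.next ? 1 : 0);
  // Copy first so the caller's array is left untouched.
  return paths.slice().sort((a, b) => {
    if (bundleColors && a.color !== b.color) {
      // Bundled: group paths of like color together first.
      return a.color - b.color;
    }
    // Unbundled (or same color): color plays no part in the ordering.
    return byNext(a, b);
  });
}

const examplePaths = [
  { next: 'B', color: 1 },
  { next: 'A', color: 0 },
  { next: 'A', color: 1 },
  { next: 'B', color: 0 }
];
console.log(orderPaths(examplePaths, false));
console.log(orderPaths(examplePaths, true));
```

With `bundleColors` false, changing a path's color never moves it, which is the stability wanted for brushing. With `bundleColors` true, the paths of each color end up adjacent.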

Mocks

Several simple mocks have been added as a part of the current test suite.

parcats_basic
[Image: screenshot of the parcats_basic mock]

parcats_bundled
[Image: screenshot of the parcats_bundled mock]

parcats_unbundled
[Image: screenshot of the parcats_unbundled mock]

API notes

I tried to model the API as closely as possible after existing trace conventions. There is a top-level dimensions property with label and values sub-properties just as with the parcoords trace. Path colors/colorscales are specified under a dimension.marker parent property.
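
As a rough illustration of that shape, a trace description might look like the object below. This is a sketch with made-up values, not a mock from the PR. The container for path colors is written as `line` here, following the `line.color` and `handleLineDefaults` references later in this thread; treat that name as an assumption rather than a statement of the API at the time of this post.

```javascript
// Sketch of a parcats trace description as a plain object (made-up data).
const parcatsTrace = {
  type: 'parcats',
  // One entry per dimension; each `values` array has one entry per data row.
  dimensions: [
    { label: 'Dimension 1', values: ['A', 'B', 'A', 'C'] },
    { label: 'Dimension 2', values: ['x', 'y', 'x', 'y'] }
  ],
  // Assumption: path colors live under `line`, one numeric value per data row,
  // mapped through a colorscale as in other plotly.js traces.
  line: { color: [0, 1, 0, 1], colorscale: 'Viridis' }
};

console.log(JSON.stringify(parcatsTrace, null, 2));
```

The color array does not have to match any displayed dimension, which is the flexibility described in the introduction.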

Alternative approach

In the issues cited at the beginning of this PR there was some discussion on the possibility of adding categorical support to the existing Parallel Coordinates Diagram. This diagram was already well under development for our internal needs at the time of these discussions, so I did not pursue this approach.

TODO

Some items that I know still need to be done:

  • Font styling support
  • Complete attribute descriptions
  • Complete the test suite. In terms of my personal testing standards I'd estimate that the test suite is about 50% complete.
  • Examples!

Request for comments

So the top-level question for the plotly.js team is, are you all interested in having this diagram be part of plotly.js? It's not the most common use-case, but I think it would be another differentiating feature for the plotly ecosystem.

If you all are interested, I have internal funding to put a bunch more time into this through September. And if we can get it merged in during that time, I can continue helping out with basic maintenance after that.

Let me know what you think!


I just added a mock that demonstrates the color hovermode (parcats_hovermode_color).

The basic idea here is that when you hover on a category, only the paths of a single color are highlighted. What's really useful about this is that the tooltip can then display the absolute probability of the paths of that color that pass through a given category. You can also display the conditional probabilities (Probability of blue given category A, probability of category A given blue).
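
The three quantities mentioned here can be written down directly from counts. The following is a minimal sketch of the arithmetic, assuming one categorical column and one color column of equal length; `hoverProbabilities` and its field names are hypothetical and do not come from the PR.

```javascript
// Illustrative sketch: probabilities for one (category, color) pair.
function hoverProbabilities(categories, colors, category, color) {
  const total = categories.length;
  let both = 0;
  let inCategory = 0;
  let inColor = 0;
  for (let i = 0; i < total; i++) {
    const isCategory = categories[i] === category;
    const isColor = colors[i] === color;
    if (isCategory) inCategory++;
    if (isColor) inColor++;
    if (isCategory && isColor) both++;
  }
  return {
    absolute: both / total,                // P(color and category)
    colorGivenCategory: both / inCategory, // P(color | category)
    categoryGivenColor: both / inColor     // P(category | color)
  };
}

const exampleProbs = hoverProbabilities(
  ['A', 'A', 'B', 'A', 'B'],
  ['blue', 'red', 'blue', 'blue', 'blue'],
  'A',
  'blue'
);
console.log(exampleProbs);
```

In this example two of the five rows are both blue and in category A, three rows are in category A, and four rows are blue, so the values are 2/5, 2/3, and 2/4 respectively. The sketch does not guard against a category or color that never occurs.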

[Image: hovermode_color]

Does making hovermode an enumeration of none, category, or color seem like a reasonable way to specify this?


@alexcjohnson
Regarding dragging, I went ahead and added a sankey-style arrangement property to control the dragging behavior. There are three modes (names taken from sankey.arrangement):

  • perpendicular (now the default): categories only drag vertically,
    dimension labels drag horizontally.
  • freeform: category labels can drag vertically and horizontally
    (in which case they pull the dimension along with them). Here
    dragging a category can reorder the categories and dimensions.
  • fixed: dragging of dimensions and categories is disabled.
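
The three modes can be summarized as a constraint on the requested drag delta. The sketch below is an illustration of the rules listed above, not the plotly.js drag code; `constrainDrag` and the `target` values are hypothetical, and horizontal-only dragging of dimension labels in freeform mode is an assumption, since the list only describes category dragging for that mode.

```javascript
// Illustrative sketch of the drag constraints for each arrangement mode.
// `target` is 'category' or 'dimension'; dx and dy are requested drag deltas.
function constrainDrag(arrangement, target, dx, dy) {
  if (arrangement === 'fixed') {
    // Dragging is disabled entirely.
    return { dx: 0, dy: 0 };
  }
  if (target === 'dimension') {
    // Dimension labels drag horizontally (assumed for 'freeform' as well).
    return { dx: dx, dy: 0 };
  }
  if (arrangement === 'freeform') {
    // Categories may move in both directions.
    return { dx: dx, dy: dy };
  }
  // 'perpendicular': categories only drag vertically.
  return { dx: 0, dy: dy };
}

console.log(constrainDrag('perpendicular', 'category', 12, 30));
console.log(constrainDrag('freeform', 'category', 12, 30));
console.log(constrainDrag('fixed', 'dimension', 12, 30));
```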

@alexcjohnson
I took a look back through the multi-label hover logic, and added a new 'dimension' hover mode to show it off. This hover mode displays a label for each category in the current dimension (see the parcats_hovermode_dimension mock).

[Image: screenshot]

If you notice the hover label on B, you can see how the label is pushed downward to keep it from overlapping with the label for C. It might be nice in some cases to also push labels upward to avoid collisions, but at the moment this only pushes things down.
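
The push-down behavior can be sketched as a single pass over the labels sorted by vertical position. This is an illustration of the idea only, not the hover-label code in plotly.js; `pushLabelsDown` and the `y`/`height` fields are hypothetical names, with `y` taken as the top edge in SVG pixels (where larger values are lower on screen).

```javascript
// Illustrative sketch: resolve vertical overlaps by pushing labels down only.
function pushLabelsDown(labels) {
  // Work on copies, ordered from top to bottom.
  const sorted = labels.map(l => ({ ...l })).sort((a, b) => a.y - b.y);
  for (let i = 1; i < sorted.length; i++) {
    const prevBottom = sorted[i - 1].y + sorted[i - 1].height;
    if (sorted[i].y < prevBottom) {
      // Overlap: move this label just below the previous one.
      sorted[i].y = prevBottom;
    }
  }
  return sorted;
}

const placed = pushLabelsDown([
  { name: 'C', y: 10, height: 20 },
  { name: 'B', y: 20, height: 20 }
]);
console.log(placed);
```

Here the label for B starts inside the label for C, so it is moved down to begin where C ends. A label is never moved upward, which matches the limitation noted above.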

Now that there are multiple hovermodes and configurable hoverinfo I do like this mode as an option.


Font support added for dimension labels (labelfont) and category labels (categorylabelfont). labelfont matches the corresponding property name in parcoords.

Jon M. Mease and others added 25 commits July 27, 2018 12:47
This was a relic of an older attempt to display a tooltip per color for the hover node.
It worked, but was pretty unwieldy.
Renamed shape categories to `linear` and `hspline` and made `linear` the default.
(property isn't wired up properly yet)
More consistent with other traces, and now it's possible to display
only probabilities, only counts, both, none (with hover effects),
or skip (no hover effects).
There are three arrangement modes:
 - `perpendicular` (default): categories only drag vertically,
    dimension labels drag horizontally.
 - `freeform`: category labels can drag vertically and horizontally
    (in which case they pull the dimension along with them). Here
    dragging a category can reorder the categories and dimensions.
 - `fixed`: dragging of dimensions and categories is disabled.
Not working yet, just a checkpoint
to control the font of dimension labels and category labels respectively
Makes the hoverlabel shifting logic more noticeable and shows off
the `counts` attribute
Now there are tests for 'freeform', 'perpendicular', and 'fixed'
arrangements for dragging the dimension label and category rectangle.
@alexcjohnson
Collaborator

@jonmmease this is looking great! Aside from the comments above (all minor and straightforward, I think) the only thing I'd like to see is a mock that puts two parcats traces side-by-side, to verify that this works. Can you just replace two of the existing mocks with one combined mock? Maybe even using layout.grid, looks as though that's plumbed up correctly but would be nice to 🔒 it down!

Jon M. Mease added 3 commits September 28, 2018 13:02
This combines the former colorbar and font mocks. And adds a parcats
trace with a Latex category label as well.
@jonmmease
Contributor Author

Thanks @alexcjohnson! I really appreciate your time on this.

In 4117612 I merged the font and colorbar mocks into parcats_grid_subplots, which displays 4 parcats traces in a 2x2 grid using layout.grid subplots. I was pretty happy to see that this just worked 🙂

I also took the opportunity to throw in a parcats trace with pseudo-HTML and MathJax category labels.

[Image: parcats_grid_subplots]

Let me know if anything else comes to mind!
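
For readers who want to try a similar layout, a grid of parcats traces could be described roughly as follows. This is a sketch with made-up data, not the contents of the parcats_grid_subplots mock; `makeGridFigure` is a hypothetical helper, and it assumes each trace is placed with `domain.row` and `domain.column` against `layout.grid`, as with other domain-based plotly.js traces.

```javascript
// Sketch: build a figure with rows x columns parcats traces on a layout grid.
function makeGridFigure(rows, columns) {
  const data = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < columns; c++) {
      data.push({
        type: 'parcats',
        // Place this trace in one cell of layout.grid.
        domain: { row: r, column: c },
        // Made-up categorical data, identical in every cell.
        dimensions: [
          { label: 'X', values: ['a', 'b', 'a'] },
          { label: 'Y', values: ['p', 'p', 'q'] }
        ]
      });
    }
  }
  return { data: data, layout: { grid: { rows: rows, columns: columns } } };
}

const gridFigure = makeGridFigure(2, 2);
console.log(gridFigure.data.length);
```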

@alexcjohnson
Collaborator

Fantastic grid mock @jonmmease! Good idea to include pseudo-html. And I'm glad that it just worked ™️ 🎉 Having multiple traces also shows off that the default coloring behavior, which I hadn't noticed before, is to pull from the trace color sequence - hence the orange and green coloration when you don't specify a color. Is that really what we want, or would it be better to use 'lightgray' as you had in the old code (that wasn't being used anyway because you had coerced line.color)? parcats traces don't share subplots, which is normally the reason to pull trace colors from a sequence.

@jonmmease
Contributor Author

That's a good point and I don't have a strong preference. The grey felt like a pretty boring default and I thought our default blue was a nicer starting point. I also liked that the default colors could be specified in a template with the colorway property, but perhaps that's overloading colorway too much.

In any case, if you have a preference for a different default I'm happy to make a change 🙂

@nicolaskruchten
Contributor

Random last-minute thought: is "parcats" a more easily-grokked/commonly-used name than something like "alluvial diagram" ? I would maybe consider renaming this trace :)

@jonmmease
Contributor Author

From my reading, "alluvial diagram" is most often used interchangeably with sankey diagram. For example, Wikipedia and datavizproject both define it to be essentially equivalent to our sankey diagram.

@nicolaskruchten
Contributor

OK.

@alexcjohnson
Collaborator

> The grey felt like a pretty boring default and I thought our default blue was a nicer starting point.

That makes sense. Why don't we just use layout.colorway[0], though, so all parcats traces get the same default color? I think that's the last item, then we'll be ready to merge!

Looking at the examples you showed in the head of this PR, they include a feature we don't, which is categorical coloring based on the first dimension. Not needed for this PR though - I'll make another new issue.

@jonmmease
Contributor Author

Sounds good. I'll do that tonight. What would be the best way to get at colorway from inside supplyDefaults?

@alexcjohnson
Collaborator

> What would be the best way to get at colorway from inside supplyDefaults?

layout is there already as an arg to handleLineDefaults - you should be able to just grab layout.colorway[0].
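
The suggestion amounts to reading the first colorway entry and using it as the default for every parcats trace. The snippet below is only a sketch of that lookup, not the plotly.js source: `defaultPathColor` is a hypothetical stand-alone function, and the fallback argument is an added assumption for the case where no colorway is available.

```javascript
// Illustrative sketch: choose a default path color from layout.colorway[0].
function defaultPathColor(layout, fallback) {
  const colorway = layout && layout.colorway;
  if (Array.isArray(colorway) && colorway.length > 0) {
    // Every parcats trace gets the same default: the head of the colorway.
    return colorway[0];
  }
  // Assumption: fall back to a caller-supplied color if no colorway exists.
  return fallback;
}

console.log(defaultPathColor({ colorway: ['#1f77b4', '#ff7f0e'] }, 'lightgray'));
console.log(defaultPathColor({}, 'lightgray'));
```

Because the lookup ignores the trace index, two parcats traces in the same figure end up with the same default color instead of stepping through the color sequence.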

@jonmmease
Contributor Author

Alright, I just pushed the colorway[0] change. Almost there!

@alexcjohnson
Collaborator

Beautiful! Yeah, to my eye a consistent blue (when that's the head of the colorway) is definitely better.

Dunno if @etpinard wanted to take a last look at this, but from my side it's ready to go! 💃

@etpinard
Contributor

etpinard commented Oct 1, 2018

I apologise if this might have been discussed before, but on

[Image]

and most baselines, why isn't the parcats trace centered?

@jonmmease
Contributor Author

The domain in most of the mocks is set to "domain": {"x": [0.125, 0.625],"y": [0.25, 0.75]}. I made it a bit off center in x to test that the resulting SVG geometry lines up where it should. This geometry is checked explicitly in a couple of tests in parcats_test.js. Then I stuck with the geometry for the rest of the mocks to have a consistent frame for computing mouse interaction locations.
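
To see why that domain is off center in x but centered in y, the fractions can be converted to pixels. This is a minimal sketch of the arithmetic, assuming the domain is measured against a plot area of the given width and height with margins already removed; `domainToPixels` is a hypothetical helper and the 800 by 400 size is made up.

```javascript
// Illustrative sketch: convert a fractional domain to pixel extents.
function domainToPixels(domain, plotWidth, plotHeight) {
  const [x0, x1] = domain.x;
  const [y0, y1] = domain.y;
  return {
    left: x0 * plotWidth,
    right: x1 * plotWidth,
    // Domain y grows upward, while SVG y grows downward.
    top: (1 - y1) * plotHeight,
    bottom: (1 - y0) * plotHeight,
    centerX: ((x0 + x1) / 2) * plotWidth,
    centerY: (1 - (y0 + y1) / 2) * plotHeight
  };
}

const box = domainToPixels({ x: [0.125, 0.625], y: [0.25, 0.75] }, 800, 400);
console.log(box);
```

For an 800 by 400 plot area the trace spans 100 to 500 pixels horizontally, so its center sits at 300 rather than 400, while vertically it spans 100 to 300 and is centered at 200.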

@etpinard
Contributor

etpinard commented Oct 1, 2018

> "domain": {"x": [0.125, 0.625],"y": [0.25, 0.75]}

Wow. I totally missed that when 👁️ the mock JSONs. My bad.

Let's merge this thing.

Labels
feature something new