Support of ordinal based on pandas' ordered Categorical type? #245

Closed
pierre-haessig opened this issue Oct 24, 2016 · 17 comments · Fixed by #2522

Comments

@pierre-haessig
Contributor

I've just started to play with altair, using the diamonds dataset. Here is the notebook to clarify what I did https://gist.github.com/pierre-haessig/09fa9268aa0a0e7d91356f681f96ca18

Since I'm not familiar with Altair, I may have missed something, but I have the feeling that ordered Categorical types from pandas are not supported.

Indeed, if I use a color='cut' encoding, when cut is a pandas Series with an ordered category dtype, I get by default a nominal type of coloring (with "unordered" colors).

On the other hand, if I force an ordinal type with color='cut:O', I do get the ordered colors (the shades of green), but the order is wrong! (I get Fair, Good, Ideal, Premium, Very Good, while the correct order is 'Fair', 'Good', 'Very Good', 'Premium', 'Ideal', as manually defined in the pandas category.)

@jakevdp
Collaborator

jakevdp commented Oct 24, 2016

Hi, thanks for the report!

Currently, because of a bug in behavior for categoricals in Pandas' to_json function, altair first converts categorical types to strings before serializing a dataframe as JSON. Thus Altair only knows about alphabetical ordering.

This is something that we should figure out how to address. In the meantime, you can specify the order manually within the encoding with, e.g., x=X('cut:O', scale=Scale(domain=['Fair', 'Good', 'Very Good', 'Premium', 'Ideal'])). I haven't tested that, but I think it should work.
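
A minimal sketch of why the order is lost, assuming only pandas. It does not reproduce Altair's serialization code; it only shows that once an ordered categorical is cast to plain strings, sorting falls back to alphabetical order. The sample values are made up for illustration.

```python
import pandas as pd

cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
cut = pd.Series(
    pd.Categorical(
        ["Ideal", "Fair", "Very Good", "Good", "Premium"],
        categories=cut_order,
        ordered=True,
    )
)

# Sorting the categorical respects the declared category order.
sorted_as_categorical = cut.sort_values().tolist()

# After casting to plain strings, only lexicographic order is left.
sorted_as_strings = cut.astype(str).sort_values().tolist()

print(sorted_as_categorical)
print(sorted_as_strings)
```

The second list matches the alphabetical order reported at the top of this issue.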

@pierre-haessig
Contributor Author

@jakevdp thanks for the feedback. I had to adapt your suggestion, since it's the color I want, rather than x.

This is what I ended up with:

from altair import Chart, Color, Scale

# data_samp is the DataFrame defined in the notebook linked above
c = Chart(data_samp)
cut_cat = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
cut_scale = Scale(domain=cut_cat, type='ordinal')
c.mark_circle().encode(x='carat', y='price',
                       color=Color('cut:O', scale=cut_scale))

So there are two remaining issues:

  1. I shouldn't have to specify that the scale is ordinal if I already say it in the data encoding. However, if I remove type='ordinal' from the def of Scale, I get black marks...
  2. After specifying type='ordinal', I get the shades of green, but the order is really messy (i.e.: order in the legend is back to alphabetical, and color order is mysterious), see plot below.

(image: vega)

@jakevdp
Collaborator

jakevdp commented Oct 25, 2016

Huh... that's not great. I'm not certain why Vega-Lite requires you to specify 'ordinal' in both places, or why the color order is so strange. Maybe @kanitw would have ideas?

@jakevdp jakevdp added the bug label Nov 2, 2016
dhimmel added a commit to dhimmel/biorxiv-licenses that referenced this issue Nov 28, 2016
Workaround for color ordering vega/altair#245 is
imperfect.
@dhimmel

dhimmel commented Nov 28, 2016

Having the same issue. See this notebook where I use nominal ordering as a workaround to the color issue:

vega

Another issue arises where the legend ordering doesn't match the ordering in the plot. The legend is in the right order, but the area marks are incorrectly ordered.

@jakevdp
Collaborator

jakevdp commented Nov 28, 2016

I think this would be worth posting as a bug to Vega-Lite itself.

@kanitw
Member

kanitw commented Nov 29, 2016

@pierre-haessig -- Thanks for reporting. This is definitely a bug.

I just created vega/vega-lite#1732.

The issue contains a workaround for this: using nominal type like @dhimmel suggests and set custom color range manually. (You might find colorbrewer useful.)

We will make sure to fix this for the 2.0 release. (We probably won't fix this in 1.x since it should be very easy to fix this in Vega 3, but quite complicated to do this in Vega 2. Since we have a temporary workaround, we will focus our efforts on 2.0 development.)

@dhimmel

dhimmel commented Nov 29, 2016

@kanitw I posted vega/vega-lite#1732 (comment) before I saw the previous comment.

The issue contains a workaround for this: using nominal type like @dhimmel suggests

It's not quite a workaround because the marks (bands) are not in the right order? Is there a way to fix that?

@kanitw
Member

kanitw commented Nov 29, 2016

For other people following this issue, here is a workaround for the following question.

It's not quite a workaround because the marks (bands) are not in the right order? Is there a way to fix that?

@ellisonbg ellisonbg added this to the 1.3 milestone Sep 27, 2017
@ellisonbg ellisonbg changed the title support of ordinal based on pandas' ordered Categorical type? Support of ordinal based on pandas' ordered Categorical type? Sep 27, 2017
@dsaxton

dsaxton commented May 24, 2020

FWIW the to_json segfault issue for Categorical dtype appears to be fixed in pandas: pandas-dev/pandas#12802
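
A minimal sketch, assuming a recent pandas, of what record-oriented JSON output looks like for an ordered categorical column. The records carry only the values, so the category order still has to be read from the dtype and handed to the chart separately. The two-row frame is made up for illustration.

```python
import json

import pandas as pd

cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
df = pd.DataFrame(
    {"cut": pd.Categorical(["Ideal", "Fair"], categories=cut_order, ordered=True)}
)

# The JSON records hold plain strings with no ordering information.
records = json.loads(df.to_json(orient="records"))

# The declared order is still available on the pandas side.
order_from_dtype = df["cut"].cat.categories.tolist()

print(records)
print(order_from_dtype)
```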

@joelostblom
Contributor

@dsaxton Did your PR work for using pandas categoricals? I would love this functionality in Altair and it seems promising that the pandas json bug has been fixed!

@dsaxton

dsaxton commented Sep 26, 2020

@dsaxton Did your PR work for using pandas categoricals? I would love this functionality in Altair and it seems promising that the pandas json bug has been fixed!

It seemed to be working, although I didn't do any testing outside of Altair's CI.

@sadzart

sadzart commented Dec 17, 2020

Just to add another example where this issue is limiting.

The Stacked Bar Chart with Sorted Segments example in the Example Gallery doesn't sort the segments correctly if one uses a custom ordering on a pandas categorical data type.

import altair as alt
import pandas as pd
from vega_datasets import data

source = data.barley()

# custom ordering of categorical data type
site_lst = ['Crookston', 'Morris', 'University Farm','Duluth', 'Grand Rapids', 'Waseca']
source.site = pd.Categorical(source.site, site_lst, ordered = True)

# The stacks are not ordered according to the 'site' variable ordering
# They use alphabetical sort by default and I have no idea how to alter this to work.
alt.Chart(source).mark_bar().encode(
    x='sum(yield)',
    y='variety',
    color='site',
    order=alt.Order(
      # Sort the segments of the bars by this field
      'site',
      sort= 'ascending'  
    )
)

image

Things I have tried to no avail:

  1. I tried using sort = None as an argument for alt.Color but that doesn't work for correcting the stack either.
  2. alt.Order doesn't accept a list; only accepts "ascending" or "descending"
  3. using the sort argument with the list site_lst in alt.X doesn't work either

@mattijn
Contributor

mattijn commented Dec 19, 2020

@sadzart Custom sorting is possible by accessing undocumented fields that are created during compilation, though this is probably not recommended. (related to vega/vega-lite#1734 (comment))

import altair as alt
import pandas as pd
from vega_datasets import data

source = data.barley()

# custom ordering of categorical data type
site_lst = ['Crookston', 'Morris', 'University Farm','Duluth', 'Grand Rapids', 'Waseca']
source.site = pd.Categorical(source.site, site_lst, ordered = True)

alt.Chart(source).mark_bar().encode(
    x='sum(yield)',
    y='variety',
    color=alt.Color('site', sort=alt.Sort(site_lst)),
    order=alt.Order('color_site_sort_index:Q',
      sort='ascending'  
    )
)

image

By introducing a sort for the color channel, the color_site_sort_index is added as a new field/column in the dataset. You'll have to use :Q to assign the type manually in the order channel, since Altair cannot infer the type of a field that does not exist yet.

To observe what happens to your chart you can inspect the data-viewer in the Vega editor.
Open the Chart in the Vega Editor

@sadzart

sadzart commented Dec 19, 2020

@mattijn Worked like a charm. Thanks

@firasm

firasm commented Mar 8, 2021

That's a really neat solution @mattijn - but I'm having trouble with my dataset. I can change the legend order, but the order on the actual plot doesn't change.

Here's a minimal example:

import altair as alt
import pandas as pd

dfdict = {
    "Questions": {
        0: "Question Text",
        1: "Question Text",
        2: "Question Text",
        3: "Question Text",
        4: "Question Text",
        5: "Question Text",
    },
    "level": {
        0: "1 - Strongly Disagree",
        1: "2 - Disagree",
        2: "3 - Neutral",
        3: "4 - Agree",
        4: "5 - Strongly Agree",
        5: "N/A",
    },
    "value": {0: 1.4, 1: 5.7, 2: 10.0, 3: 32.9, 4: 47.1, 5: 2.9},
}

df = pd.DataFrame(dfdict)

sort_order = [
    "N/A",
    "5 - Strongly Agree",
    "4 - Agree",
    "3 - Neutral",
    "2 - Disagree",
    "1 - Strongly Disagree",
]

# This doesn't seem to be needed
df["level"] = pd.Categorical(df["level"], sort_order, ordered=True)

chart = (
    alt.Chart(df)
    .mark_bar()
    .encode(
        x=alt.X("value"),
        y=alt.Y("Questions", title=""),
        color=alt.Color("level", sort=alt.Sort(sort_order)),
        order=alt.Order("color_site_sort_index:Q"),
    )
)

chart

Screen Shot 2021-03-07 at 5 32 25 PM

The ideal order would be, from left to right: N/A, 5 - Strongly Agree, 4 - Agree, etc. What I am able to accomplish is my ideal order from right to left. Inverting the sort_order list just changes the legend order, but not the order in which the data is plotted.

FYI: I know this is an issue with my plot because I can reproduce your example just fine.

Thanks in advance

@joelostblom
Contributor

joelostblom commented Mar 8, 2021

The syntax of the undocumented feature is "color_<column-name>_sort_index", so if you change your example to "color_level_sort_index:O", it works.
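
A minimal sketch of the naming pattern described above. The helper `color_sort_index_field` is hypothetical (it is not part of Altair or Vega-Lite); it only builds the string, and the underlying field is undocumented and could change between Vega-Lite versions.

```python
def color_sort_index_field(column: str) -> str:
    """Hypothetical helper: name of the field that Vega-Lite adds when
    the color channel for `column` is given a custom sort."""
    return f"color_{column}_sort_index"


# Shorthand string to pass to alt.Order for a column named "level".
order_shorthand = color_sort_index_field("level") + ":Q"
print(order_shorthand)
```

Building the name from the column avoids the copy-paste slip of reusing `color_site_sort_index` with a differently named column.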

@firasm

firasm commented Mar 8, 2021

Whoops!! Good catch!

Thanks, that works.

Full code for someone looking to reproduce:

import altair as alt
import pandas as pd

dfdict = {
    "Questions": {
        0: "Question Text",
        1: "Question Text",
        2: "Question Text",
        3: "Question Text",
        4: "Question Text",
        5: "Question Text",
    },
    "level": {
        0: "1 - Strongly Disagree",
        1: "2 - Disagree",
        2: "3 - Neutral",
        3: "4 - Agree",
        4: "5 - Strongly Agree",
        5: "N/A",
    },
    "value": {0: 1.4, 1: 5.7, 2: 10.0, 3: 32.9, 4: 47.1, 5: 2.9},
}

df = pd.DataFrame(dfdict)

sort_order = [
    "N/A",
    "5 - Strongly Agree",
    "4 - Agree",
    "3 - Neutral",
    "2 - Disagree",
    "1 - Strongly Disagree",
]

# This doesn't seem to be needed
df["level"] = pd.Categorical(df["level"], sort_order, ordered=True)

chart = (
    alt.Chart(df)
    .mark_bar()
    .encode(
        x=alt.X("value"),
        y=alt.Y("Questions", title=""),
        color=alt.Color("level", sort=alt.Sort(sort_order)),
        order=alt.Order("color_level_sort_index:Q"),
    )
)

chart

Screen Shot 2021-03-07 at 7 24 42 PM

andrewKOwong added a commit to andrewKOwong/clps_data that referenced this issue Jun 9, 2023
This is surprisingly weird. The official documentation actually
mentions an issue post that advises referencing an undocumented
internal variable from vega-lite.
Docs:
https://altair-viz.github.io/user_guide/encodings/channels.html#order
The issue referenced:
vega/altair#245 (comment)
I'm not quite sure how the alt.Order object works. But this
implementation works for now. I would like to carefully verify
or test this though.