how to add legend in basic charts #3289

Closed
mattijn opened this issue Dec 20, 2023 · 7 comments

Comments

@mattijn
Contributor

mattijn commented Dec 20, 2023

Given a basic chart:
[image: bar chart of score per project, with green ticks marking the goal]

That is defined as follows:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'score': [25, 57, 23, 19, 8, 47, 8, 21, 5],
    'goal': [25, 47, 30, 27, 38, 19, 4, 16, 23]
})

bar = alt.Chart(source).mark_bar(color='lightgray').encode(
    x='project',
    y='score'
)

tick = alt.Chart(source).mark_tick(color='green', thickness=3).encode(
    x='project',
    y='goal'
)

(bar + tick)

How can I add a legend that explains which colors are used? Like this:
[image: the same chart with a legend for the two colors]

Currently this requires me to define an Altair specification like this:

bar = alt.Chart(source).mark_bar().encode(
    x='project',
    y='score',
    color=alt.Color('score_label:N').scale(range=['lightgray']).title('Legend')
).transform_calculate(score_label="'score'")

tick = alt.Chart(source).mark_tick(thickness=3).encode(
    x='project',
    y='goal',
    color=alt.Color('goal_label:N').scale(range=['green']).title(None)
).transform_calculate(goal_label="'goal'")

(bar + tick).resolve_scale(color='independent')

In my honest opinion, this is too much to ask from a user's perspective.

I understand that we are bound by the Grammar of Graphics, and I found this vega/vega-lite#3411 (comment) in the Vega-Lite repository on forcing legends:

We should show a legend if users use different colors in different layers (even if constant).

I strongly think we should not do that if users specify constant values for different layers.
Legend is a visualization of a scale, and in this case there is no scale. So it seems unprincipled to do so.

Without wanting to upset anyone, I feel we have to rethink the GoG in this regard. I do not think it would go against the GoG if syntax like this could be supported:

bar = alt.Chart(source).mark_bar(color='lightgray', label='score').encode(
    x='project',
    y='score'
)

tick = alt.Chart(source).mark_tick(color='green', thickness=3, label='goal').encode(
    x='project',
    y='goal'
)

(bar + tick)

I'm also open to other approaches that are easier to implement than the current working spec above.

@mattijn
Contributor Author

mattijn commented Dec 20, 2023

Or, as is mentioned in the same vega/vega-lite#3411 (comment):

once we support datum, then we would have scales

Then this could/should work:

# alt.Color(datum='score').scale(range=['lightgray'])
alt.ColorDatum('score').scale(range=['lightgray'])

But this currently raises an error stating that the datum has no scale:

AttributeError: 'ColorDatum' object has no attribute 'scale'

@joelostblom
Contributor

I'm generally in favor of adding the encoding channel option methods to the datum classes (and the value classes? Or would they not be needed/relevant there?). I think that would make them consistent with how the channels (alt.Color, etc.) work, and it would allow us to do alt.ColorDatum().scale(), etc. This would probably not work with just alt.datum, since that returns a dictionary, but maybe that would be a good opportunity to favor alt.ColorDatum over alt.datum in the docs as well, as we started discussing elsewhere.

I would like to avoid adding something like a label parameter inside the mark if possible.

@mattijn
Contributor Author

mattijn commented Dec 20, 2023

You are right that this cannot work with alt.datum.

Btw:

This sets the color (without control over the legend):

alt.Color(value='lightgray')

This adds a legend (without control over the actual color):

alt.Color(datum='score')

But combining them raises an error:

alt.Color(value='lightgray', datum='score')

@joelostblom
Contributor

It would be nice to allow for

alt.Color(datum='score').scale(...)

But it seems like we run into the same error as in #2913 (comment)

@debbes80

Setting the color can be done with configure_range, but this is still not the most convenient solution. I cannot find a similar solution for the shape (it only works with mark_point). The symbol is always either a circle or a square (weirdly, the square appears when changing the shape of the mark_tick to anything but 'circle').

bar = alt.Chart(source).mark_bar().encode(
    x='project:N',
    y='score:Q',
    color=alt.datum('Score'),
    # shape=alt.ShapeDatum('Score')
)

tick = alt.Chart(source).mark_tick(thickness=3, shape='wedge').encode(
    x='project:N',
    y='goal:Q',
    color=alt.datum('Goal'),
    # shape=alt.ShapeDatum('Goal')
)

(bar + tick).configure_range(
    category=['lightgray', 'green'],
    # category=alt.RangeScheme(scheme='category10'),  # scheme
    # symbol=['square','stroke'],
)

@binste
Contributor

binste commented Dec 23, 2023

At first glance, it seems to me that the issue arises because the original dataframe is not in long format. Hence, you are trying to build one color scale across several columns and then have to encode it manually.

If you reshape the dataframe, the spec gets easier:

source_long = source.melt("project", var_name="variable", value_name="value")

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    color=alt.Color("variable:N").scale(
        domain=["score", "goal"], range=["lightgray", "green"]
    ),
)
base.mark_bar().transform_filter(alt.datum.variable == "score") + base.mark_tick(
    thickness=3
).transform_filter(alt.datum.variable == "goal")
[image: resulting chart with a score/goal color legend]
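For readers new to the reshape step, this is what melt does to the wide table (pandas only, no Altair needed): each project row becomes one row per value column, the former column name lands in 'variable', and a single color field can then cover both series.

```python
import pandas as pd

source = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'score': [25, 57, 23, 19, 8, 47, 8, 21, 5],
    'goal': [25, 47, 30, 27, 38, 19, 4, 16, 23]
})

# Keep 'project' as the identifier; every other column becomes (variable, value) rows.
source_long = source.melt("project", var_name="variable", value_name="value")

print(source_long.shape)
# Project B now has two rows: one for its score, one for its goal.
print(source_long[source_long["project"] == "B"])
```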

Somewhat unrelated, but to also adjust the legend symbols, one could do either of the following:

scale_kwargs = dict(domain=["score", "goal"], range=["lightgray", "green"])

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    stroke=alt.Stroke("variable:N", scale=alt.Scale(**scale_kwargs)),
    color=alt.Color("variable:N")
    .scale(**scale_kwargs)
    .legend(
        symbolType=alt.expr(alt.expr.if_(alt.datum.label == "goal", "stroke", "square"))
    ),
)
base.mark_bar().transform_filter(
    alt.datum.variable == "score"
) + base.mark_tick().transform_filter(alt.datum.variable == "goal")
[image: resulting chart with square and stroke legend symbols]

Also encoding the color as stroke is necessary, because otherwise the stroke legend symbol is just transparent (or white?). As an alternative, you could specify it manually with symbolStrokeColor. The following is the same as above, but without the alt.Stroke encoding and with symbolStrokeColor set on the legend instead:

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    color=alt.Color("variable:N")
    .scale(domain=["score", "goal"], range=["lightgray", "green"])
    .legend(
        symbolType=alt.expr(
            alt.expr.if_(alt.datum.label == "goal", "stroke", "square")
        ),
        symbolStrokeColor=alt.expr(
            alt.expr.if_(alt.datum.label == "goal", "green", "lightgray")
        ),
    ),
)
base.mark_bar().transform_filter(
    alt.datum.variable == "score"
) + base.mark_tick().transform_filter(alt.datum.variable == "goal")

@mattijn
Contributor Author

mattijn commented Dec 24, 2023

Thanks for all the comments! Much appreciated. For people who are new to Altair/VL, this is all still very complex syntax, especially as it is one of the first things people try to do.

I ended up with the following solution (in my case, score and goal came from different dataframes, with a max aggregate over the score):

import altair as alt
import pandas as pd

source_goal = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'goal': [25, 47, 30, 27, 38, 19, 4, 16, 23]
})

source_score = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'score': [25, 57, 23, 25, 47, 30, 27, 38, 19, 4, 16, 23, 19, 8, 47, 8, 21, 5]
})

source_goal_long = source_goal.melt("project", var_name="variable", value_name="Goal")
source_score_long = source_score.melt("project", var_name="variable", value_name="Score")
source_score_long.sort_values('project').head()
   project variable  Score
0        A    score     25
9        A    score      4
1        B    score     57
10       B    score     16
2        C    score     23
chart_bar = alt.Chart(source_score_long).mark_bar().encode(
    alt.X('project'), 
    alt.Y('max(Score):Q'), 
    alt.Color('variable').scale(range=["lightgray"])
)
chart_bar
chart_tick = alt.Chart(source_goal_long).mark_tick(thickness=3).encode(
    alt.X('project'), 
    alt.Y('Goal'), 
    alt.Color('variable').scale(range=['green']).legend(title=None)
)
chart_tick
(chart_bar + chart_tick).resolve_scale(color='independent')

(just writing it down so it hopefully ends up in the training data of these LLMs.)
Thanks again!

@mattijn mattijn closed this as completed Dec 24, 2023

4 participants