how to add legend in basic charts #3289

mattijn · 2023-12-20T14:38:16Z

Given a basic chart that is defined as follows:

import altair as alt
import pandas as pd

source = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'score': [25, 57, 23, 19, 8, 47, 8, 21, 5],
    'goal': [25, 47, 30, 27, 38, 19, 4, 16, 23]
})

bar = alt.Chart(source).mark_bar(color='lightgray').encode(
    x='project',
    y='score'
)

tick = alt.Chart(source).mark_tick(color='green', thickness=3).encode(
    x='project',
    y='goal'
)

(bar + tick)

How can I add a legend that clarifies the colors used?

Currently this requires me to define an Altair specification as such:

bar = alt.Chart(source).mark_bar().encode(
    x='project',
    y='score',
    color=alt.Color('score_label:N').scale(range=['lightgray']).title('Legend')
).transform_calculate(score_label="'score'")

tick = alt.Chart(source).mark_tick(thickness=3).encode(
    x='project',
    y='goal',
    color=alt.Color('goal_label:N').scale(range=['green']).title(None)
).transform_calculate(goal_label="'goal'")

(bar + tick).resolve_scale(color='independent')

In my honest opinion, I think this is too much to ask from a user perspective.

I can understand we are bound by the Grammar of Graphics and I found this vega/vega-lite#3411 (comment) in the Vega-Lite repository on forcing legends.

We should show a legend if users use different colors in different layers (even if constant).

I strongly think we should not do that if users specify constant values for different layers.
Legend is a visualization of a scale, and in this case there is no scale. So it seems unprincipled to do so.

Without upsetting people, I feel we have to rethink the GoG in this regard. I do not feel we would be going against the GoG if a syntax such as the following were supported:

bar = alt.Chart(source).mark_bar(color='lightgray', label='score').encode(
    x='project',
    y='score'
)

tick = alt.Chart(source).mark_tick(color='green', thickness=3, label='goal').encode(
    x='project',
    y='goal'
)

(bar + tick)

I'm also open to other approaches that are easier to implement than the current working spec above.

mattijn · 2023-12-20T15:40:48Z

Or as is mentioned in the same vega/vega-lite#3411 (comment),

once we support datum, then we would have scales

Then this could/should work:

# alt.Color(datum='score').scale(range=['lightgray'])
alt.ColorDatum('score').scale(range=['lightgray'])

But this currently raises an error stating that ColorDatum has no scale attribute:

AttributeError: 'ColorDatum' object has no attribute 'scale'

joelostblom · 2023-12-20T16:44:35Z

I'm generally in favor of adding the encoding channel options methods to the datum classes (and the value classes? or would they not be needed/relevant there?). I think that would make them consistent with how the channels (alt.Color, etc.) work, and it would allow us to do alt.ColorDatum().scale() etc. This would probably not work with just alt.datum since that returns a dictionary, but maybe that would be a good opportunity to favor alt.ColorDatum over alt.datum in the docs as well, as we started discussing elsewhere.

I would like to avoid adding something like a label parameter inside the mark if possible.

mattijn · 2023-12-20T17:00:28Z

You are right that this cannot work with alt.datum.

Btw:

This sets the color (without control over the legend):

alt.Color(value='lightgray')

This adds a legend (without control over the actual color):

alt.Color(datum='score')

But combining them errors:

alt.Color(value='lightgray', datum='score')

joelostblom · 2023-12-20T18:39:54Z

It would be nice to allow for

alt.Color(datum='score').scale(...)

But it seems like we run into the same error as in #2913 (comment)

debbes80 · 2023-12-21T21:48:11Z

The color can be set with configure_range, but that is still not the most convenient solution. I cannot find a similar solution for the shape (it only works with mark_point). The legend symbol is either a circle or a square (weirdly, it becomes a square when the shape of the mark_tick is changed to anything but 'circle').

bar = alt.Chart(source).mark_bar().encode(
    x='project:N',
    y='score:Q',
    color=alt.datum('Score'),
    # shape=alt.ShapeDatum('Score')
)

tick = alt.Chart(source).mark_tick(thickness=3, shape='wedge').encode(
    x='project:N',
    y='goal:Q',
    color=alt.datum('Goal'),
    # shape=alt.ShapeDatum('Goal')
)

(bar + tick).configure_range(
    category=['lightgray', 'green'],
    # category=alt.RangeScheme(scheme='category10'),  # scheme
    # symbol=['square','stroke'],
)

binste · 2023-12-23T14:03:14Z

At first glance, it seems to me that the issue arises because the original dataframe is not in a proper long format. Hence, you're trying to do a color scale across columns and then have to manually encode it.

If you reshape the dataframe, the spec gets easier:

source_long = source.melt("project", var_name="variable", value_name="value")

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    color=alt.Color("variable:N").scale(
        domain=["score", "goal"], range=["lightgray", "green"]
    ),
)
base.mark_bar().transform_filter(alt.datum.variable == "score") + base.mark_tick(
    thickness=3
).transform_filter(alt.datum.variable == "goal")
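For readers unfamiliar with `melt`, here is a small sketch of what the reshaping step does to the dataframe from the first post: the two value columns are stacked into one, and a new `variable` column records which column each row came from.

```python
import pandas as pd

source = pd.DataFrame({
    "project": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    "score": [25, 57, 23, 19, 8, 47, 8, 21, 5],
    "goal": [25, 47, 30, 27, 38, 19, 4, 16, 23],
})

# Wide (one column per measure) -> long (one row per project/measure pair)
source_long = source.melt("project", var_name="variable", value_name="value")

print(source_long.shape)                         # (18, 3)
print(source_long["variable"].unique().tolist())  # ['score', 'goal']
```

With the data in this shape, `variable` is a real field, so the color scale and its legend come for free.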

Somewhat unrelated, but to also adjust the legend symbols, one could do either of the following:

scale_kwargs = dict(domain=["score", "goal"], range=["lightgray", "green"])

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    stroke=alt.Stroke("variable:N", scale=alt.Scale(**scale_kwargs)),
    color=alt.Color("variable:N")
    .scale(**scale_kwargs)
    .legend(
        symbolType=alt.expr(alt.expr.if_(alt.datum.label == "goal", "stroke", "square"))
    ),
)
base.mark_bar().transform_filter(
    alt.datum.variable == "score"
) + base.mark_tick().transform_filter(alt.datum.variable == "goal")

Also encoding the color with stroke is necessary; otherwise the stroke legend symbol is just transparent (or white?). As an alternative, you could also specify it manually with symbolStrokeColor. The following is the same as above, but without the alt.Stroke encoding and with symbolStrokeColor set on the legend instead:

base = alt.Chart(source_long).encode(
    x="project",
    y="value",
    color=alt.Color("variable:N")
    .scale(domain=["score", "goal"], range=["lightgray", "green"])
    .legend(
        symbolType=alt.expr(
            alt.expr.if_(alt.datum.label == "goal", "stroke", "square")
        ),
        symbolStrokeColor=alt.expr(
            alt.expr.if_(alt.datum.label == "goal", "green", "lightgray")
        ),
    ),
)
base.mark_bar().transform_filter(
    alt.datum.variable == "score"
) + base.mark_tick().transform_filter(alt.datum.variable == "goal")

mattijn · 2023-12-24T15:06:52Z

Thanks for all the comments! Much appreciated. For people who are new to Altair/VL, this is still very complex syntax, especially as it is one of the first things people try to do.

I ended up with the following suggestion (score and goal were different dataframes, with a max aggregate over the score).

import altair as alt
import pandas as pd

source_goal = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'goal': [25, 47, 30, 27, 38, 19, 4, 16, 23]
})

source_score = pd.DataFrame({
    'project': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
    'score': [25, 57, 23, 25, 47, 30, 27, 38, 19, 4, 16, 23, 19, 8, 47, 8, 21, 5]
})

source_goal_long = source_goal.melt("project", var_name="variable", value_name="Goal")
source_score_long = source_score.melt("project", var_name="variable", value_name="Score")
source_score_long.sort_values('project').head()

   project variable  Score
0        A    score     25
9        A    score      4
1        B    score     57
10       B    score     16
2        C    score     23

chart_bar = alt.Chart(source_score_long).mark_bar().encode(
    alt.X('project'), 
    alt.Y('max(Score):Q'), 
    alt.Color('variable').scale(range=["lightgray"])
)
chart_bar

chart_tick = alt.Chart(source_goal_long).mark_tick(thickness=3).encode(
    alt.X('project'), 
    alt.Y('Goal'), 
    alt.Color('variable').scale(range=['green']).legend(title=None)
)
chart_tick

(chart_bar + chart_tick).resolve_scale(color='independent')

(just writing it down so it hopefully enters the training data of these LLMs..)
Thanks again!

mattijn added the enhancement label Dec 20, 2023

mattijn closed this as completed Dec 24, 2023
