Facet Cannot Be Correctly Sorted #8675

Open
PBI-David opened this issue Jan 28, 2023 · 11 comments

@PBI-David
Contributor

PBI-David commented Jan 28, 2023

I realised we don't have a VL example of the Vega calendar, so I tried to create one to submit as a PR. Everything works except that years with incomplete months cannot be sorted correctly. Observe that Jan, Feb, Mar and Apr for 2020 are all pushed to the end of the chart and listed under Sep, Oct, Nov and Dec. I have tried the various documented sort options, but this looks like a bug.

image

Editor link

@PBI-David
Contributor Author

Possibly related to #5937 as this looks like a sort problem when gaps are present.

@NickCrews

NickCrews commented Aug 28, 2023

I think I found the same thing. Using altair:

import altair as alt
from vega_datasets import data

source = data.cars()

chart = alt.Chart(source).mark_point().encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color="Origin",
).facet(
    row="Cylinders",
    column=alt.Column("Origin", sort=["USA", "Europe", "Japan"]),
)
chart.to_json()

This generates the following Vega-Lite spec, in which data points are placed in the wrong column. If you remove the sort, the points are placed in the correct column.
Open the Chart in the Vega Editor

@NickCrews

A total hack of a workaround is to prefix the values with Unicode "zero width space" characters, so that their natural sort order is what you want but they still display as normal:

import altair as alt
from vega_datasets import data

source = data.cars()

# Default order is alphabetical, so would be Europe, Japan, USA
origins_sorted = [
    "USA",
    "Europe",
    "Japan",
]


def make_sortable(values, sort_order):
    # Prefix each value with i zero width spaces, where i is its position in sort_order
    # From https://superuser.com/questions/1590069/are-there-special-characters-that-can-be-used-for-sorting-but-not-displaying
    m = {orig: (i * "\u200b") + orig for i, orig in enumerate(sort_order)}
    return values.replace(m)


source = source.assign(Origin2=make_sortable(source.Origin, origins_sorted))
# now Origin2 is "USA", "\u200bEurope", "\u200b\u200bJapan"

chart = (
    alt.Chart(source, width=100, height=100)
    .mark_point()
    .encode(
        x="Horsepower",
        y="Miles_per_Gallon",
        color=alt.Color(
            "Origin",
            # Optional: make the legend for the color consistent with the facets
            sort=origins_sorted,
        ),
    )
    .facet(
        row="Cylinders",
        column="Origin2",
    )
)
chart
image
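
The reason the padding trick works can be checked without Altair. This is a minimal sketch, assuming every label starts with a character below U+200B (true for ASCII labels). `pad_for_sort` is a hypothetical helper, and Python's code-point ordering stands in here for the string comparison that Vega performs:

```python
ZWSP = "\u200b"  # zero width space, code point U+200B


def pad_for_sort(order):
    """Map each label to itself prefixed by i zero width spaces (i = desired rank)."""
    return {label: ZWSP * i + label for i, label in enumerate(order)}


desired = ["USA", "Europe", "Japan"]
padded = pad_for_sort(desired)

# A plain code-point sort of the padded labels recovers the desired order:
# U+200B sorts after every ASCII character, so a label with more leading
# zero width spaces always compares greater than one with fewer.
recovered = [s.replace(ZWSP, "") for s in sorted(padded.values())]
print(recovered)
```

The padded strings are different values from the originals, so anything that matches on the field (filters, selections, tooltips) would see the invisible prefix as well.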

@apb-reports

Any news on this, Vega team?

This is still a bug, as you can see here. The only solution I have found is to push fake data into the data set so that every row and column combination has data. But this shouldn't be necessary.

{
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "transform": [
    {
      "filter": "datum.Origin === 'Japan' || datum.Origin === 'Europe'"
    },
    {
      "filter": "datum.Horsepower >= 110"
    },
    {
      "joinaggregate": [{"op": "count", "field": "Name", "as": "CountOrigin"}],
      "groupby": ["Origin"]
    },
    {
      "calculate": "slice('000000' + format(datum.CountOrigin, '.0f'), -6) + '-' + datum.Origin",
      "as": "OriginSort"
    },
    {
      "joinaggregate": [
        {"op": "count", "field": "Name", "as": "CountCylinders"}
      ],
      "groupby": ["Cylinders"]
    },
    {
      "calculate": "slice('000000' + format(datum.CountCylinders, '.0f'), -6) + '-' + format(datum.Cylinders, '.0f')",
      "as": "CylindersSort"
    }
  ],
  "encoding": {
    "y": {"aggregate": "count", "field": "Name", "type": "quantitative"},
    "row": {
      "field": "Origin",
      "type": "nominal",
      "sort": {"field": "OriginSort", "order": "descending"}
    },
    "column": {
      "field": "Cylinders",
      "type": "quantitative",
      "sort": {"field": "CylindersSort", "order": "descending"}
    },
    "tooltip": [
      {"field": "Name"},
      {"field": "Origin"},
      {"field": "OriginSort"},
      {"field": "CylindersSort"}
    ],
    "color": {"field": "Horsepower", "type": "ordinal"}
  }
}

@PBI-David
Contributor Author

I managed to fix the sorting on this by using a number and then a label expression. There is definitely still a bug here, although my workaround solves my use case. @domoritz, let me know if you want this in the examples.

Link

visualization
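
For readers without access to the linked spec, here is a rough sketch of what "a number and then a label expression" could look like. It is not the spec from the link: the helper `ordered_facet`, the field name `sort_key`, and the use of the cars dataset are all assumptions for illustration. The idea is to facet on a numeric key computed with the Vega expression function `indexof`, then map the key back to the original label through the header's `labelExpr`. Whether this sidesteps the bug in every case has not been verified here.

```python
import json


def ordered_facet(field, order, key="sort_key"):
    """Return (transform, channel) that facets on a numeric key and
    restores the original label in the facet header."""
    transform = {
        # Position of the value in the desired order (-1 if absent).
        "calculate": f"indexof({json.dumps(order)}, datum[{json.dumps(field)}])",
        "as": key,
    }
    # Nested ternary: 0 -> order[0], 1 -> order[1], ..., else the default label.
    label_expr = "datum.label"
    for i in reversed(range(len(order))):
        label_expr = f"datum.value === {i} ? {json.dumps(order[i])} : {label_expr}"
    channel = {
        "field": key,
        "type": "ordinal",
        "title": field,
        "header": {"labelExpr": label_expr},
    }
    return transform, channel


transform, column = ordered_facet("Origin", ["USA", "Europe", "Japan"])
spec = {
    "data": {"url": "data/cars.json"},
    "transform": [transform],
    "mark": "point",
    "encoding": {
        "x": {"field": "Horsepower", "type": "quantitative"},
        "y": {"field": "Miles_per_Gallon", "type": "quantitative"},
        "row": {"field": "Cylinders", "type": "ordinal"},
        "column": column,
    },
}
print(json.dumps(spec, indent=2))
```

The printed JSON can be pasted into the Vega Editor. No `sort` is set on the facet channel; the numeric key relies on the default ascending order.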

@coltnz

coltnz commented Aug 27, 2024

padding with zeros fixes

@domoritz
Member

domoritz commented Sep 5, 2024

padding with zeros fixes

Padding what with zeros fixes what? Do you mean that the incorrect assignment in the facets is fixed when you add entries for the missing combinations?

@coltnz

coltnz commented Sep 5, 2024

Yes. In fact I just add a zeroed entry for every cell and don't bother to work out which ones are missing.
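
A minimal sketch of that padding step, assuming the data is a list of records and the measure is summed, so a zero filler does not change the totals. `pad_all_cells` and the sample records are hypothetical:

```python
from itertools import product


def pad_all_cells(rows, row_field, col_field, value_field, fill=0):
    """Append one filler record for every (row, column) facet cell,
    without checking which cells are actually missing."""
    row_values = sorted({r[row_field] for r in rows})
    col_values = sorted({r[col_field] for r in rows})
    filler = [
        {row_field: rv, col_field: cv, value_field: fill}
        for rv, cv in product(row_values, col_values)
    ]
    return rows + filler


rows = [
    {"Origin": "Japan", "Cylinders": 4, "Count": 10},
    {"Origin": "Europe", "Cylinders": 4, "Count": 5},
    {"Origin": "Japan", "Cylinders": 6, "Count": 2},  # Europe/6 has no data
]
padded = pad_all_cells(rows, "Origin", "Cylinders", "Count")
cells = {(r["Origin"], r["Cylinders"]) for r in padded}
print(len(padded), sorted(cells))
```

With a `count` aggregate the filler records would inflate each cell by one, so this only suits measures where a zero is neutral.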

@coltnz

coltnz commented Sep 5, 2024

I think the fix might be:

Add a column_index, like the row_index in the compiled Vega above:

        {
          "type": "formula",
          "expr": "datum[\"Station Family\"]===\"Primary\" ? 0 : datum[\"Station Family\"]===\"STV\" ? 1 : 2",
          "as": "row_Station Family_sort_index"
        },

The column_index can then be used in the cell function:

 {
      "name": "cell",
      "type": "group",
      "style": "cell",
      "from": {
        "facet": {
          "name": "facet",
          "data": "data_0",
          "groupby": ["Station Family", "Share Scope"],
          "aggregate": {
            "cross": true,
            "fields": ["row_Station Family_sort_index"],
            "ops": ["max"],
            "as": ["row_Station Family_sort_index"]
          }
        }
      },
      "sort": {
        "field": [
          "datum[\"row_Station Family_sort_index\"]",
          "datum[\"Share Scope\"]"
        ],
        "order": ["ascending", "ascending"]
      },

@domoritz
Member

domoritz commented Sep 5, 2024

That's a good idea. Do you see a good way to do this without knowing the values? Note that Vega-Lite doesn't see the data, so we need to generate generic Vega that doesn't depend on specific values.
