Row and column sorting do not work #5937

Open
sorig opened this issue Feb 19, 2020 · 9 comments
Labels: Area - View Composition, Bug 🐛, P2 (Important issues that should be fixed soon)

Comments

@sorig

sorig commented Feb 19, 2020

Sorting of row and column encodings seems to be broken for discrete data types. Here is an unsorted specification that is working as expected:

Unsorted spec

Custom sort orders shuffle the rows and columns in a seemingly random order. In fact, the result does not depend on the sort order, or even on the contents of the custom sort array: passing an empty array gives the same result.

Examples:

"Sorted" rows spec

"Sorted" columns spec

This could be related to #5366. However, in my example removing the aggregate count function doesn't change anything.

@sorig added the Bug 🐛 label on Feb 19, 2020
@domoritz
Member

Can you send a minimal example that shows this issue?

@sorig
Author

sorig commented Feb 20, 2020

@domoritz Thanks for getting back to me. Sorry, it's a bit hard to give you a minimal example because there seem to be multiple connected bugs here. Here is an attempt.

First of all, let's look at an example that displays correctly. You can play around with the example here:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "mark": {"type": "bar"},
  "data": {
    "values": [
      {"A": "A1", "B": "B1"},
      {"A": "A2", "B": "B1"},
      {"A": "A2", "B": "B1"},
      {"A": "A3", "B": "B2"},
      {"A": "A3", "B": "B2"},
      {"A": "A3", "B": "B3"}
    ]
  },
  "encoding": {
    "column": {
      "field": "A",
      "type": "nominal"
    },
    "row": {
      "field": "B",
      "type": "nominal"
    }
  },
  "height": 75
}

Result: [rendered chart]

Now let's say I want the columns to appear in the order: A3, A1, A2.

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "mark": {"type": "bar"},
  "data": {
    "values": [
      {"A": "A1", "B": "B1"},
      {"A": "A2", "B": "B1"},
      {"A": "A2", "B": "B1"},
      {"A": "A3", "B": "B2"},
      {"A": "A3", "B": "B2"},
      {"A": "A3", "B": "B3"}
    ]
  },
  "encoding": {
    "column": {
      "field": "A",
      "type": "nominal",
      "sort": ["A3", "A1", "A2"]
    },
    "row": {
      "field": "B",
      "type": "nominal"
    }
  },
  "height": 75
}

Result: [rendered chart]

Notice how the chart no longer reflects the data. If you change the sort array, the column labels will update but the bars themselves will stay the same (i.e. not displaying correctly).

If you set the y of the bars to the data count, not even the column labels will change.

The same is true if you try to sort the rows.

@sorig
Author

sorig commented Feb 20, 2020

The issue also exists for quantitative data types:

{
  "$schema": "https://vega.github.io/schema/vega-lite/v4.json",
  "mark": {"type": "bar"},
  "data": {
    "values": [
      {"A": 1, "B": 1},
      {"A": 2, "B": 1},
      {"A": 2, "B": 1},
      {"A": 3, "B": 2},
      {"A": 3, "B": 2},
      {"A": 3, "B": 3}
    ]
  },
  "encoding": {
    "column": {
      "field": "A",
      "type": "quantitative",
      "sort": [3, 1, 2]
    },
    "row": {
      "field": "B",
      "type": "quantitative"
    }
  },
  "height": 75
}

Result: [rendered chart]

@sorig changed the title from "Row and column sorting do not work for nominal and ordinal types" to "Row and column sorting do not work" on Feb 20, 2020
@domoritz
Member

Thank you for making small examples. It really helps isolate issues and see whether this is a duplicate or a new issue.

@domoritz
Member

If you want to help further debug this issue, could you take a look at the generated Vega and see whether you can find where Vega-Lite generates incorrect Vega?

@stilley2

stilley2 commented Oct 2, 2021

This bug has bitten me, so I tried to do what @domoritz asked and look at where the Vega generated from @sorig's example goes wrong. Using https://vega.github.io/editor, the Vega generated from his second spec is:

{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "background": "white",
  "padding": 5,
  "data": [
    {
      "name": "source_0",
      "values": [
        {"A": "A1", "B": "B1"},
        {"A": "A2", "B": "B1"},
        {"A": "A2", "B": "B1"},
        {"A": "A3", "B": "B2"},
        {"A": "A3", "B": "B2"},
        {"A": "A3", "B": "B3"}
      ]
    },
    {
      "name": "data_0",
      "source": "source_0",
      "transform": [
        {
          "type": "formula",
          "expr": "datum[\"A\"]===\"A3\" ? 0 : datum[\"A\"]===\"A1\" ? 1 : datum[\"A\"]===\"A2\" ? 2 : 3",
          "as": "column_A_sort_index"
        }
      ]
    },
    {
      "name": "column_domain",
      "source": "data_0",
      "transform": [
        {
          "type": "aggregate",
          "groupby": ["A"],
          "fields": ["column_A_sort_index"],
          "ops": ["max"],
          "as": ["column_A_sort_index"]
        }
      ]
    },
    {
      "name": "row_domain",
      "source": "data_0",
      "transform": [{"type": "aggregate", "groupby": ["B"]}]
    }
  ],
  "signals": [
    {"name": "child_width", "value": 20},
    {"name": "child_height", "value": 75}
  ],
  "layout": {
    "padding": 20,
    "offset": {"rowTitle": 10, "columnTitle": 10},
    "columns": {"signal": "length(data('column_domain'))"},
    "bounds": "full",
    "align": "all"
  },
  "marks": [
    {
      "name": "row-title",
      "type": "group",
      "role": "row-title",
      "title": {
        "text": "B",
        "orient": "left",
        "style": "guide-title",
        "offset": 10
      }
    },
    {
      "name": "column-title",
      "type": "group",
      "role": "column-title",
      "title": {"text": "A", "style": "guide-title", "offset": 10}
    },
    {
      "name": "row_header",
      "type": "group",
      "role": "row-header",
      "from": {"data": "row_domain"},
      "sort": {"field": "datum[\"B\"]", "order": "ascending"},
      "title": {
        "text": {
          "signal": "isValid(parent[\"B\"]) ? parent[\"B\"] : \"\"+parent[\"B\"]"
        },
        "orient": "left",
        "style": "guide-label",
        "frame": "group",
        "offset": 10
      },
      "encode": {"update": {"height": {"signal": "child_height"}}}
    },
    {
      "name": "column_header",
      "type": "group",
      "role": "column-header",
      "from": {"data": "column_domain"},
      "sort": {"field": "datum[\"column_A_sort_index\"]", "order": "ascending"},
      "title": {
        "text": {
          "signal": "isValid(parent[\"A\"]) ? parent[\"A\"] : \"\"+parent[\"A\"]"
        },
        "style": "guide-label",
        "frame": "group",
        "offset": 10
      },
      "encode": {"update": {"width": {"signal": "child_width"}}}
    },
    {
      "name": "cell",
      "type": "group",
      "from": {
        "facet": {
          "name": "facet",
          "data": "data_0",
          "groupby": ["B", "A"],
          "aggregate": {
            "cross": true,
            "fields": ["column_A_sort_index"],
            "ops": ["max"],
            "as": ["column_A_sort_index"]
          }
        }
      },
      "sort": {
        "field": ["datum[\"B\"]", "datum[\"column_A_sort_index\"]"],
        "order": ["ascending", "ascending"]
      },
      "encode": {
        "update": {
          "width": {"signal": "child_width"},
          "height": {"signal": "child_height"}
        }
      },
      "marks": [
        {
          "name": "child_marks",
          "type": "rect",
          "style": ["bar"],
          "from": {"data": "facet"},
          "encode": {
            "update": {
              "fill": {"value": "#4c78a8"},
              "ariaRoleDescription": {"value": "bar"},
              "x": {"field": {"group": "width"}},
              "x2": {"value": 0},
              "y": {"value": 0},
              "y2": {"field": {"group": "height"}}
            }
          }
        }
      ]
    }
  ],
  "config": {}
}

For this particular case, I found that this patch to the generated Vega fixes the plot:

116,122c116,117
<           "groupby": ["B", "A"],
<           "aggregate": {
<             "cross": true,
<             "fields": ["column_A_sort_index"],
<             "ops": ["max"],
<             "as": ["column_A_sort_index"]
<           }
---
>           "groupby": ["B", "column_A_sort_index"],
>           "aggregate": {"cross": true}
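
One possible reading of why this patch helps can be sketched in plain JavaScript. The snippet below is a simplified model of the "cell" facet, not Vega's actual implementation. It rests on two assumptions that are not confirmed anywhere in this thread: that a "max" over an empty cross-product cell yields no value (modelled as null), and that missing values sort first in ascending order. The helper names (crossCells, sortCells, rowB2) are made up for illustration.

```javascript
// Simplified model of the "cell" facet in the generated Vega above.
// This is NOT Vega's implementation; it only illustrates one possible cause.
const rows = [
  {A: "A1", B: "B1"}, {A: "A2", B: "B1"}, {A: "A2", B: "B1"},
  {A: "A3", B: "B2"}, {A: "A3", B: "B2"}, {A: "A3", B: "B3"}
];
const sortOrder = ["A3", "A1", "A2"];

// Same result as the generated formula: position in the sort array,
// or the array length if the value is not listed.
const data = rows.map(d => {
  const i = sortOrder.indexOf(d.A);
  return {...d, column_A_sort_index: i === -1 ? sortOrder.length : i};
});

const unique = values => [...new Set(values)];

// Build one cell per combination of B and `columnKey` (models "cross": true).
// ASSUMPTION: "max" over an empty cell yields no value, modelled here as null.
function crossCells(columnKey) {
  const cells = [];
  for (const B of unique(data.map(d => d.B))) {
    for (const key of unique(data.map(d => d[columnKey]))) {
      const members = data.filter(d => d.B === B && d[columnKey] === key);
      const sortIndex = members.length
        ? Math.max(...members.map(d => d.column_A_sort_index))
        : null;
      // The group key is written last, so when the sort index itself is the
      // group key (patched variant) it stays defined even for empty cells.
      cells.push({B, column_A_sort_index: sortIndex, [columnKey]: key});
    }
  }
  return cells;
}

// ASSUMPTION: missing values sort first in ascending order.
const ascendingNullFirst = (a, b) =>
  a === b ? 0 : a === null ? -1 : b === null ? 1 : a < b ? -1 : 1;

// Sort by B, then by sort index, like the "sort" block of the "cell" mark.
const sortCells = cells => [...cells].sort((x, y) =>
  ascendingNullFirst(x.B, y.B) ||
  ascendingNullFirst(x.column_A_sort_index, y.column_A_sort_index));

// Column values of the cells that end up in row B2, left to right.
const rowB2 = columnKey => sortCells(crossCells(columnKey))
  .filter(c => c.B === "B2")
  .map(c => c.A ?? sortOrder[c.column_A_sort_index]);

const original = rowB2("A");                   // groupby ["B", "A"]
const patched = rowB2("column_A_sort_index");  // groupby ["B", "column_A_sort_index"]
console.log(original); // [ 'A1', 'A2', 'A3' ]
console.log(patched);  // [ 'A3', 'A1', 'A2' ]
```

Under these assumptions, the column headers follow the requested order (A3, A1, A2), but the cells of row B2 are laid out as A1, A2, A3, so the A3 bars land under the wrong header. Grouping by the sort index keeps every cell's sort key defined, which would explain why the patched facet lines up with the headers again.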

@anbnyc

anbnyc commented Apr 22, 2022

I think I am running into the same issue. Posting here in case it helps narrow down the cause or affects the priority of a fix: reproducible example in the Vega-Lite editor. I would expect all of the "B" cells to be in the first column and all of the "A" cells to be in the second column.

image

{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "values": [
      { "column": "B", "row": 0, "value": "banana"},
      { "column": "A", "row": 0, "value": "apple"},
      { "column": "C", "row": 0, "value": "carrot"},
      { "column": "B", "row": 1, "value": "blueberry"},
      { "column": "A", "row": 1, "value": "avocado"}
    ]
  },
  "facet": {
    "row": { "field": "row", "header": null },
    "column": { "field": "column", "title": null, "sort": ["B", "A", "C"] }
  },
  "spec": {
    "height": 50,
    "width": 50,
    "layer": [
      {
        "mark": { "type": "text" },
        "encoding": {
          "x": { "value": 25 },
          "y": { "value": 25 },
          "text": { "field": "value" }
        }
      }
    ]
  }
}

@broughtonjp

broughtonjp commented Oct 25, 2022

I am experiencing what I think is the same error. I've noticed a few things that seem to be going on. I seem to get different results depending on whether I use sorting or a combination with aggregation.

This only seems to happen with data sets where faceting is used but there are missing groups.

To start with, an example that produces what I'd expect:

{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {"name": "values"},
  "mark": "line",
  "encoding": {
    "color": {"field": "choice", "type": "nominal"},
    "column": {
      "field": "location",
      "type": "nominal"
    },
    "row": {"field": "choice", "type": "nominal"},
    "x": {"field": "time", "type": "quantitative"},
    "y": {"field": "value", "type": "quantitative"}
  },
  "height": 100,
  "width": 100,
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "datasets": {
    "values": [
      {"time": 0, "value": 0, "choice": "A", "location": "s1"},
      {"time": 1, "value": 5, "choice": "A", "location": "s1"},
      {"time": 0, "value": 0, "choice": "B", "location": "s2"},
      {"time": 1, "value": 5, "choice": "B", "location": "s2"},
      {"time": 0, "value": 0, "choice": "A", "location": "s2"},
      {"time": 1, "value": 5, "choice": "A", "location": "s2"},
      {"time": 0, "value": 0, "choice": "B", "location": "s3"},
      {"time": 1, "value": 5, "choice": "B", "location": "s3"}
    ]
  }
}

This generates the following plot where location s2 has values for both choice A+B.
image

Next, when I specify a sort order that matches the order in the original graph, I get the data in an unexpected order:

{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {"name": "values"},
  "mark": "line",
  "encoding": {
    "color": {"field": "choice", "type": "nominal"},
    "column": {
      "field": "location",
      "type": "nominal",
      "sort": ["s1", "s2", "s3"]
    },
    "row": {
      "field": "choice",
      "type": "nominal",
      "sort": ["A", "B"]
    },
    "x": {"field": "time", "type": "quantitative"},
    "y": {"field": "value", "type": "quantitative"}
  },
  "height": 100,
  "width": 100,
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "datasets": {
    "values": [
      {"time": 0, "value": 0, "choice": "A", "location": "s1"},
      {"time": 1, "value": 5, "choice": "A", "location": "s1"},
      {"time": 0, "value": 0, "choice": "B", "location": "s2"},
      {"time": 1, "value": 5, "choice": "B", "location": "s2"},
      {"time": 0, "value": 0, "choice": "A", "location": "s2"},
      {"time": 1, "value": 5, "choice": "A", "location": "s2"},
      {"time": 0, "value": 0, "choice": "B", "location": "s3"},
      {"time": 1, "value": 5, "choice": "B", "location": "s3"}
    ]
  }
}

Result with sorting, no aggregation:
image

Next, when I turn on aggregation and specify the original order, I get yet another result:

{
  "config": {"view": {"continuousWidth": 400, "continuousHeight": 300}},
  "data": {"name": "values"},
  "mark": "line",
  "encoding": {
    "color": {"field": "choice", "type": "nominal"},
    "column": {
      "field": "location",
      "type": "nominal",
      "sort": ["s1", "s2", "s3"]
    },
    "row": {
      "field": "choice",
      "type": "nominal",
      "sort": ["A", "B"]
    },
    "x": {"field": "time", "type": "quantitative"},
    "y": {"aggregate": "mean", "field": "value", "type": "quantitative"}
  },
  "height": 100,
  "width": 100,
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "datasets": {
    "values": [
      {"time": 0, "value": 0, "choice": "A", "location": "s1"},
      {"time": 1, "value": 5, "choice": "A", "location": "s1"},
      {"time": 0, "value": 0, "choice": "B", "location": "s2"},
      {"time": 1, "value": 5, "choice": "B", "location": "s2"},
      {"time": 0, "value": 0, "choice": "A", "location": "s2"},
      {"time": 1, "value": 5, "choice": "A", "location": "s2"},
      {"time": 0, "value": 0, "choice": "B", "location": "s3"},
      {"time": 1, "value": 5, "choice": "B", "location": "s3"}
    ]
  }
}

Result with aggregation and sorting:
image
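
The "missing groups" condition described in this comment can be checked before plotting. Here is a minimal sketch in plain JavaScript; findMissingFacetCells is a hypothetical helper (not part of Vega or Vega-Lite) that lists the row/column combinations with no records, applied to the dataset from the specs above.

```javascript
// Hypothetical helper (not a Vega/Vega-Lite API): list the row/column
// combinations of a faceted dataset that have no records at all.
function findMissingFacetCells(data, rowField, columnField) {
  const rowValues = [...new Set(data.map(d => d[rowField]))];
  const columnValues = [...new Set(data.map(d => d[columnField]))];

  // Count the records in each (row, column) cell.
  const counts = new Map();
  for (const d of data) {
    const key = JSON.stringify([d[rowField], d[columnField]]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }

  // Every combination without a count is an empty facet cell.
  const missing = [];
  for (const r of rowValues) {
    for (const c of columnValues) {
      if (!counts.has(JSON.stringify([r, c]))) {
        missing.push({[rowField]: r, [columnField]: c});
      }
    }
  }
  return missing;
}

// Dataset copied from the specs in this comment.
const values = [
  {time: 0, value: 0, choice: "A", location: "s1"},
  {time: 1, value: 5, choice: "A", location: "s1"},
  {time: 0, value: 0, choice: "B", location: "s2"},
  {time: 1, value: 5, choice: "B", location: "s2"},
  {time: 0, value: 0, choice: "A", location: "s2"},
  {time: 1, value: 5, choice: "A", location: "s2"},
  {time: 0, value: 0, choice: "B", location: "s3"},
  {time: 1, value: 5, choice: "B", location: "s3"}
];

const missingCells = findMissingFacetCells(values, "choice", "location");
console.log(missingCells);
// [ { choice: 'A', location: 's3' }, { choice: 'B', location: 's1' } ]
```

For this dataset the empty cells are (A, s3) and (B, s1), which is consistent with the observation that only s2 has values for both choices.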

@apb-reports

Any news on this, Vega team?

This is still a bug, as you can see here. The only solution I have found is to push fake data into the data set so that every row and column combination has data. But this shouldn't be necessary.

{
  "data": {"url": "data/cars.json"},
  "mark": "bar",
  "transform": [
    {
      "filter": "datum.Origin === 'Japan' || datum.Origin === 'Europe'"
    },
    {
      "filter": "datum.Horsepower >= 110"
    },
    {
      "joinaggregate": [{"op": "count", "field": "Name", "as": "CountOrigin"}],
      "groupby": ["Origin"]
    },
    {
      "calculate": "slice('000000' + format(datum.CountOrigin, '.0f'), -6) + '-' + datum.Origin",
      "as": "OriginSort"
    },
    {
      "joinaggregate": [
        {"op": "count", "field": "Name", "as": "CountCylinders"}
      ],
      "groupby": ["Cylinders"]
    },
    {
      "calculate": "slice('000000' + format(datum.CountCylinders, '.0f'), -6) + '-' + format(datum.Cylinders, '.0f')",
      "as": "CylindersSort"
    }
  ],
  "encoding": {
    "y": {"aggregate": "count", "field": "Name", "type": "quantitative"},
    "row": {
      "field": "Origin",
      "type": "nominal",
      "sort": {"field": "OriginSort", "order": "descending"}
    },
    "column": {
      "field": "Cylinders",
      "type": "quantitative",
      "sort": {"field": "CylindersSort", "order": "descending"}
    },
    "tooltip": [
      {"field": "Name"},
      {"field": "Origin"},
      {"field": "OriginSort"},
      {"field": "CylindersSort"}
    ],
    "color": {"field": "Horsepower", "type": "ordinal"}
  }
}

A workaround is to push fake data into the data source like this.
Suppose the chart is faceted by row (RES) and column (PLANT_NAME):

let predefinedPlantNames = [...new Set(myDataSource.map(item => item.PLANT_NAME))];

const allRES = [...new Set(myDataSource.map(item => item.RES))];

allRES.forEach(res => {
  predefinedPlantNames.forEach(plantName => {
    const combinationExists = myDataSource.some(item => item.RES === res && item.PLANT_NAME === plantName);
    if (!combinationExists) {
      myDataSource.push({
        Id: 0,
        RES: res,
        PLANT_NAME: plantName
      });
    }
  });
});
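
The same idea can be written as a reusable function that does not mutate the source array. This is only a sketch of the workaround above: padFacetCells and its placeholder parameter are hypothetical names, and the sample data is the small dataset from the earlier comment with "row" and "column" fields, where row 1 has no "C" record. Whether an empty placeholder is acceptable depends on the chart, since the padded records are still passed to the marks.

```javascript
// Hypothetical helper (not a Vega/Vega-Lite API): return a copy of `data`
// padded so that every row/column combination has at least one record.
function padFacetCells(data, rowField, columnField, placeholder = {}) {
  const rowValues = [...new Set(data.map(d => d[rowField]))];
  const columnValues = [...new Set(data.map(d => d[columnField]))];
  const present = new Set(
    data.map(d => JSON.stringify([d[rowField], d[columnField]]))
  );

  const padded = [...data]; // shallow copy; the input array is left untouched
  for (const r of rowValues) {
    for (const c of columnValues) {
      if (!present.has(JSON.stringify([r, c]))) {
        // Placeholder fields first, then the facet keys for the empty cell.
        padded.push({...placeholder, [rowField]: r, [columnField]: c});
      }
    }
  }
  return padded;
}

// Sample data from the earlier comment in this thread.
const fruit = [
  {column: "B", row: 0, value: "banana"},
  {column: "A", row: 0, value: "apple"},
  {column: "C", row: 0, value: "carrot"},
  {column: "B", row: 1, value: "blueberry"},
  {column: "A", row: 1, value: "avocado"}
];

const paddedFruit = padFacetCells(fruit, "row", "column", {value: ""});
console.log(paddedFruit.length);     // 6
console.log(paddedFruit[5]);         // { value: '', row: 1, column: 'C' }
```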

7 participants