
Creating Time-series Line Chart with high cardinality always times out #23464

Closed
Usiel opened this issue Mar 23, 2023 · 1 comment
Labels
#bug Bug report

Comments

@Usiel
Contributor

Usiel commented Mar 23, 2023

Creating Time-series Line Chart with high cardinality always times out due to inefficiencies in the pandas_postprocessing.pivot module.

The example below may seem slightly contrived, but I think it's likely that a Superset user will come across this at some point: they want to build a time-series chart and inadvertently create high-cardinality groupings without setting a series limit. Currently, they will be confronted with a timeout and be none the wiser. With a minor optimization we can instead show them the data they requested, and they can make a decision from there.

A better solution than a simple performance fix would, imo, be for Superset to make a decision and apply a series limit for the user, but I figure that would be more of a feature request :)
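To illustrate what "high cardinality" means here, the following is a small self-contained pandas sketch (not Superset code; column names and sizes are made up): grouping a time series by several dimensions means the pivot step produces one column per observed dimension combination, so the column count multiplies quickly.

```python
import pandas as pd

# Synthetic data: 1000 hourly timestamps, each observed for 3 of the
# possible (first, last) dimension combinations.
n = 3000
df = pd.DataFrame({
    "ts": pd.date_range("2023-01-01", periods=1000, freq="h").repeat(3),
    "first": [f"first_{i % 50}" for i in range(n)],
    "last": [f"last_{i % 60}" for i in range(n)],
    "metric": range(n),
})

# Pivoting on both dimensions yields one column per observed
# (first, last) pair -- the column count grows multiplicatively
# with each added dimension.
pivoted = df.pivot_table(
    index="ts", columns=["first", "last"], values="metric", aggfunc="sum"
)
print(pivoted.shape)
```

With real dimensions like `contact_first_name`, `contact_last_name`, and `phone`, the resulting frame can easily reach millions of columns, which is where the postprocessing cost explodes.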

How to reproduce the bug

  1. Explore the example dataset cleaned_sales_data
  2. Add multiple dimensions (e.g. contact_first_name, contact_last_name, phone)
  3. Add any metric
  4. Select the Time-series Line Chart
  5. Click on "Update Chart"

Expected results

Chart should load within a few seconds

Actual results

Chart will time out or take a very long time

Screenshots

(screenshot attached in the original issue)

Environment

(please complete the following information):

  • superset version: 2.0.1, 2.1.0rc3 and latest master@7ef06b0a6
  • python version: 3.8.13

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven't found one similar.

Additional context

I will open a PR shortly and link to this issue.

Usiel added the #bug Bug report label Mar 23, 2023
Usiel added a commit to Usiel/superset that referenced this issue Mar 23, 2023
Executing a pivot with `drop_missing_columns=False` and lots of resulting columns can increase the postprocessing time by seconds or even minutes for large datasets.
The main culprit is the `df.drop(...)` operation in the for loop. We can refactor this slightly, without any change to the results, and push the postprocessing time
down to seconds instead of minutes for large datasets (millions of columns).

Fixes apache#23464
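The refactoring described above can be sketched generically (hypothetical helper names, not the actual Superset patch): dropping columns one at a time inside a loop copies the frame on every iteration, while a single `drop` call with the collected labels copies it once.

```python
import pandas as pd

def drop_unwanted_slow(df: pd.DataFrame, unwanted) -> pd.DataFrame:
    # Anti-pattern: each df.drop() returns a full copy of the frame,
    # so total cost is O(len(unwanted) * frame size).
    for col in unwanted:
        df = df.drop(col, axis=1)
    return df

def drop_unwanted_fast(df: pd.DataFrame, unwanted) -> pd.DataFrame:
    # One drop call, one copy, regardless of how many labels go.
    return df.drop(list(unwanted), axis=1)

# Both produce identical results on a small frame.
demo = pd.DataFrame({c: [0] for c in "abcdef"})
slow = drop_unwanted_slow(demo, ["b", "d"])
fast = drop_unwanted_fast(demo, ["b", "d"])
```

For a frame with millions of columns, the repeated-copy pattern dominates the postprocessing time, which matches the seconds-versus-minutes difference the commit message reports.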
@rusackas
Member

This is likely fixed by now, and is pretty out of date if not. If people are still encountering this in current versions (3.x) please open a new Issue or a PR to address the problem.
