feat: mechanism for re-saving charts #25196

betodealmeida · 2023-09-06T04:29:39Z

SUMMARY

There's currently a bug in Superset where (1) if a chart is saved with a granularity_sqla, and (2) the GENERIC_CHART_AXIS feature is enabled after the chart creation, then the chart will break. This happens for some charts. Stripping granularity_sqla is not an option, because some charts still need it even when GENERIC_CHART_AXIS is enabled.

@Vitor-Avila found a workaround where saving the chart again will fix it. So in this PR I'm adding a new column to charts called force_save. When the column is true, the chart will save itself when loaded (in explore or a dashboard), and the column will be set to false. This way, charts can be forced to save by setting this column to true, which could be done conditionally (for certain viz types, eg).

Note that while this is a hack, it's a pattern we've used before. For alerts and reports we had to introduce logic to re-save the chart in order to set query_context. See #15824, #15865, and #15846.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Still in progress.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2023-09-06T04:30:23Z

Codecov Report

Merging #25196 (4416482) into master (d030544) will decrease coverage by 10.37%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master   #25196       +/-   ##
===========================================
- Coverage   69.09%   58.73%   -10.37%     
===========================================
  Files        1904     1904               
  Lines       74598    74602        +4     
  Branches     8246     8246               
===========================================
- Hits        51544    43815     -7729     
- Misses      20917    28650     +7733     
  Partials     2137     2137

Flag	Coverage Δ
hive	`53.75% <100.00%> (+<0.01%)`	⬆️
mysql	`?`
postgres	`?`
presto	`53.65% <100.00%> (+<0.01%)`	⬆️
python	`61.45% <100.00%> (-21.49%)`	⬇️
sqlite	`?`
unit	`55.27% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed	Coverage Δ
superset-frontend/src/dashboard/actions/hydrate.js	`2.04% <ø> (ø)`
.../src/dashboard/components/gridComponents/Chart.jsx	`59.29% <ø> (ø)`
...perset-frontend/src/dashboard/containers/Chart.jsx	`75.00% <ø> (ø)`
...t-frontend/src/explore/actions/saveModalActions.js	`95.58% <ø> (ø)`
...src/explore/components/ExploreChartPanel/index.jsx	`77.63% <ø> (ø)`
superset/charts/api.py	`50.32% <ø> (-37.50%)`	⬇️
superset/charts/schemas.py	`99.38% <100.00%> (+<0.01%)`	⬆️
superset/explore/schemas.py	`100.00% <100.00%> (ø)`
superset/models/slice.py	`59.11% <100.00%> (-28.52%)`	⬇️

... and 292 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

zhaoyongjie · 2023-09-06T08:53:29Z

@betodealmeida
This issue might introduce from PR, basically, we don't need transformation in the backend, it should transform in the frontend.

The other hand, I think we should try to avoid hacks too much in Superset, hacks have too much for now!

michael-s-molina · 2023-09-06T14:12:19Z

Thank you for the PR @betodealmeida. We have been discussing this internally at Airbnb and it's clear that these problems happen because we are missing schema versioning and validation for these types of fields. We have many bugs that were reported because these schemas change every time we introduce a new feature, essentially creating a new version of it, but we don't correctly support/migrate old versions. The fact that many of these schema transformations are on the frontend is really problematic because we end up creating columns in the database to store those transformations so they can be used by Python code when processing API requests as you can see by #15824, #15865, and #15846. @john-bodley Here's the motivation for the params and query_context duplication that haunts us 😄

Another critical problem of that approach is that Superset assets can be consumed through APIs, without a frontend, and in that case, changes in feature flags don't reflect in the schemas returned by these APIs. We have 20 feature flags that will be removed in 4.0 (including GENERIC_CHART_AXES) which will ease the problem but we still need to discuss schema versioning for these types of fields.

I understand the motivation for this PR but before we go down this road, is it possible to fix the issue using other means? Can you explain what breaks (video, stacktrace) and which chart types are affected?

betodealmeida · 2023-09-06T15:20:27Z

The other hand, I think we should try to avoid hacks too much in Superset, hacks have too much for now!

I totally agree, but keep in mind that there are hacks and hacks. This PR is a hack in a sense that "we shouldn't have to do this", not in the sense of "this is really brittle and risky". As is, this PR does nothing, and the only thing it does is re-save the chart when the admin sets the column to true.

betodealmeida · 2023-09-06T15:28:53Z

Thank you for the PR @betodealmeida. We have been discussing this internally at Airbnb and it's clear that these problems happen because we are missing schema versioning and validation for these types of fields. We have many bugs that were reported because these schemas change every time we introduce a new feature, essentially creating a new version of it, but we don't correctly support/migrate old versions. The fact that many of these schema transformations are on the frontend is really problematic because we end up creating columns in the database to store those transformations so they can be used by Python code when processing API requests as you can see by #15824, #15865, and #15846. @john-bodley Here's the motivation for the params and query_context duplication that haunts us 😄

I totally agree, we need versioning, better separation of concerns between backend/frontend and query/presentation, as well as a simpler canonical data model for charts (which would also make it easier to develop new charts).

Another critical problem of that approach is that Superset assets can be consumed through APIs, without a frontend, and in that case, changes in feature flags don't reflect in the schemas returned by these APIs. We have 20 feature flags that will be removed in 4.0 (including GENERIC_CHART_AXES) which will ease the problem but we still need to discuss schema versioning for these types of fields.

I agree here too, I've mentioned and been wanting to write a SIP proposing us to adopt blueprints, not only so it's easier to have multiple versions of an API at the same time, but we could even have blueprints for feature flags as well.

I understand the motivation for this PR but before we go down this road, is it possible to fix the issue using other means? Can you explain what breaks (video, stacktrace) and which chart types are affected?

You and @geido probably know this better than me, but my understanding of the problem is that some legacy charts still require granularity_sqla to be present when GENERIC_CHART_AXES is present; other visualizations don't. We don't know which charts are in which group, @geido suggested trying to figure out programmatically but wasn't able to do that.

From the errors I saw, there's no stacktrace, just a subtle silent error where the chart query is generated incorrectly. To the user, apparently no error happens, but the data presented might be fundamentally wrong. I'm still running tests to see which charts are affected and if this PR can fix them.

And to be clear, I don't think this is a road, I'm not proposing a new way of doing things. This is just a workaround allowing admins to fix some (or all) charts. This PR is a no-op unless the admin sets the flag to true in one or more charts, so it should be a safe workaround. I don't think we should need this — which is why I called it a hack — and I hope we never need it again in the future.

Edit: regarding other fixes, the possible solutions we discussed all seemed risky, specially compared to this approach. We'd have to either modify chart metadata, or have the backend filter it out depending on the visualization type.

villebro · 2023-09-06T15:28:57Z

I think in the long run we should really consider implementing a simple Node service that can render query context on the fly from form data (we could place this logic fairly easily in the websocket server). This would remove the need for persisting query context, which we can't update without resaving the chart if the buildQuery logic changes.

betodealmeida · 2023-09-06T15:51:04Z

I think in the long run we should really consider implementing a simple Node service that can render query context on the fly from form data (we could place this logic fairly easily in the websocket server). This would remove the need for persisting query context, which we can't update without resaving the chart if the buildQuery logic changes.

I don't know if it's possible, but what I'd like instead is a simpler contract between visualizations and the backend, so that this isn't even needed.

michael-s-molina · 2023-09-06T17:30:08Z

Thank you for the additional context @betodealmeida.

I'm still running tests to see which charts are affected and if this PR can fix them.

Got it. Let me know the results.

I don't know if it's possible, but what I'd like instead is a simpler contract between visualizations and the backend, so that this isn't even needed.

Me too.

villebro · 2023-09-06T17:42:01Z

I think in the long run we should really consider implementing a simple Node service that can render query context on the fly from form data (we could place this logic fairly easily in the websocket server). This would remove the need for persisting query context, which we can't update without resaving the chart if the buildQuery logic changes.

I don't know if it's possible, but what I'd like instead is a simpler contract between visualizations and the backend, so that this isn't even needed.

@betodealmeida @michael-s-molina while there's definitely a lot of standardization that can be done, I think this will be very difficult to achieve. For example, the table chart triggers different queries if the pagination control is checked. Just looking at existing buildQueryContext hooks in the existing plugins shows that constructing chart data requests is non-trivial, and often very viz-specific. Therefore, moving this logic to the backend would be very difficult, as it would essentially require moving business logic into the backend. This was one of the main bottlenecks of viz.py, and one of the motivations for SIP-6: #5692. But if we can indeed figure out a backend architecture that can facilitate arbitrary viz plugin logic without custom buildQueryContext hooks, this would indeed be a great simplification

michael-s-molina · 2023-09-06T18:01:28Z

But if we can indeed figure out a backend architecture that can facilitate arbitrary viz plugin logic without custom buildQueryContext hooks, this would indeed be a great simplification

@villebro As always, I think there's a middle ground between visualization-specific code and business logic that needs to be supported by our APIs. I think this work is highly correlated with the new plugin architecture we were thinking about.

betodealmeida · 2023-09-06T18:06:34Z

I think this work is highly correlated with the new plugin architecture we were thinking about.

Can you share more about this? Can we have a bigger discussion?

betodealmeida · 2023-09-06T18:07:55Z

Also, @Vitor-Avila raised a good point: this won't work if the user viewing chart doesn't have permission to overwrite the chart, so they might be looking at incorrect data until an admin opens the chart. I'm going to work on a different approach.

michael-s-molina · 2023-09-06T18:11:00Z

Can you share more about this? Can we have a bigger discussion?

Sure. We can write a SIP or discussion thread about it after 3.0.

betodealmeida · 2023-09-06T18:16:54Z

@betodealmeida @michael-s-molina while there's definitely a lot of standardization that can be done, I think this will be very difficult to achieve. For example, the table chart triggers different queries if the pagination control is checked. Just looking at existing buildQueryContext hooks in the existing plugins shows that constructing chart data requests is non-trivial, and often very viz-specific. Therefore, moving this logic to the backend would be very difficult, as it would essentially require moving business logic into the backend. This was one of the main bottlenecks of viz.py, and one of the motivations for SIP-6: #5692. But if we can indeed figure out a backend architecture that can facilitate arbitrary viz plugin logic without custom buildQueryContext hooks, this would indeed be a great simplification

I definitely don't want to go back to the times of viz.py. Having the logic live only in JS makes it much easier to build and distribute visualization plugins, even if it introduces complexity (like sending an HTML report of a pivot table, where a lot of the transformations happen in JS).

But I think we can generalize the request. At the end of the day we need the following in order to get data (I might be forgetting something):

metrics (adhoc or already defined)
dimensions
filters (WHERE and HAVING)
limit, offset, ordering
post processing operations

And then each viz type has its own controls for presentation. Also, each viz should have a standard method for data transformation, something like export function processRows(rows: Row[]): Row[] {}, so that it can be called from outside the plugin (for alerts and reports, eg).

Edit: also, while we're there we should get rid of annotation layers and just have a chart support multiple datasources.

betodealmeida · 2023-09-06T18:18:17Z

Sure. We can write a SIP or discussion thread about it after 3.0.

I think a discussion or even a meetup might be better. I have a lot of ideas and I know other people also have them, it would be nice to brainstorm before we have a proposal.

villebro · 2023-09-06T18:20:07Z

@betodealmeida this would be great, I'd love to participate 👍

betodealmeida · 2023-09-06T18:40:14Z

Maybe when @michael-s-molina is in town we could do a committer meetup and discuss this and other similar topics? I'd love to talk about Flask blueprints and my vision for them. :)

michael-s-molina · 2023-09-06T18:54:06Z

Maybe when @michael-s-molina is in town we could do a committer meetup and discuss this and other similar topics? I'd love to talk about Flask blueprints and my vision for them. :)

I would love that!

betodealmeida added 4 commits September 5, 2023 16:43

Save outdated chart and reload

e2cfd6e

Add migration

dd87e66

Save outdated charts in dash

d9380f6

Rename column

4416482

betodealmeida requested a review from a team as a code owner September 6, 2023 04:29

pull-request-size bot added the size/L label Sep 6, 2023

betodealmeida requested a review from geido September 6, 2023 04:31

Fix tests and lint

aa43362

betodealmeida closed this Sep 6, 2023

betodealmeida mentioned this pull request Sep 6, 2023

fix: granularity_sqla and GENERIC_CHART_AXES #25213

Merged

9 tasks

mistercrunch deleted the sc_74253 branch March 26, 2024 18:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: mechanism for re-saving charts #25196

feat: mechanism for re-saving charts #25196

betodealmeida commented Sep 6, 2023

codecov bot commented Sep 6, 2023

zhaoyongjie commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

betodealmeida commented Sep 6, 2023 •

edited

Loading

villebro commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

villebro commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023 •

edited

Loading

betodealmeida commented Sep 6, 2023

villebro commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

feat: mechanism for re-saving charts #25196

feat: mechanism for re-saving charts #25196

Conversation

betodealmeida commented Sep 6, 2023

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

codecov bot commented Sep 6, 2023

Codecov Report

zhaoyongjie commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

betodealmeida commented Sep 6, 2023 • edited Loading

villebro commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

villebro commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023 • edited Loading

betodealmeida commented Sep 6, 2023

villebro commented Sep 6, 2023

betodealmeida commented Sep 6, 2023

michael-s-molina commented Sep 6, 2023

betodealmeida commented Sep 6, 2023 •

edited

Loading

betodealmeida commented Sep 6, 2023 •

edited

Loading