Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new visualization for custom X-Y axes #4185

Closed
wants to merge 8 commits into from

Conversation

jeffreythewang
Copy link
Contributor

@jeffreythewang jeffreythewang commented Jan 9, 2018

Summary

  • Add a new visualization, XY Line Chart - a line chart with custom X-Y axes
  • Add functionality for hiding lines between data points, to create a scatterplot-like chart

Description

I added a new visualization for plotting data points on a line chart which allows the user to specify any column or metric of the query as the X-Y axis. It is based entirely on the query fields of TableViz. This way, you can easily switch between the Table View and the XY Line Chart, since they have similar form data fields. I made no changes to the query objects, and primarily made changes to the resultant pandas DataFrame. Some features include:

  • Ability to specify multiple columns or metrics for the Y axis
  • Ability to slice by groups of columns or metrics (for multiple series)

Examples

Group-By Query (with multiple aggregations plotted on the Y-Axis)

groupby_query

Generic Column Selection (with slicing)

slice_by_date

Scatter Plot

scatter_plot

Related Issues

#3769
#3184

Copy link
Contributor

@fabianmenges fabianmenges left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fabianmenges
Copy link
Contributor

@mistercrunch @williaster any thoughts?

superset/viz.py Outdated
viz_type = 'line_xy'
verbose_name = _('XY Line Chart')
sort_series = False
is_timeseries = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be False ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is primarily used as a small hack to implement limiting on the groupBy groupings, using the limit as the timeseries_limit in the query objects. The limit is applied on the subquery in the case of sqla, and the first phase in the case of the druid query.

The reason why limiting should be applied is so that, in some cases, the visualization doesn't come out as a large number of lines, which would crash the browser.

I realize it probably makes more sense to limit the result set after running the query, limiting on the sliceBy field, but I'm open to suggestions here. We can also forego implementing limiting for now, and encourage users to apply proper filters on their queries. Otherwise I'll leave it the way it is now and include a comment in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note - I decided to go with the latter solution and forego implementing limiting.

Copy link
Member

@mistercrunch mistercrunch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's have a conversation as to whether supporting "NOT GROUPED BY" for this specific visualization is necessary. I'd rather keep things simple here and only allow "group by" on the X axis

description: t('Columns or metrics to display'),
mapStateToProps: (state) => {
let choices = [];
if (state.controls && state.datasource) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the exact same function as in columns_and_metrics_x, let's make this DRY

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of being able to plot non grouped-by measures. In some cases I feel that this is a missing option (e.g., box plot).

The non-grouped xy-plot would be even more useful if there were an option to just plot markers without any lines. In other words, a plain scatter plot.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I must have mussed that the scatter plot is implemented here, already.
So, I would highly appreciate if the non-grouped code could just stay...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand your NOT GROUPED BY use case, is it for optimization purposes in cases where the data is already aggregated? Note that you can still make columns that are used in metrics "groupbable" and group by them. Also note that the cost of grouping something that's already grouped should be fairly cheap.

I think having the two interfaces in one makes things confusing. For the case of the Table viz it was very necessary to allow NOT GROUPED BY, and I'm trying to understand what use case requires that in this context.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We, for example, need it to show raw values of our sensor products. So each data point represents one serial number.
You might say that grouping by SN should just do the job. However, grouping by, e.g., the job number, is then more useful.

In other cases there exists real xy measurement data for a product, like e.g. a transfer characteristic, and this again is not useful to be aggregated but has to be investigated for each SN separately.

Limiting the amount of data may of course be challenging then, but this is the user's responsibility in my eyes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mistercrunch The 'NOT GROUPED BY' use case is the same one why it is available on the table visualization. Fundamentally, this general X-Y chart lets you select any column of a table-visualization as the X axis or the Y axis. In short, if you have two numeric columns in a table you should be able to display the data as a line chart regardless of what the 'query' looked like.
The reason for the 'NOT GROUPED BY' section to be included in the table viz is to be able to visualize (list) non-aggregated data.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 not grouped by

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about as an alternative to the GROUPED BY / NOT GROUPED BY, if we allowed for a mix of metrics and dimensions in the dropdowns? If no metrics are selected, then we don't aggregate. Perhaps that's the way that the table view should work as well.

We're planning a fair amount of work to have a more column-centric approach in the explore view and supporting backwards compatibility on this new chart type may make that harder.

],
},
{
label: t('NOT GROUPED BY'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling NOT GROUPED BY here adds a lot of complexity to this visualization, why is it necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to note - I included NOT GROUPED BY to address the feature @rumbin mentioned above.

@raghavmishra1202
Copy link

Is there anyway to make months (Jan-Dec) as x axis.
I tried to add a new column where numbers 1-12 correlated with months. It worked in terms of csv file but when I tried to visualize it, the order on the graph became either alphabetical or if i included the numbers + string as so 1, Jan 10 Oct 11 Nov 12 Dec 2 Feb 3 March etc.
Any help would be appreciated!

@jeffreythewang
Copy link
Contributor Author

@raghavmishra1202 Correct me if I'm wrong, but it sounds like what you are trying to visualize is best done with a time series chart with month granularity.

If instead, you are trying to plot specific month-integer values (1-12, and not in a timestamp-like field) on the X-Axis, and want them displayed from a mapping of numbers to month-string, that seems like a display-specific change that can be implemented in in the X Axis Format section.

If we want to include similar functionality in the XY Line Chart (granularities for things other than dates), we would have to support bucketing values into ranges, which is not included in this change set.

@raghavmishra1202
Copy link

@jeffreythewang thanks for your reply. Unfortunately that doesn't work in my case. the data i have is organized by basically months in numeric form in one column and data in the other. If i do the time series chart i get some errors. If i do a normal plot it shows the months as 1-12 on the x axis with corresponding data. I basically group the string version of month with numeric but now the organization of the axis gets messed up. it's weird cause the csv shows the correct order
month string_month data
1 jan 5
2 feb 3
3 mar 9

any suggestions? like i mentioned above either it arranges x axis as
1, Jan 10, Oct 11 Nov etc. --> if i can get this fixed that would be perf
or with just strings it does alphabetical

@raghavmishra1202
Copy link

Also is there a way to download all the dashboard data into a single csv/zip folder?

@jeffreythewang
Copy link
Contributor Author

@raghavmishra1202 The way that the XY Line Chart works under the hood is that it orders the result set of the query based on whichever column/metric is selected to be plotted on the X-axis. This way, the line(s) that is/are plotted will have continuity in the series. In other words, the dots are connected in the order that you expect to be connected. For the best visualization, quantitative columns/metrics should be plotted on the X or Y axis, and qualitative fields should go in the sliceBy part of the visualization.

For your specific use case, in my opinion, there are a few ways to go about it:

  • Encode your month field into a DATETIME-like column, mark it as temporal in the column edit view, and then plot it as a timeseries visualization with month granularity.
  • Plot the month integer on the X-axis, and label the X-axis as "Month (numerical)" or something descriptive.
  • Extend the X Axis Format to format numerical month-integers as string month-values.

Another solution is to allow users to specify their own ordering for the result set, but I'm not sure how useful or intuitive that would be. I'm open to suggestions here.

Also re: downloading dashboard data into a single csv/zip - As far as I know, it does not seem like it, and you'll have to export individual charts as CSV. It can be filed as a feature request though.

Copy link
Contributor Author

@jeffreythewang jeffreythewang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, there is an issue with rendering the XY Line Chart when a temporal column is not present, so please do not merge right now. I will update the PR when I push the fix.

UPDATE: This issue is now fixed.

@williaster
Copy link
Contributor

williaster commented Feb 4, 2018

curious why this needs to be a new visualization type vs just exposing more customization options (eg not grouped by, no line) in the existing time series line chart, thus making it more general and not a time series chart.

if this is a superset of functionality, why not just deprecate/migrate the line series instead of cramming more stuff in? this is more consistent with other charting frameworks where you just choose x and y and there's no assumption that x is a specifc type (eg time)

@fabianmenges
Copy link
Contributor

@williaster there are a few reasons.

  • Since we want to be able to display data that is not necessarily a time series or aggregated, the controls of the Table View are much more appropriate.
  • To make the new "X-Y Line Chart" viz type a true 'superset' of the "Time Series - Line Chart" would require a lot of work and major refactoring. How does the 'Advance Analytics' from a "Time Series - Line Chart" work for non time series data, rolling, resampling, time shift, etc... How do Annotations work for non Time Series data?
  • The controls of the "X-Y Line Chart" are actually more complex than the Table or "Time Series - Line Chart" controls. You have do define your query ("GROUP BY" or "NOT GROUPED BY") and independently of that use the "Axis Options" section to pick which data should be displayed on the x-axis and which data should be displayed on the y-axis.

@williaster
Copy link
Contributor

Since we want to be able to display data that is not necessarily a time series or aggregated, the controls of the Table View are much more appropriate.

That makes sense in general, but doesn't seem like it makes the case for a new chart type to me. It seems like matching xy-chart + table visualization controls could be a good update because it is more general.

To make the new "X-Y Line Chart" viz type a true 'superset' of the "Time Series - Line Chart" would require a lot of work and major refactoring. How does the 'Advance Analytics' from a "Time Series - Line Chart" work for non time series data, rolling, resampling, time shift, etc... How do Annotations work for non Time Series data?

We know which columns are time columns right? It seems like we could just leverage that information to toggle on / off advanced analytics + annotations when the x-axis is time? there is precedent with showing/hiding controls in your annotations work and in the future annotations could be updated to support non-time xy-charts.

while this might be considered more work, it seems like less work than a future refactor of consolidating visualizations even further if we want to move to a model where we have a generic xy chart? (which again, is extremely prevalent in other plotting frameworks) perhaps I'm wrong but I really just think there's a lot of overlap here with the existing line chart and instead of bloating the # of visualizations it seems like this would just be a good generalization to make.

@mistercrunch
Copy link
Member

@williaster I was thinking that it should be different chart types as there are many controls that should become inactive when out of the time series mode. I'd say that one of the guidelines for creating new chart type as opposed to overloading an existing one is around the % of shared/relevant controls.

We currently don't have support for conditional form controls, meaning that selecting certain option in one control would affect the existence of other controls or control-sections. Until we have this I think it should be 2 different chart types.

I'm more troubled by the GROUPED BY vs NOT GROUPED BY options. When I added that to the Table viz early on in the project, I never thought it should be replicated to other chart types. The control panel section is bloated/confusing enough as it is without having 2 query-related sections. I think the short term solution here would be to have dropdowns with a mix of columns and metrics where we want to enable that. I think the longer term solution is to have a column-centric approach that allows (but doesn't always require) applying aggregate functions to the column. An example of that is that the clicks can be dragged to a Metrics section, and becomes clicks [sum] by default where an aggregation is required. You can then change it to other aggregations like clicks [avg] using a contextual popover. If the context doesn't require the column to be aggregated than it would allow turning off aggregations.

@williaster
Copy link
Contributor

@mistercrunch I agree with all that, esp about avoiding not grouped by and worsening the data selection process 🙌most users I've spoken with (technical and non-technical) don't understand that not grouped by means atomic rows, even @john-bodley and I.

long-term having selectors be aware of aggregation sounds gr8, and we can consolidate vis's then.

@sebastianbk
Copy link

Hey guys. What's the progress on this? Any chance it will make the 0.23 release? What love to work with this visualization! 😃

@mistercrunch
Copy link
Member

Won't make 0.23.0 for sure as the RC has been cut. Only bug fixes will make 0.23.0 from this point.

@BestOwed
Copy link

Hey guys. What's the progress on this? What love to work with this visualization for me too! 😃

@8849511588
Copy link

No need

@Zhihuzhe-ye
Copy link

@jeffreythewang hello , This branch 'Add new visualization for custom X-Y axes ' , Has it merged with the main branch ? The x-axis need not necessarily be a time/date, it could just be any number. Would be great to make that possible! Can I use it?

@weiqiLee
Copy link

Hello everybody, what's going on? this visualization is what I want too 😃

@htruongaff
Copy link

Any update on this? this seems to be one of the main pain points for many folks.

@jeffreythewang
Copy link
Contributor Author

@mistercrunch if we're committed to potentially merging this in, I can rebase

@williaster
Copy link
Contributor

as we refactor all charts to use a consistent API and move them to @superset-ui as chart plugins (which users can then pick/choose from at their will), it'd be helpful for us to not add additional charts which will then need migration.

@jeffreythewang if we help with documentation for how to create such a plugin / do this migration now, would you be open to making this a chart plugin that lives outside the superset repository?

cc @kristw @xtinec @conglei

@jeffreythewang
Copy link
Contributor Author

@williaster Yeah sounds good to me

@kristw kristw added the enhancement:request Enhancement request submitted by anyone from the community label Jan 23, 2019
@stale
Copy link

stale bot commented Apr 10, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the inactive Inactive for >= 30 days label Apr 10, 2019
@stale stale bot closed this Apr 17, 2019
@sumedhsakdeo
Copy link
Contributor

Curious if this made it into superset as a chart plugin @jeffreythewang
Can you please point me to the PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement:request Enhancement request submitted by anyone from the community inactive Inactive for >= 30 days
Projects
None yet
Development

Successfully merging this pull request may close these issues.