plots: Add support for plotly as backend for render plots #7
Comments
I haven't really thought about the cross-product implications. |
@shcheklein @tapadipti @Suor @rogermparent @mattseddon Interested to hear your thoughts on this. |
Looks like a very cool library, and certainly much easier to use than Vega! The defaults don't quite match our design, but it seems there are enough ways to hook into hover and click events to make up for that, and the ability to set colors to lines is much friendlier than the equivalent I'd say if the goal is to encourage users to develop tools based on these plots, a more high-level alternative backend like plotly could do the job. Worth noting it could add some complexity in features like |
Side note: I agree that This is a very good point to discuss. My original idea for the
stages:
  train:
    cmd: python train.py
    plots:
      - prc.json:
          cache: false
          x: recall
          y: precision
          template: linear_plotly.json # Inferred
      - roc.json:
          cache: false
          x: fpr
          y: tpr
          backend: plotly # Explicit
The main motivation (besides giving users flexibility) was that However, when considering |
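To make the "inferred" versus "explicit" distinction concrete, here is a minimal sketch of how a backend could be resolved from one plot entry. The function name `resolve_backend`, the `_plotly.json` suffix rule, and the `vega` default are all assumptions for illustration; none of them is existing DVC behavior.

```python
def resolve_backend(plot_config, default="vega"):
    """Pick a rendering backend for one plot entry (hypothetical helper)."""
    # An explicit `backend:` key always wins.
    if "backend" in plot_config:
        return plot_config["backend"]
    # Assumption: a template named `*_plotly.json` implies the plotly backend.
    template = plot_config.get("template", "")
    if template.endswith("_plotly.json"):
        return "plotly"
    # Otherwise keep the current behavior (assumed default).
    return default


prc = {"cache": False, "x": "recall", "y": "precision", "template": "linear_plotly.json"}
roc = {"cache": False, "x": "fpr", "y": "tpr", "backend": "plotly"}
print(resolve_backend(prc), resolve_backend(roc))
```

Whether inference from the template name is desirable at all is exactly the open question in this comment; the sketch only shows that both options can coexist.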
I think that would be the case, but it could be handled if there was some way to distinguish the plots with different schemas from each other. |
Supporting plotly was one of the ideas that came up during Studio brainstorming sessions a while back, and it is one of the items we have in the roadmap for next year. I'm not sure how much work it would be in Studio to support this (maybe @Suor would have some idea), but eventually it needs to be supported (at least as per the current roadmap / plan). |
As far as I can see, a plotly template is not JSON but JavaScript code, which is problematic from a security perspective. |
Not sure if I understand. At least on DVC side, For an example linear plot, we would have a JSON template with placeholders:
vega
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": "<DVC_METRIC_DATA>"
},
"title": "<DVC_METRIC_TITLE>",
"width": 300,
"height": 300,
"layer": [
{
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
},
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "rev",
"type": "nominal"
}
},
"layer": [
{
"mark": "line"
},
{
"selection": {
"label": {
"type": "single",
"nearest": true,
"on": "mouseover",
"encodings": [
"x"
],
"empty": "none",
"clear": "mouseout"
}
},
"mark": "point",
"encoding": {
"opacity": {
"condition": {
"selection": "label",
"value": 1
},
"value": 0
}
}
}
]
},
{
"transform": [
{
"filter": {
"selection": "label"
}
}
],
"layer": [
{
"mark": {
"type": "rule",
"color": "gray"
},
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative"
}
}
},
{
"encoding": {
"text": {
"type": "quantitative",
"field": "<DVC_METRIC_Y>"
},
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative"
},
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative"
}
},
"layer": [
{
"mark": {
"type": "text",
"align": "left",
"dx": 5,
"dy": -5
},
"encoding": {
"color": {
"type": "nominal",
"field": "rev"
}
}
}
]
}
]
}
]
}
plotly
{
"legendgroup": "<DVC_METRIC_COLOR>",
"line": {
"dash": "solid"
},
"marker": {
"symbol": "circle"
},
"mode": "lines",
"name": "<DVC_METRIC_COLOR>",
"orientation": "v",
"showlegend": true,
"type": "scatter",
"xaxis": "<DVC_METRIC_X_LABEL>",
"yaxis": "<DVC_METRIC_Y_LABEL>",
"x": "<DVC_METRIC_X>",
"y": "<DVC_METRIC_Y>"
}
That would get filled with datapoints collected by DVC plots and embedded in an HTML div:
vega
<div id = "{id}">
<script type = "text/javascript">
var spec = {partial};
vegaEmbed('#{id}', spec);
</script>
</div>
plotly
<div id = "{id}">
<script type = "text/javascript">
var plotly_data = {partial};
Plotly.newPlot("{id}", plotly_data.data, plotly_data.layout);
</script>
</div>
The So, |
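A minimal Python sketch of the fill-and-embed flow described above, for the plotly case. It assumes datapoints arrive as a list of dicts with a `rev` field, and that one trace is built per revision; `fill_traces` and `render_div` are hypothetical names, not functions from dvc-render, and the layout is a guess at what a filled template could carry.

```python
import json

# Trimmed copy of the plotly trace template from this comment.
TRACE_TEMPLATE = {
    "mode": "lines",
    "type": "scatter",
    "name": "<DVC_METRIC_COLOR>",
    "legendgroup": "<DVC_METRIC_COLOR>",
    "x": "<DVC_METRIC_X>",
    "y": "<DVC_METRIC_Y>",
}

DIV_HTML = """<div id="{id}">
<script type="text/javascript">
var plotly_data = {partial};
Plotly.newPlot("{id}", plotly_data.data, plotly_data.layout);
</script>
</div>"""


def fill_traces(datapoints, x, y, color="rev"):
    """Replace the placeholders, producing one trace per value of `color`."""
    groups = {}
    for point in datapoints:
        groups.setdefault(point[color], []).append(point)
    traces = []
    for name, points in groups.items():
        trace = dict(TRACE_TEMPLATE)
        trace["name"] = trace["legendgroup"] = name
        trace["x"] = [p[x] for p in points]
        trace["y"] = [p[y] for p in points]
        traces.append(trace)
    return traces


def render_div(div_id, datapoints, x, y):
    """Serialize data and layout to JSON and embed them in the HTML div."""
    partial = {
        "data": fill_traces(datapoints, x, y),
        "layout": {"xaxis": {"title": x}, "yaxis": {"title": y}},
    }
    return DIV_HTML.format(id=div_id, partial=json.dumps(partial))
```

Because the filled template goes through `json.dumps`, what lands in the page is plain JSON data; the only JavaScript is the fixed `Plotly.newPlot` call in the wrapper.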
Another reason plotly would be useful: it has built-in support for more ML/DS/analytical visualizations, like smoothing (see iterative/vscode-dvc#3837). |
I could start moving this forward in the CLI first and try to get something working on Studio by myself (probably with some help) after |
If we can start trying to build towards it being a drop-in replacement for vega-lite, I think it would be nice. |
I've started looking at this from the VS Code perspective. I can see in #88 (not sure if that PR is active or not) that the current idea is for DVC to hold the required data in the same format for both Vega & Plotly. Might it make sense to change that approach given that For the extension it would be good to get the contents of
{
  "data": {
    "dvc.yaml::name": [
      {
        "type": "plotly",
        "revisions": ["workspace"],
        "layout": {LAYOUT},
        "data": {DATA}
      }
    ]
  }
}
That way we'll be able to update the LMK what you think. If we can agree on the approach I have the capacity to make contributions here and in DVC to get this moving. |
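As a rough illustration of how a consumer could use the `type` field in the proposed output, here is a small sketch that groups plot entries by backend. The helper name `group_by_backend` is hypothetical, and treating a missing `type` as `vega` is an assumption made only for this example.

```python
def group_by_backend(plots_output):
    """Group (plot_id, entry) pairs by the entry's `type` field."""
    grouped = {}
    for plot_id, entries in plots_output.get("data", {}).items():
        for entry in entries:
            # Assumption: entries without an explicit type are Vega plots.
            backend = entry.get("type", "vega")
            grouped.setdefault(backend, []).append((plot_id, entry))
    return grouped


example = {
    "data": {
        "dvc.yaml::name": [
            {
                "type": "plotly",
                "revisions": ["workspace"],
                "layout": {},  # stands in for {LAYOUT}
                "data": [],    # stands in for {DATA}
            }
        ]
    }
}
print(sorted(group_by_backend(example)))
```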
I can see this would be a more involved change because Studio reaches directly into DVC and calls |
The draft PR's motivation was to keep the "status quo" of the Vega implementation, introducing Plotly in a transparent way for DVC.
For the
I will do a minor update to the dvc-render PR, as it is currently missing the layout part. Then we can discuss how to handle the We also need to decide how/when to enable Plotly. Options off the top of my head: A) Have a feature flag in DVC like |
I am going to catch up with @daavoo today about this (thanks for sending an invite, David). The plan for me right now is to build a thin vertical slice along the lines of option B above. Ideally, in the next two sprints, I'd like to be able to replace the smooth/linear/scatter templates with Plotly implementations (feels ambitious).
Findings so far: I have been playing around with Plotly, and the biggest difference with respect to Vega seems to be that the data and template are much less separate and get mangled together in order to create the desired output. As the "smooth" template seems to be the hardest of the three to generate, I've been working on that. I've managed to adapt the below examples to generate a demo of what is possible in terms of "smoothing" (not worrying about style yet):
https://plotly.com/javascript/sliders/#add-a-play-button-to-control-a-slider
Screen.Recording.2023-09-06.at.12.30.11.pm.mov
Code for the demo
This does use the triangular moving average function mentioned previously (shown here), but that function is something that we have to implement on our own. We can also forgo the play button, but it seems that in order to show different smoothed options we have to calculate all of the new y values ourselves and load each set of values into distinct
Edit: Demo using ema as smoothing function - Screen.Recording.2023-09-06.at.3.49.03.pm.mov |
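To illustrate what "calculate all of the new y values ourselves" could look like, here is a minimal sketch of the ema variant in Python. It is not the demo code from this comment (which is not shown here); `ema` and `smoothed_traces` are hypothetical names, and producing one trace dict per smoothing setting is an assumption about how the precomputed values would be handed to Plotly.

```python
def ema(values, alpha):
    """Exponential moving average; alpha in (0, 1], alpha=1 returns the input."""
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # seed with the first observation
        else:
            previous = alpha * value + (1 - alpha) * previous
        smoothed.append(previous)
    return smoothed


def smoothed_traces(x, y, alphas):
    """Precompute one line trace per smoothing factor (assumed structure)."""
    return [
        {
            "type": "scatter",
            "mode": "lines",
            "name": f"alpha={alpha}",
            "x": list(x),
            "y": ema(y, alpha),
        }
        for alpha in alphas
    ]


traces = smoothed_traces([0, 1, 2, 3], [1.0, 3.0, 2.0, 4.0], [1.0, 0.5, 0.1])
print(len(traces))
```

The cost of this approach is that the payload grows with the number of slider positions, since every smoothing level carries its own full copy of the y values.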
Today I've been looking at Vega. I have opened the above PR to add zoom/pan to plots in VS Code and have been able to come up with these tooltips for linear plots. PTAL and LMK what you think/if this changes anything. |
@mattseddon Is your point that we should reconsider plotly? |
I am really not sure. I think both Vega and Plotly have their own benefits and constraints. Let's chat about whether or not we still want to take this on when we meet this week.
In the meantime, I am going to attempt to update the default templates to add zoom + pan and new tooltips. E.g. for smooth/linear, we will end up with:
Screen.Recording.2023-09-11.at.9.33.00.am.mov
As you can see from the above screen recording the template is not perfect as the tooltip contains I am also going to look further into the Studio/DVC code. Whatever we decide we need to start on removing parts of the legacy process.
New proposed smooth template
{
"$schema": "https://vega.github.io/schema/vega-lite/v5.json",
"data": {
"values": "<DVC_METRIC_DATA>"
},
"title": "<DVC_METRIC_TITLE>",
"width": "container",
"height": "container",
"params": [
{
"name": "smooth",
"value": 0.001,
"bind": {
"input": "range",
"min": 0.001,
"max": 1,
"step": 0.001
}
}
],
"layer": [
{
"encoding": {
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "rev",
"type": "nominal"
}
},
"layer": [
{ "mark": "line" },
{
"transform": [{ "filter": { "param": "hover", "empty": false } }],
"mark": "point"
}
],
"transform": [
{
"loess": "<DVC_METRIC_Y>",
"on": "<DVC_METRIC_X>",
"groupby": ["rev", "filename", "field", "filename::field"],
"bandwidth": {
"signal": "smooth"
}
}
]
},
{
"params": [{ "bind": "scales", "name": "grid", "select": "interval" }],
"mark": {
"type": "line",
"opacity": 0.2
},
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
},
"y": {
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "rev",
"type": "nominal"
}
}
},
{
"mark": {
"type": "circle",
"size": 10,
"tooltip": {
"content": "encoding"
}
},
"encoding": {
"x": {
"aggregate": "max",
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
},
"y": {
"aggregate": {
"argmax": "step"
},
"field": "<DVC_METRIC_Y>",
"type": "quantitative",
"title": "<DVC_METRIC_Y_LABEL>",
"scale": {
"zero": false
}
},
"color": {
"field": "rev",
"type": "nominal"
}
}
},
{
"transform": [
{
"calculate": "datum.rev + '::' + datum.filename + '::' + datum.field",
"as": "tooltip-group"
},
{
"pivot": "tooltip-group",
"value": "acc",
"groupby": ["<DVC_METRIC_X>"]
}
],
"mark": { "type": "rule", "tooltip": { "content": "data" } },
"encoding": {
"opacity": {
"condition": { "value": 0.3, "param": "hover", "empty": false },
"value": 0
}
},
"params": [
{
"name": "hover",
"select": {
"type": "point",
"fields": ["<DVC_METRIC_X>"],
"nearest": true,
"on": "mouseover",
"clear": "mouseout"
}
}
]
}
],
"encoding": {
"x": {
"field": "<DVC_METRIC_X>",
"type": "quantitative",
"title": "<DVC_METRIC_X_LABEL>"
}
}
}
Demo VS Code
Screen.Recording.2023-09-11.at.1.35.07.pm.mov
In order to implement this I think we need to consolidate the post-processing of data in the three products (due to the use of |
For anyone following this issue: This has been temporarily deprioritised whilst iterative/dvc#9940 is worked on. |
The main reasons to migrate to plotly would be:
A distant 3rd reason is UI improvements over vega lite, but I think we can already see that there will likely be as many drawbacks as advantages to the plotly UI. I think the first 2 points are strong enough that it's worth moving, but I don't think we have time to work towards the 2nd point now, and we have already put a ton of time into plots, so I would consider plotly a "nice to have" rather than an urgent priority. |
@shcheklein I think @dberenbaum summed it up well in the last comment. Plotly would not be a silver bullet and I don't think we can justify the effort for the benefits that we would get right now. |
plotly is a set of Open Source Graphing Libraries for building "Interactive charts and maps for Python, R, Julia, ggplot2, .NET, and MATLAB®".
The "high level" concept is very similar to vega-lite (the current DVC plots backend): both are JavaScript libraries based on d3.js, using JSON to describe the plot "schema" and providing "bindings" to generate plots in different languages (altair would be the vega-lite Python equivalent). See a more detailed comparison
It would be nice to extend DVC plots to support plotly as an alternative backend. The following is a non-exhaustive list of what I consider advantages (in DVC context) of adding support to plotly:
As a non-exhaustive example, see differences between python bindings stats plotly / altair
Try plotly line chart / vega-lite line chart
This is especially relevant for some complex plots like iterative/dvc#4455, where plotly provides many relevant interactions by default (e.g. reordering columns, selecting subsets) that seem quite complicated to add (if even possible) in vega-lite:
plotly parallel coordinates / vega-lite parallel coordinates
After reviewing the internal dvc.render module and discussing it with @pared, it looks like it won't require too many changes on DVC to add support to plotly.
Edit by @dberenbaum to start a tasklist here of possible future plotly enhancements:
Tasks