Add pipeline property to track data lineage #3967
This is pretty much exactly what I expected when we discussed this so I'm very happy to see it seems to have worked. The |
Yeah, thanks for working through the design with me! In terms of points.apply(hv.operation.histogram).pipeline
But that does remind me that I should add some tests for And, there might be a hole here if the thing |
No, I don't think this is a problem. We only need to update the pipeline if the function passed to apply returns a points.apply(lambda p: p.select(x=(0, None))).pipeline
Hmm, there are also the |
I frequently write functions that take an object compute something from it and then repack a new Dataset, e.g. here's an apply function I just wrote for a dashboard I'm writing:
Ok, yeah. That's a good point regarding the apply function constructing a brand new object. So this will need to be captured separately from the I'll work on
In 702e531 I added a new meta class to support pipelines in points.apply(
lambda p: hv.Points(p.select(x=(0, None)).data)
).redim.label(x="The X Dim").opts(color='green').pipeline
Jon, |
Hi @johnzzzzzzz, Have you seen #3951? This is work towards creating a workflow to automatically link selections between HoloViews elements (including those produced by hvplot). The next iteration of that PR is going to build on top of this pipeline work. |
I'm excited too. Would it be possible for |
I don't think so, unless the thing returned by I'm definitely open to renaming |
That's what I suspected. It would be easy to have something that prints like a list while being callable, but I agree that it's nicer to have it simply be a list when it's returned as a value. I don't have any suggestions for a better name, then. |
This removes the `execute_pipeline` method
in the presence of exceptions
Looks good! I don't actually much like the return processed if self.p.group is None else processed.clone(group=self.p.group) |
Thanks! It is nice for pipeline to be a standard
That sounds good to me. I made the change in the I think this PR is in pretty good shape now. Thanks for taking a look, and let me know if anything else comes to mind that we should do before merging. |
Happy to see this merged. I'll give @jlstevens a chance to review though. |
Okay, since he's on PTO for the foreseeable future I'm going to go ahead and merge. |
Thanks! |
Overview
This PR adds a new `pipeline` property to the `Dataset` class. This property holds a list of `(function, args, kwargs)` tuples that represent the sequence of operations needed to transform the `Dataset` stored in the `dataset` property into an element equal to the current element.
It also adds a new `execute_pipeline` method that can evaluate this sequence of functions on an input dataset. This makes it possible to reproduce the same sequence of operations on a new `Dataset`.
Relationship to other PRs
dataset property
The `dataset` property was added to the `LabelledData` class in #3919. This PR moves the `dataset` property down to the `Dataset` class, so there is no longer a `dataset` property on, for example, the `Layout` class. This reduces the scope of where `dataset` and `pipeline` need to be correct / consistent.
Histogram _operation_kwargs
This PR removes all special cases associated with Histogram elements. So the `Histogram._operation_kwargs` property added in #3921 has been removed.
select all dims
In #3924, the `select` method is updated to consider all dimensions in the `Dataset` stored in the element's `dataset` property. This PR does not do this, and instead provides the `execute_pipeline` method as a more powerful alternative for achieving the same goal. See examples below.
link_selections
This PR will become a more powerful foundation for the automatic linked selection support being added in #3951
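The difference between selecting on an element and replaying a pipeline on its source dataset can be sketched in plain Python. This is an illustration only, not the HoloViews implementation: a list of dicts stands in for a `Dataset`, and `to_points` and the standalone `execute_pipeline` function are hypothetical names chosen for this sketch.

```python
# Plain-Python sketch; not HoloViews code. All names here are hypothetical.

def to_points(rows, kdims):
    """Stand-in for building an element: keep only the displayed dimensions."""
    return [{k: row[k] for k in kdims} for row in rows]


def execute_pipeline(pipeline, dataset):
    """Apply each (function, args, kwargs) step, in order, to `dataset`."""
    result = dataset
    for function, args, kwargs in pipeline:
        result = function(result, *args, **kwargs)
    return result


# Source dataset with three dimensions; the element displays only x and y.
dataset = [
    {"x": 0.0, "y": 1.0, "r": 1.0},
    {"x": 3.0, "y": 4.0, "r": 5.0},
    {"x": 0.6, "y": 0.8, "r": 1.0},
]
pipeline = [(to_points, (), {"kdims": ["x", "y"]})]
points = execute_pipeline(pipeline, dataset)

# `r` is no longer present in `points`, so the subset is taken on the source
# dataset and the recorded steps are replayed on that subset.
subset = [row for row in dataset if row["r"] <= 1.0]
points_subset = execute_pipeline(pipeline, subset)
print(points_subset)
```

Because the steps are stored as data, the selection can use any dimension of the source dataset, including ones the final element does not carry.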
Example 1: Points
Create a sample 3-dimensional dataset. x and y are independently drawn from the standard normal distribution and r is calculated to be the radius of each point from the origin.
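A dataset of this shape could be generated as in the sketch below. It uses only the standard library; the sample size, the seed, and the list-of-dicts layout are assumptions made for illustration and are not taken from the PR.

```python
import math
import random

random.seed(0)  # fixed seed so the sample is reproducible (assumption)
n = 1000        # sample size is an assumption

# x and y are drawn independently from the standard normal distribution.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [random.gauss(0.0, 1.0) for _ in range(n)]

# r is the distance of each point from the origin.
rs = [math.hypot(x, y) for x, y in zip(xs, ys)]

rows = [{"x": x, "y": y, "r": r} for x, y, r in zip(xs, ys, rs)]
```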
Display the pipeline for the new `points` element
Next, create a new points element by running `execute_pipeline` on a subset of the dataset stored in `points.dataset`. Note that it would not be possible to compute this subset using `points.select` directly because it involves the `r` dimension, which is not a key or value dimension of `points`.
Example 2: Datashade
Create an `RGB` image element from `points` using the `datashade` and `dynspread` operations with `dynamic=False`.
Display the pipeline for `points_rgb`
Next, compute a new `RGB` element by calling the `execute_pipeline` method with a subset of the original dataset. Note that this is a selection that was not possible using the approach in #3924.
Example 3: Histogram
Next, repeat the same process using a `Histogram` element created from `points`.
Display pipeline
Create new `Histogram` element with `execute_pipeline`
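The histogram case can be sketched without HoloViews as follows. This is an illustration under assumptions: `select_dim`, `histogram`, `replay`, and the bin settings are hypothetical stand-ins, and the pipeline is written out by hand rather than recorded automatically.

```python
# Plain-Python sketch; not HoloViews code. All helper names are hypothetical.

def select_dim(rows, dim):
    """Pull one dimension out of a list-of-dicts dataset."""
    return [row[dim] for row in rows]


def histogram(values, bins, lo, hi):
    """Count values into `bins` equal-width bins over [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def replay(pipeline, dataset):
    """Apply each (function, args, kwargs) step, in order, to `dataset`."""
    result = dataset
    for function, args, kwargs in pipeline:
        result = function(result, *args, **kwargs)
    return result


# The recorded steps, including the bin settings used the first time.
pipeline = [
    (select_dim, ("x",), {}),
    (histogram, (), {"bins": 4, "lo": 0.0, "hi": 4.0}),
]

dataset = [
    {"x": 0.5, "r": 1},
    {"x": 1.5, "r": 2},
    {"x": 1.7, "r": 3},
    {"x": 3.9, "r": 4},
]
full = replay(pipeline, dataset)
# Subset on `r`, a dimension the histogram itself does not carry.
subset = replay(pipeline, [row for row in dataset if row["r"] >= 3])
print(full, subset)
```

Since the bin settings are stored in the step's kwargs, the replayed histogram uses the same bins as the first one, which keeps the two directly comparable.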
Example 4: Custom aggregation
In this example, create a `Bars` element from the result of aggregating an original `Dataset`.
pipeline
Create a new `Bars` element on a subset of the original dataset using `execute_pipeline`
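A custom aggregation followed by a conversion to bars can be replayed in the same way. The sketch below is plain Python, not the PR's code: `aggregate_mean`, `to_bars`, `replay`, and the sample rows are hypothetical and chosen only to show the two-step pipeline.

```python
from collections import defaultdict

# Plain-Python sketch; not HoloViews code. All helper names are hypothetical.

def aggregate_mean(rows, by, value):
    """Group rows by `by` and return the mean of `value` for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row[value])
    return {key: sum(vals) / len(vals) for key, vals in sorted(groups.items())}


def to_bars(means):
    """Stand-in for a Bars element: a list of (label, height) pairs."""
    return list(means.items())


def replay(pipeline, dataset):
    """Apply each (function, args, kwargs) step, in order, to `dataset`."""
    result = dataset
    for function, args, kwargs in pipeline:
        result = function(result, *args, **kwargs)
    return result


pipeline = [
    (aggregate_mean, (), {"by": "group", "value": "y"}),
    (to_bars, (), {}),
]

dataset = [
    {"group": "a", "x": -1.0, "y": 2.0},
    {"group": "a", "x": 1.0, "y": 4.0},
    {"group": "b", "x": -1.0, "y": 10.0},
    {"group": "b", "x": 1.0, "y": 20.0},
]
bars = replay(pipeline, dataset)
# The subset is chosen on `x`, which is aggregated away in `bars`.
bars_subset = replay(pipeline, [row for row in dataset if row["x"] > 0])
print(bars, bars_subset)
```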