Timeseries operations #1172

philippjfr · 2017-03-03T03:58:05Z

Some basic operations to work with timeseries, the outlier detection is very useful in combination with datashader because it lets you get hover information about the outliers. Needs tests and support for datetime types.

Until we sort out input/output specifications on operations, I'm handling unpacking of (Nd)Overlays with a .map call in the _process method.

I've also added a streams parameter to ElementOperation by default. It was already supported API and I've found myself wanting to attach custom streams (based on paramnb) to existing ElementOperations and not being able to. Here's the pattern I'm talking about, in the following code there are two streams one for the ranges shared amongst the data loading, datashading and decimate operations, and self which is a class defined for paramnb which controls the rolling window and sigma parameters of the smoothing and outlier detection operations:

stream = RangeXY()

# Define DynamicMap to load dataset based on stream parameters
dmap = hv.DynamicMap(self.load_data, kdims=[], streams=[self])

# Apply rolling mean with the window controlled by stream 
smoothed = rolling(dmap, streams=[self])

# Apply steps interpolation
smoothed = interpolate_curve(dmap)

# Datashade based on ranges
smoothed = datashade(smoothed, cmap=get_cmap('Set1'),
                     aggregator=ds.count_cat('Conf'),
                     width=675, height=300, streams=[stream])

# Compute outliers with window and sigma controlled by stream
outliers = rolling_outlier_std(dmap, streams=[self])

# Decimate outliers only based on current range
outliers = decimate(outliers, streams=[stream])

# Overlay dynamically smoothed, interpolated, datashaded timeseries
# with dynamically decimated outliers
smoothed * outliers

jlstevens · 2017-03-03T10:18:42Z

The functionality here is great!

@philippjfr @jbednar That said, this PR illustrates one worry I have with operations - the number of possible operations you can imagine implementing is unbounded. Once we split out the plotting backends, are operations really part of the core HoloViews?

I worry that even a set of general scientific set of operations could keep growing forever, nevermind any domain specific stuff. I am tempted to say that the core operations included with HoloViews should only be those used by the core codebase.

We've already started adding operations that depend on third party libraries (e.g datashader) and once we split out plotting backends, you'll need to install things like holoviews-bokeh and holoviews-mpl. Why not extend this concept to operations and have things like holoviews-datashader, holoviews-timeseries and holoviews-image-tools? We can always have a metapackage that includes everything e.g holoviews-all.

In short, I strongly feel we need a plan to keep what is in core HoloViews well bounded allowing us to split off more open ended things like plotting backends and operations elsewhere where they can grow more freely. This would make sense to consider for HoloViews 2.0.

philippjfr · 2017-03-03T12:45:07Z

That said, this PR illustrates one worry I have with operations - the number of possible operations you can imagine implementing is unbounded. Once we split out the plotting backends, are operations really part of the core HoloViews?

I'd argue they are unless they are clearly domain specific, which I don't think is the case here. I actually think there are other operations that are far more domain specific than these. If holoviews-core is about working with data then I don't mind shipping other operations with it. The main issue I have with splitting out all operations is that they are actually user level API, which means users will have to import them from various random places. Plotting backends don't suffer from the same issue because they can be loaded invisibly in the background without having the user execute awkward imports.

That said I'd be happy to discuss splitting them out if we have a good plan for them, and there is more than two or three operations per extension (like there would currently be if we split out holoviews-datashader or holoviews-timeseries).

philippjfr · 2017-03-03T13:27:13Z

Now working with the dask interface without having to cast the data so we should be able to apply these operations nicely to huge timeseries.

jlstevens · 2017-03-03T13:42:42Z

The main issue I have with splitting out all operations is that they are actually user level API, which means users will have to import them from various random places.

That's not necessarily true as you can imagine trying to import the various operation packages into the normal place (if available). Not saying that is a brilliant idea though!

jbednar · 2017-03-03T15:23:09Z

@philippjfr, this looks great; let's merge it!

@jlstevens , I don't think the operations we have now or can reasonably anticipate would add up to be worthy of the overhead of maintaining and installing a separate package, per operation or group of operations. Given the amount of code involved, the extra boilerplate and tracking required seems like it would make life worse, not better.

That said, it certainly is important to be careful about scope, and it is also important to think about future maintainability. In this context, is it possible for us to define a rigorous interface that operations use and staunchly defend that, rather than trying to handle it at the packaging level? The important thing is to avoid having a combinatorial spaghetti mess, where we when we change one thing in the core we have to update lots of operations that do things that only one of us knows anything about. Seems like this could be avoided by focusing on having a clear interface for operations to use, so that as long as that doesn't change, other changes to the core don't make us have to revisit each of the disparate operations supported.

Overall, I think operations really exploit the power of HoloViews to provide high-level analysis easily and safely, so I do think we should make them maintainable and feasible and something we can encourage rather than something to fear.

jlstevens · 2017-03-03T15:38:57Z

I'm still not sure how we will define the scope of what operations do or don't belong into HoloViews.

What we can do is establish a high bar for the things that do belong. In particular, I would like to see some decent documentation/notebooks describing all the operations we have, justifying their cross-domain generality and giving examples.

jbednar · 2017-03-03T16:35:55Z

Ok, I think that should be simple enough. I don't think there is any issue with current operations not being suitable across domains; these are all easily applicable across domains as far as I can remember.

jbednar · 2017-03-03T16:36:34Z

holoviews/operation/timeseries.py

+
+class rolling_outlier_std(ElementOperation):
+    """
+    Detect outliers for using the standard devitation within a rolling window.


Detect outliers using the standard deviation

philippjfr · 2017-03-04T12:47:29Z

Added some tests now, so this should be ready to merge. I'll also put together a notebook covering these operations, which I'll add to contrib.

philippjfr · 2017-03-04T12:57:25Z

That said, it certainly is important to be careful about scope, and it is also important to think about future maintainability. In this context, is it possible for us to define a rigorous interface that operations use and staunchly defend that, rather than trying to handle it at the packaging level? The important thing is to avoid having a combinatorial spaghetti mess, where we when we change one thing in the core we have to update lots of operations that do things that only one of us knows anything about. Seems like this could be avoided by focusing on having a clear interface for operations to use, so that as long as that doesn't change, other changes to the core don't make us have to revisit each of the disparate operations supported.

Overall operations have changed very little since their first inception, particularly ElementOperations. One thing that's bugging me and others (see #1107) is that by default they do not unpack NdOverlays so if you want that to be supported you need to add some special handling in the _process method like I have done here. Having discussed this with Jean-Luc, we both agree this will have to be handled by allowing operations to declare both their input and output types. It's still unclear to both of us how best to specify nested type declarations if at all, so that's an issue that still needs discussing. In the meantime I think it's fine to resort to the _process/_apply pattern used here as that's always going to be compatible. So really what I'm saying is, even if retrofit type declarations on operations in the future, old operations without them should continue working.

jbednar · 2017-03-04T16:47:39Z

Well, we wanted to add output types to param for other reasons anyway...

jlstevens · 2017-03-05T18:30:57Z

holoviews/operation/timeseries.py

+import param
+
+from ..core import ElementOperation, Element
+from ..core.util import pd


I think it might be better to make this explicit i.e show that pd is None if pandas isn't available.

jlstevens · 2017-03-05T18:31:56Z

holoviews/operation/timeseries.py

+        Minimum sigma before a value is considered an outlier.""")
+
+    def _apply(self, element, key=None):
+        sigma, window = self.p.sigma, self.p.rolling_window


I think it might be worth checking if pd is None here and raising a suitable error (cannot proceed without pandas).

jlstevens · 2017-03-05T18:35:22Z

holoviews/operation/element.py

@@ -575,7 +575,7 @@ class decimate(ElementOperation):
       The x_range as a tuple of min and max y-value. Auto-ranges
       if set to None.""")

-    def _process(self, element, key=None):
+    def _apply(self, element, key=None):


Not sure I like the name _apply. Even though _process is supposed to process elements already, how about _process_element which processes elements, excluding Overlays/NdOverlays.

Unfortunately we already have a process_element method, which just does this:

def process_element(self, element, key, **params): """ The process_element method allows a single element to be operated on given an externally supplied key. """ self.p = param.ParamOverrides(self, params) return self._process(element, key)

Yeah, I just noticed it here. Why aren't we using this method then?

Because it still doesn't handle this case, and changing it would not be backward compatible. It's badly named is all.

How about _process_layer then? At least this is supposed to be temporary, but best pick a meaningful name in case it isn't as temporary as we would like...

Sure, I don't mind.

jlstevens · 2017-03-05T18:36:51Z

holoviews/core/operation.py

+    streams = param.List(default=[], doc="""
+        List of streams that are applied if dynamic=True, allowing
+        for dynamic interaction with the plot.""")
+


Seems fine but we do need to document this before before 1.7.

Yes, we'll have to add documentation about dynamic operations to our DynamicMap and streams sections.

jlstevens · 2017-03-06T01:47:59Z

Looks good and the tests are passing. Merging.

github-actions · 2024-10-26T01:45:05Z

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

philippjfr added the status: WIP label Mar 3, 2017

philippjfr force-pushed the timeseries_operations branch from 838012d to ea9c87a Compare March 3, 2017 13:50

jbednar reviewed Mar 3, 2017

View reviewed changes

philippjfr added type: feature A major new feature and removed status: WIP labels Mar 4, 2017

philippjfr force-pushed the timeseries_operations branch from b96a8df to 69c173b Compare March 4, 2017 12:49

jlstevens reviewed Mar 5, 2017

View reviewed changes

philippjfr added 8 commits March 5, 2017 22:27

Fixed decimate operation for (Nd)Overlays

3369010

Added rolling operations

59ba4d7

Add streams parameter to ElementOperation

de6b4ef

Allow interpolate_curve operation to work on (Nd)Overlays

65ec54f

Added resampling timeseries operation

9bf2911

Made timeseries operation efficient for Pandas and DaskInterface

37a3758

Expanded functionality and added tests for timeseries ops

2105ea4

Cleaned up operation.timeseries

1c64c4d

philippjfr force-pushed the timeseries_operations branch from 69c173b to 1c64c4d Compare March 5, 2017 23:03

Renamed operation _apply methods to _process_layer

d5faecb

philippjfr force-pushed the timeseries_operations branch from 3256ca8 to d5faecb Compare March 5, 2017 23:15

Fixed small bugs introduced in datashader

78258ad

jlstevens merged commit 2047dd3 into master Mar 6, 2017

jbednar deleted the timeseries_operations branch March 22, 2017 20:17

github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Timeseries operations #1172

Timeseries operations #1172

philippjfr commented Mar 3, 2017 •

edited

Loading

jlstevens commented Mar 3, 2017 •

edited

Loading

philippjfr commented Mar 3, 2017 •

edited

Loading

philippjfr commented Mar 3, 2017

jlstevens commented Mar 3, 2017

jbednar commented Mar 3, 2017

jlstevens commented Mar 3, 2017

jbednar commented Mar 3, 2017

jbednar Mar 3, 2017

philippjfr commented Mar 4, 2017

philippjfr commented Mar 4, 2017 •

edited

Loading

jbednar commented Mar 4, 2017

jlstevens Mar 5, 2017

jlstevens Mar 5, 2017

jlstevens Mar 5, 2017

philippjfr Mar 5, 2017

jlstevens Mar 5, 2017

philippjfr Mar 5, 2017

jlstevens Mar 5, 2017

philippjfr Mar 5, 2017

jlstevens Mar 5, 2017

philippjfr Mar 5, 2017

jlstevens commented Mar 6, 2017

github-actions bot commented Oct 26, 2024

Timeseries operations #1172

Timeseries operations #1172

Conversation

philippjfr commented Mar 3, 2017 • edited Loading

jlstevens commented Mar 3, 2017 • edited Loading

philippjfr commented Mar 3, 2017 • edited Loading

philippjfr commented Mar 3, 2017

jlstevens commented Mar 3, 2017

jbednar commented Mar 3, 2017

jlstevens commented Mar 3, 2017

jbednar commented Mar 3, 2017

Choose a reason for hiding this comment

philippjfr commented Mar 4, 2017

philippjfr commented Mar 4, 2017 • edited Loading

jbednar commented Mar 4, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlstevens commented Mar 6, 2017

github-actions bot commented Oct 26, 2024

philippjfr commented Mar 3, 2017 •

edited

Loading

jlstevens commented Mar 3, 2017 •

edited

Loading

philippjfr commented Mar 3, 2017 •

edited

Loading

philippjfr commented Mar 4, 2017 •

edited

Loading