Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure multiple callbacks do not bleed wrong plot state #1034

Merged
merged 18 commits into from
Jan 7, 2017

Conversation

philippjfr
Copy link
Member

@philippjfr philippjfr commented Jan 4, 2017

Currently there is a bug in the code that handles multiple callbacks being attached to one bokeh plot object. The issue arises when two streams are attached to one plot object, this can happen when one axis is linked between two plots. In these cases the RangeXY stream of one plot can end up with the y-range of another plot, which has a linked and therefore shared x-axis. The solution is to keep track of the IDs of the handles each stream is attached to, the frontend can send the matching ids along with the updated values and we can then check that only the appropriate data is actually forwarded to the stream.

@philippjfr philippjfr added tag: backend: bokeh type: bug Something isn't correct or isn't working labels Jan 4, 2017
@jbednar
Copy link
Member

jbednar commented Jan 4, 2017

Does this fix holoviz/datashader#250?

@philippjfr
Copy link
Member Author

philippjfr commented Jan 4, 2017

The immediate problem there is already fixed, this is a more subtle issue that occurs under certain circumstances. In the case that it came up it was in a scenario where there are two datashader plots with RangeXY callbacks, which have one linked axis and one independent axis, e.g. two different quantities over time.

@jbednar
Copy link
Member

jbednar commented Jan 4, 2017

Works well for me.

@philippjfr
Copy link
Member Author

Some other issue happening here where callback events are clobbering each other and it times out.

@philippjfr
Copy link
Member Author

Had to make some further changes to the ACK message protocol. Previously when multiple callbacks fired consecutively it could happen that some messages were getting throttled causing one particular comm to never receive an ACK message that would unblock it. Therefore callback event messages and the response may now contain the comms_target ID, ensuring that all comms are appropriately unblocked after an event has been sent.

@philippjfr
Copy link
Member Author

Need to add more tests and update existing tests.

@philippjfr
Copy link
Member Author

philippjfr commented Jan 5, 2017

@jlstevens I believe this is now ready for review. Both issues that I fixed here are pretty subtle and caused strange bugs. Here's the example I've been using to test the fix:

import holoviews as hv
import numpy as np
import pandas as pd
from holoviews.operation.datashader import datashade, aggregate, shade

hv.notebook_extension('bokeh')

%opts DynamicMap [width=900, tools=['xwheel_zoom']] Layout [shared_datasource=False]

n=100000
def f(n):
    return np.cumsum(np.random.randn(n))
df = pd.DataFrame({'t': range(n), 'y1': f(n), 'y2': f(n), 'y3' : .10 * f(n)} )

def p(y):
    return hv.Curve(df, kdims=['t'], vdims=[y])

layout = datashade(p('y1')) * datashade(p('y2')) + datashade(p('y3'))

layout.cols(1)

A rough explanation of the changes implemented here:

  1. Since a callback can be shared across multiple streams, and two callbacks can share multiple handles you can end up in situations where a bokeh JS callback sends data to two different callbacks. In the example above the x-range and y-range on the top plot share a JS callback, and the x-range and y-range on the bottom plot share a JS callback. Additionally since the x-axis is linked between the two plots, the x-range model is also shared between the plots. This means that the JS callback for one plot ends up sending the x-range and the y-range to both callbacks. However since the y-range is not shared this ends up supplying the wrong y-range to the callback on the other plot. The solution is to a) keep track of the IDs of the plot handles a particular stream is attached to and b) send the ID along with the value when an event is generated. Now the callback can check that the ID of the value its been given matches the IDs of the handles for a particular stream ensuring that only the correct values are sent to the stream.

  2. Every time a message is sent to Python, it has to send a message back. This acknowledgement (ACK) message is used to unblock a comm for future events. In the case above it may end up unblocking the same comm twice, leaving the second comm blocked. To get around this it now sends the unique ID of the comm in the message to Python, allowing the ACK message to send that ID back, ensuring that the correct comm can be unblocked.

@@ -10,7 +10,7 @@
from ..comms import JupyterCommJS


def attributes_js(attributes):
def attributes_js(attributes, handles):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to see the docstring updated with an example showing how handles is involved...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring still needs to be updated I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still need to do that.

# Gather the ids of the plotting handles attached to this callback
# This allows checking that a stream is not given the state
# of a plotting handle it wasn't attached to
stream_handle_ids = defaultdict(list)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to turn this bit of code into a method (get_handle_ids?) and turn the comment into a docstring.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

for stream in self.streams:
stream.update(trigger=False, **msg)
ids = self.stream_handles[stream]
sanitized_msg = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this bit sounds like a 'message filter' (as opposed to sanitization) and could be its own method (with docstring). Something like message_filter(msg, ids)...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sounds good.

else:
handle.callback.code += code
else:
self.stream_handles.update(stream_handle_ids)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the contents of self.stream_handles can keep growing between set_customjs calls with no way to reset/clear it? Is this more state that hangs around when a visualization is removed (e.g a notebook cell is deleted?).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it'll stick around but compared to all the plotting state I'm not worried about a few IDs.

@jlstevens
Copy link
Contributor

To summarize private discussion I've had directly with Philipp, I'll just say that I think the general strategy used in this PR looks sound - labelling comms with an id and keeping track of plotting handles.

I intend to use inline comments for suggestions specific to this PR and here are some of the more general (future) plans that we discussed:

  • There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g a websocket comms implementation).
  • I would suggest renaming 'comms_target' to 'comm_id'.
  • We want to document our message protocol and ideally encapsulate it all in one location (ie. in a class, maybe called Protocol that lives alongside the comm classes).
  • We should aim to generalize our message protocol. E.g always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...

@philippjfr
Copy link
Member Author

@jordansamuels I just updated the PR fixing the issue you reported when resetting, could you test it as well?

@philippjfr
Copy link
Member Author

@jlstevens Ready for review again.

@@ -152,39 +164,84 @@ def __init__(self, plot, streams, source, **params):
self.streams = streams
self.comm = self._comm_type(plot, on_msg=self.on_msg)
self.source = source
self.handle_ids = defaultdict(list)


def initialize(self):
plots = [self.plot]
if self.plot.subplots:
plots += list(self.plot.subplots.values())

Copy link
Contributor

@jlstevens jlstevens Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the code below could be made into a small get_handles (or get_plot_handles) method. Alternatively, it could be a function in utils so the code below could be:

handle_ids = self._get_handle_ids(util.get_handles(plots))
self.handle_ids.update(handle_ids)

Which I think is clearer.

Copy link
Contributor

@jlstevens jlstevens Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also assuming there is a good reason to use a dictionary update... i.e there may be other handle_ids on the Callback other than the ones currently returned by _get_handle_ids. If that is true, it is a bit strange to see this in an initialize method which sounds like it should only happen once.

Copy link
Member Author

@philippjfr philippjfr Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the code below could be made into a small get_handles (or get_plot_handles) method.

Originally had one, didn't seem worth it though.

If that is true, it is a bit strange to see this in an initialize method which sounds like it should only happen once.

That's true, the handles get merged if multiple streams end up being attached to the same callback, in that case the set_customjs method below merges the callbacks:

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v

I could just assign in initialize instead of using the dictionary update, but I wanted to indicate that the handle_ids may be modified by another callback.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally had one, didn't seem worth it though.

I agree it is a small bit of code and I was also wondering if it was worth it. On balance, I felt making the intent clearer for this block of code was more valuable.

Copy link
Contributor

@jlstevens jlstevens Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a bit confused about self.handle_ids - I see it mentioned in four place:

  1. When it is declared.
  2. The bit in initialize above.
  3. What looks like an a dictionary access in on_msg.
  4. The iteration over k, v you just mentioned.

In none of these places do I see self.handle_ids being changed after it is initialized. If you mean that self.handle_ids is being changed somewhere else in the code, then I wouldn't worry about using update here.

In that case, I would also want to look at where the merging of self.handle_ids is happening to consider whether that should become part of the API of Callback.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted where it's being changed above, in set_customjs a callback will merge its own handle_ids with another callback's handle_ids if that callback has already been attached to the plot handle (since you can't attach multiple callbacks on one handle).

for k, v in self.handle_ids.items():
    cb.handle_ids[k] += v

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I was looking for self.handle_ids but cb here is another Callback instance. In that case, how about merging to a different place e.g cb.joint_handle_ids? If it is too complicated, then I don't mind leaving it as it is.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No point keeping both around imo.

found = []
for plot in plots:
for handle in self.handles:
if handle not in plot.handles or handle in found:
continue
self.set_customjs(plot.handles[handle])
self.set_customjs(plot.handles[handle], handles)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand right, found is a list of handles that are in some sense 'unique'. So maybe it could be something like:

unique_handles = self.unique_handles(plots)
for unique_handle in unique_handles:
   self.set_customjs(plot.handles[handle], handles)

Now you might want to call unique_handles something like filtered_handles instead and this variable would then replace found.

return msg


def set_customjs(self, handle):
def _get_handle_ids(self, handles):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like a name that conveys the filtering operation that is also happening here. Maybe _get_attached_ids or _get_attached_handle_ids?

data['x_range'] = (msg['x0'], msg['x1'])
if 'y0' in msg and 'y1' in msg:
data['y_range'] = (msg['y0'], msg['y1'])
return data


class RangeXCallback(Callback):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the new bits of code below seems to have the form:

if predicate(msg):
   return dictionary_from_msg(msg)
else:
  return {}

If you agree this is a general pattern, maybe we can just have an applicable predicate method with the baseclass checking the predicate value. E.g

class RangeXCallback(Callback):
      handles = ['x_range']

     def applicable(msg):
          return 'x0' in msg and 'x1' in msg
  
      def _process_msg(self, msg):
         return {'x_range': (msg['x0'], msg['x1'])}

And of course applicable can return True by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are multiple predicates, such as in RangeXY? I'd prefer not to complicate this for now, although I do agree something like this is worth considering.

Copy link
Contributor

@jlstevens jlstevens Jan 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RangeXY is really the union of RangeX and RangeY. We might want to consider generalizing this idea of a union so we can build things like RangeXY out of the component pieces.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you agree with this suggestion, maybe it should be made into a new issue (feature request)? I don't think it would be hard to implement later.

@philippjfr
Copy link
Member Author

Okay, implemented most of your new suggestions. Need to update the attribute_js docstring and will try to add a few unit tests for the various Callback methods.

@philippjfr
Copy link
Member Author

@jlstevens Added a unit test and revised the docstring. We can settle what to do about the callback _process_msg implementations in another issue as you suggest. Unless you object I think this is ready to merge once tests pass.

@jlstevens
Copy link
Contributor

jlstevens commented Jan 7, 2017

I'm happy to merge now although looking at the things we discussed, there might be one thing that could still be addressed in this PR:

  • I would suggest renaming 'comms_target' to 'comm_id'.

Here are the other suggestions made (that we should turn into issues):

  • As you agreed, we should file an issue to define Callbacks with an applicable predicate and introducing unions (so PositionXY would be the union of PositionX and PositionY).
  • There is code in callbacks that references jupyter comms specifically. This should be generalized to work with any comms type (e.g a websocket comms implementation).
  • We want to document our message protocol and ideally encapsulate it all in one location (ie. in a class, maybe called Protocol that lives alongside the comm classes).
  • We should aim to generalize our message protocol. E.g always have a 'command' or 'msg_type' field. Maybe all messages could have 'msg_type' and 'data' fields that can be dispatched appropriately in the centralized message protocol class. Then we need something for 'ACK' and 'ERROR' messages...

@philippjfr philippjfr force-pushed the duplicate_callback_fix branch 2 times, most recently from 17ad6ab to 0dedc3b Compare January 7, 2017 14:38
@jlstevens
Copy link
Contributor

Thanks for doing the renaming (which looked a bit tricky)!

Merging.

@jlstevens jlstevens merged commit 7806464 into master Jan 7, 2017
@philippjfr philippjfr deleted the duplicate_callback_fix branch January 7, 2017 15:00
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 26, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
tag: backend: bokeh type: bug Something isn't correct or isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants