Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored execute preprocessor to have a process_message function #905

Merged
merged 8 commits into from
Apr 21, 2019

Conversation

MSeal
Copy link
Contributor

@MSeal MSeal commented Oct 28, 2018

Built ontop of #982 which added additional requested testing. We can merge that PR first if it's desired.

Broke apart the large run_cell function to enable intercepting messages easier. This will enable deleting code from papermill where we have to reimplement the entire run_cell method.

@MSeal
Copy link
Contributor Author

MSeal commented Oct 29, 2018

Hmmm seeings this test fail against master so I don't think it's related to the PR

@maartenbreddels
Copy link
Collaborator

I see the same in #900

@mpacer
Copy link
Member

mpacer commented Oct 30, 2018

@Carreau I think you mentioned something about changing the error formatting in IPython a while back — shall I assume that the errors that are cropping up are related to a recent release of IPython?


outs.append(out)
cell.outputs.append(result)
Copy link
Member

@mpacer mpacer Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like the idea of having an API that works primarily via side-effects — that's a difficult API to document and test for.

Can you think of a way that we could do this more as a pure function? E.g., could we adhere to the pattern for we use with preprocessors? I think that would involve returning and reassigning the modified cell object rather than returning the result itself while treating the cell.outputs as a location where we can append changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look. Mostly I tried to keep the logic the same and not change the design, just break it into smaller parts.

Copy link
Member

@mpacer mpacer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, this is a really exciting idea as it should make it much easier for people to handle messages in custom ways, which I know we're already doing in papermill in order to log messages as they come in.

However, given how this is a piece of core functionality to the app, I'd prefer to see a new test for handling the new API in the standard case explicitly. Because it's such core functionality it would be a testing failure if we weren't already covering this.

Ideally, I think we should also add a new test specifically to add a mocked replacement method, in part because that would surface some of the details of how the API would work when it is overridden. I think the side-effects would show up as an anti-pattern in that case.

Overall, I'd like if if we could find a way to make the new API work with fewer side-effects and if we could make the logic of how messages are being handled a little more explicit.

if cell_index in cell_map:
cell_map[cell_index] = []
# Check for remaining messages we don't process
elif not (msg_type in ['execute_input', 'update_display_data'] or msg_type.startswith('comm')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like really different logic from what we had before with msg_type.startsith('comm'); previously it was a continue.

Copy link
Contributor Author

@MSeal MSeal Oct 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually identical logically, by not returning None as a function it will effectively continue for the parent function loop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might have missed the not?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. If the msg_type.startswith('comm') is true, then it does not fall into here and is not handled with another else, instead it will return the msg as the response (since that's what it defaults to and it is never reassigned).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally now if something is an update_display_data message, it will return something (specifically the content of the msg).

I'm starting to wonder if this all might be fixed if we default to returning None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was working fine post-rebase, but I changed this to a raise control flow pattern which I think is a lot easier to read.

try:
out = output_from_msg(msg)
# Assign output as our processed "result"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm reading this incorrectly, I think this may have in effect added nesting and only applied The cell_map logic in the case where it is not an execute_input or a comm? if that's what this is intended to accomplish we should talk about that, but I was reviewing this originally as a refactor not a change in functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

! (A || B) == (!A and !B) So it should be equivalent to before I believe. I could change it to msg_type not in ['execute_input', 'update_display_data'] and not msg_type.startswith('comm') if you think it reads easier?

Copy link
Member

@mpacer mpacer Dec 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that display_id should have been dealt with in the cell_map logic regardless of the message type previously, except in the cases where a continue was followed. That try is not going to be hit by all message types and so the else also won't be hit by all message types.


display_id = content.get('transient', {}).get('display_id', None)
if display_id and msg_type in {'execute_result', 'display_data', 'update_display_data'}:
self._update_display_id(display_id, msg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously there was a continue in the case where there was an update_display_data message with a statement that # update_display_data doesn't get recorded am I correct that that logic is now implicit if it's present at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MSeal could you respond to this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Late response (post rebase), but yes it's implicit by not raising CellExecutionComplete here

@Carreau
Copy link
Member

Carreau commented Oct 30, 2018 via email

@MSeal
Copy link
Contributor Author

MSeal commented Oct 31, 2018

I can add a test for the inheritance case. It did seem when I tried moving things to the wrong places on purpose that tests would usually fail. I'll do a more thorough check.

Do you want me to do functional rewrite to avoid the side_effect patterns already there? Might be higher risk of changing execution logic but I can do the more extensive changes if you think it's worth it.

@MSeal
Copy link
Contributor Author

MSeal commented Dec 8, 2018

@mpacer I added a new test. From reading the code I think we'd need to do a semi-significant update to add in the message handler we talked about. Because of that I'd prefer to split the implementation into two parts and do that refactor post-merge of this PR.

@MSeal MSeal mentioned this pull request Dec 10, 2018

original = copy.deepcopy(input_nb)
wpp = WrappedPreProc()
executed = wpp.preprocess(input_nb, {})[0]

This comment was marked as outdated.

original = copy.deepcopy(input_nb)
wpp = WrappedPreProc()
executed = wpp.preprocess(input_nb, {})[0]
self.assertEqual(outputs, [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere in the library we tend to use direct asserts as we are using pytest to run the tests. In addition to stylistic consistency, I think it's a lot easier to read.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted, will fix

def test_process_message_wrapper(self):
outputs = []

class WrappedPreProc(ExecutePreprocessor):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we directly created the context needed so that process_message can be tested functionally, iterating over the cell, msg and cell_index as directly as possible.

I'd prefer if we weren't concatenating all of the outputs into a single object for the whole notebook, I'm thinking it should follow the cell structure.

I'd also like to make sure that message types that are supposed to return None are returning None, which means keeping track of at least the msg type in relation to the output if not some of the more predictive fields (e.g., a display_id that we know is set to a particular value).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done -- this is rebased off #982 now

@MSeal
Copy link
Contributor Author

MSeal commented Apr 15, 2019

@mpacer PR should now be ready for re-review

@@ -25,6 +25,8 @@
from ..utils.exceptions import ConversionException


class CellExecutionComplete(Exception): pass # Used as a control signal
Copy link
Member

@mpacer mpacer Apr 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this an actual docstring? Specifically one that describes what this is going to be used for?

@@ -185,18 +185,18 @@ def normalize_output(output):
def assert_notebooks_equal(self, expected, actual):
expected_cells = expected['cells']
actual_cells = actual['cells']
self.assertEqual(len(expected_cells), len(actual_cells))
assert len(expected_cells) == len(actual_cells)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huzzah! Many thanks for switching to using the pytest assert idiom.

def process_message(self, msg, cell, cell_index):
result = super(WrappedPreProc, self).process_message(msg, cell, cell_index)
if result:
outputs.append(result)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this still need special logic distinct from the standard process_message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's purely to test that one could wrap an retrieve outputs to process on it's own. It has no real purpose for doing so beyond showing how to wrap the function in inherited classes. Namely to prove we can wrap the function safely in papermill cleanly.

@mpacer
Copy link
Member

mpacer commented Apr 21, 2019

This looks great! Thanks for working through all the issues!

@mpacer mpacer merged commit 7502276 into jupyter:master Apr 21, 2019
@MSeal MSeal added this to the 5.5 milestone Apr 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants