Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to capture validation errors #249

Merged
merged 1 commit into from
Mar 2, 2022

Conversation

kevin-bates
Copy link
Member

This pull request stems from an issue discovered during Jupyter Server performance analysis in which it was learned that the server is validating notebooks twice (and on both read and write operations). The crux of the issue is that both methods do not give the caller information regarding the validation status of their content - so they must explicitly validate a second time. This appears to have been the case for the last 7 to 8 years (since the code resided in notebook). I suspect this is based on decisions at the time, but this pull request attempts to rectify this in a (hopefully) acceptable manner. Since the elimination of the redundant validation can improve performance by nearly 50%, this seems like an opportunity we should not ignore.

Because so many years have passed, I believe there are at least two backward-compatibility concerns that lead to this particular resolution.

  1. The read/write calls do not inform the caller that the operation encountered a validation error. As a result, the content (on read) is always returned, even in light of validation errors, which, today, are logged as errors. This behavior also leads to the content always being written, again, in light of validation errors. (Of course, filesystem-related issues will raise appropriate exceptions, but all things being equal, the caller has no idea a validation issue is present.)
  2. Applications wish to format their own validation error. Jupyter Server (and notebook) used a subsequent (and unconditional) call to nbformat's validate() method, to capture the exception and produce an application-friendly error message when validation issues are encountered.

This change adds an optional dictionary-valued parameter (capture_validation_error) that applications can pass to capture the ValidationError instance for use by the calling application. If the parameter is non-None and a dictionary instance, a validation error will be added into the dictionary under the key 'ValidationError' while the corresponding value will contain the ValidationError instance. This would allow applications that make an additional call to validate() to remove the second call since they have both their content (on reads) and the ValidationError instance (when validation issues are present) they can use to make decisions.

Alternative approaches

  1. One approach would be to add an optional boolean parameter that skips validation on the read/write methods, leaving actual validation up to the application. Since today's read/write code leaves no indication that a validation error was encountered, this approach would work as well. However, given that read/write methods already include validation, it does require an additional method call and feels a bit clunky as it violates the original intent.
  2. Another approach would be to raise ValidationError when it occurs, which also allows the application to produce a friendly message, but will prevent the return of content (on reads) and persistence of content (on writes) - that is assumed behavior today.

There may be other solutions, but I think we should do something as this is the kind of low-hanging fruit that performance-sensitive folks live for. 😄

cc: @goanpeca, @mlucool

@kevin-bates
Copy link
Member Author

Looks like the Windows 3.6 platform is no longer available for CI. I can remove 3.6 from the CI matrix, but that feels better suited for another PR.

Copy link
Contributor

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@blink1073 blink1073 merged commit 66443e6 into jupyter:master Mar 2, 2022
@kevin-bates
Copy link
Member Author

Thank you @blink1073! It will be great to take advantage of these changes in Jupyter Server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants