Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NBectomy #891

Closed
wants to merge 2 commits into from
Closed

Conversation

davidbrochart
Copy link
Contributor

@davidbrochart davidbrochart commented Jun 24, 2022

Jupyter Server should not special-case notebook documents. There should only be two models, as far as the contents API is concerned: directories and files. A notebook is a file.

@davidbrochart davidbrochart marked this pull request as draft June 24, 2022 14:57
@davidbrochart davidbrochart marked this pull request as ready for review June 24, 2022 16:43
@kevin-bates
Copy link
Member

I don't have any particular affinity for the Contents service, but do have an affinity for our users, and believe this is too aggressive, especially given the lack of a deprecation cycle. In essence, this pull request is removing notebook validation and "notarization", which has been a given across every client application and custom ContentsManager implementation. Are we proposing that it's now up to each client application to perform validation (despite the fact that validation should be performed on the server)?

One path forward, which was touched on in yesterday's Server meeting, is to keep the existing classes (thereby preserving existing applications and customizations), but perhaps inject a new base class (e.g., BaseContentsManager) consisting of everything in ContentsManager except validation and notarization and from which ContentsManager derives. You'd then need a separate file-based ContentsManager that derives from BaseContentsManager and would probably need to utilize a separate mixin (sans any validation) that consists primarily of today's FileContentsManager. The default ContentsManager could then either remain AsyncFileContentsManager or we make the new hierarchy (that skips validation/notary) the default. Any exiting configurations that bring their own ContentsManager will continue to override the default and continue to work.

Another approach may be to make validation configurable (and optional). For this, I think you'd need to let it be enabled by default, but could deprecate the default value such that the future default is to skip validation. End users and custom ContentsManager implementations could still override the (new) default to retain validation. This configurable could be an enumeration allowing for combinations of validation and/or notarization.

I agree that it would be a nice simplification to only recognize files (and directories) and also agree that validation is probably over-performed. But I also believe notebook validation is core to a notebook server and find it a bit disturbing that we'd entertain the notion that invalid notebooks could be persisted without any idea they're invalid.

@davidbrochart
Copy link
Contributor Author

Thanks Kevin for the feedback.
You're right, this is a bit aggressive, but I wanted to show that things still work when removing everything related to notebooks from the server (this PR works with JupyterLab). I was pretty confident it would be the case because Jupyverse already does that.
I'll consider the options you gave, but keep in mind that the contents API was basically removed with the latest RTC improvements: all the contents go through the YDoc websocket. If we take the direction that RTC will be the default (and that you will be "collaborating with yourself" without noticing it in a single user scenario), then maybe we shouldn't invest too many efforts in the REST API anyway.
Actually, this simplification is motivated by an RTC issue.

@kevin-bates
Copy link
Member

Thanks @davidbrochart - that explanation is helpful for my understanding.

keep in mind that the contents API was basically removed with the latest RTC improvements: all the contents go through the YDoc websocket

Removed from where? Is Lab (and all its extensions) moving to the RTC interfaces to persist and retrieve content? I must be missing something here - sorry.

I wanted to show that things still work when removing everything related to notebooks from the server

I can remove any core piece of internal functionality (like validation) and demonstrate that clients of that software still work. In fact, it may never fail again!

@davidbrochart
Copy link
Contributor Author

Removed from where?

Sorry, "removed" is not the right word, but JupyterLab in RTC mode doesn't use the REST contents API to load and save documents. It uses the YDocWebSocketHandler.
It's not here yet, but in the future, I can see RTC being the only supported mode, meaning that our document data structures will be implemented using CRDTs.

With RTC, validation becomes a shared responsibility, it's not centralized in the server. Actually, from the CRDT point of view, the server is just another client. Every client is responsible for emitting valid document changes.

@kevin-bates
Copy link
Member

With RTC, validation becomes a shared responsibility, it's not centralized in the server. Actually, from the CRDT point of view, the server is just another client. Every client is responsible for emitting valid document changes.

I see. Thanks.

@echarles
Copy link
Member

this pull request is removing notebook validation and "notarization"

With RTC, validation becomes a shared responsibility, it's not centralized in the server.

Can you give more context on how you see validation, trust and notarization features being kept in the Jupyter RTC world?

@davidbrochart
Copy link
Contributor Author

I think that in RTC, document validity is almost "built-in": a client which modifies e.g. a notebook does it in a consistent way. You might say that it was already the case before RTC, but we still validated the whole notebook on save. I'm not sure we should do that in RTC for performance reasons, since the document is automatically saved after each change. Also we currently have an architecture where all the updates go to the server, before being forwarded to other clients, but there is no guarantee that it will always be the case. For instance, in JupyterLite every client could be directly talking to every other ones.
As far as trust is concerned, there are discussions about whether we should keep it, but this seems like a fragile mechanism anyway, and it's kind of broken currently. @hbcarlos and @fcollonval know more about it.
This PR also removes checkpoints, since they will be replaced with Ystores in RTC.
Let's keep discussing, this PR is about that.

@echarles
Copy link
Member

IMHO Notebook trust and security is something important that should be upgraded whatever feature is considered in a consistent and predicable way . Adding @Carreau to this thread as he may have thoughts on that.

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022

The trust model is really to prevent someone from forging and send you a notebook with potentially malicious code. I agree that once you passed the RTC barrier and are collaborating with other folks you do "trust" them.

But ho boy with respect to crafting malicious notebook that can infect each other at load time, just look at the history of excel viruses spreading themselves via macros, and transpose it to notebooks.

We went through many iteration of the trust mechanism, including and not limited to trusting each cell independently. But we figured out how to make a notebook, where simply the ordering of the cell would change its state from harmless to dangerous. Or we found out that some service at the time were sensitive to script tag injections in prompt numbers.

Personally with what I have witnessed in term of notebook capabilities, if signing is removed and not replaced by another mechanism, there is no way I ever use jupyter to open a notebook I received by email, nor ever open a notebook from a cloned github repository or similar. For me ipynb without signature is worse than executing a .exe downloaded on a pirate website.

So really I don't think we should load/save notebook w/o signing.

As for the simplification and treating notebooks vs other files in a more similar matter, I'm all for it.

Another concern for me is if removing the trust mechanism will make some companies back out of jupyter.

cc @rcthomas, @rpwagner @jweill-aws

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022

Oh, this week-end I read about Chesterton's Fence, this may be a good example.

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022

And yeah, it took me 8 minutes to pwned jupyverse, including fixing ImportError: cannot import name 'User' from 'fps_auth.models' (.../jupyverse/plugins/auth/fps_auth/models.py):
Screen Shot 2022-06-27 at 10 09 22

@davidbrochart
Copy link
Contributor Author

I don't understand what you mean, but if you found something wrong, do you mind opening an issue in jupyverse?

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022

Sure, jupyter-server/jupyverse#186

But that seem like a fundamental piece of the Jupyter Security model. If I send you a file that contains javascript, the javascript should not execute on load. Unless of course you are the author of the javascript or trust it.

Which is all what the notebook notary deal with, if the signature does not match, and the notebook is potentially tempered with, it will show at untrusted. Which you can't do by just loading the Json.

But for me that raise the question (and a concern): What is your security model for real-time collaboration ? Because if things like the above are not understood or taken into consideration, i'm scared of the security implications of RTC...

@davidbrochart
Copy link
Contributor Author

davidbrochart commented Jun 27, 2022

Thanks for opening the issue. Jupyverse doesn't implement the current trust mechanism. As it is probably going to change with RTC, we preferred to wait for things to settle.

But for me that raise the question (and a concern): What is your security model for real-time collaboration ? Because if things like the above are not understood or taken into consideration, i'm scared of the security implications of RTC...

There is @trungleduc's PR which deals with rendering widgets in RTC. I think it is a solid base that should generalize to all outputs, and become the new trust mechanism.

@davidbrochart
Copy link
Contributor Author

including fixing ImportError: cannot import name 'User' from 'fps_auth.models' (.../jupyverse/plugins/auth/fps_auth/models.py)

BTW, you didn't mention this in your issue, mind opening a new issue in jupyverse?

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022 via email

@davidbrochart
Copy link
Contributor Author

that's just on the released version. latest git seem fine now.

I could not reproduce, mind opening an issue?

@Carreau
Copy link
Contributor

Carreau commented Jun 27, 2022

Sure: jupyter-server/jupyverse#187

@minrk
Copy link
Contributor

minrk commented Jul 8, 2022

I think we're a pretty long way from being ready for this. JupyterLab isn't the only client for Jupyter server to support, and removing the single most fundamental feature of the server (serving notebooks via the contents API) without deprecating for at least one, and probably multiple major release cycles doesn't seem prudent to me.

At the very least, an equivalent or improved trust model must be fully supported before this should even be deprecated.

@Zsailer
Copy link
Member

Zsailer commented Feb 16, 2023

Hey @davidbrochart, we're triaging PRs in the Jupyter Server meeting today. Because there hasn't been activity on this PR in a little while, I'm going to close it. Thanks for your work here and feel free to re-open anytime if this is in-progress.

@Zsailer Zsailer closed this Feb 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants