[DISCUSS] Add a Session Management infrastructure for extension developers #122
Comments
Background discussion: HTTP Personality for Enterprise Gateway, jupyter-server/enterprise_gateway#734 (comment) |
During the Jupyter server meeting in May, I suggested labels as a way to enable collaboration between members of a trusted team on a shared Jupyter server. The idea is that the UI (client side) could label the kernels and/or kernel sessions with the user that created them. This would suffice for use cases like "stop all my kernels". A true multi-user server with isolation between users is a much harder nut to crack. Could you be a bit more specific about use cases that require the server to track browser/client sessions? |
With labels, I mean a generic name-value mechanism, like in Kubernetes. |
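To make the label idea concrete, here is a minimal sketch of a name-value label mechanism with a Kubernetes-style equality selector. `KernelRecord`, `select`, and the `created-by` label name are hypothetical and exist only for illustration; nothing here is an existing Jupyter Server API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class KernelRecord:
    """Hypothetical record for a running kernel; not a Jupyter Server class."""
    kernel_id: str
    labels: Dict[str, str] = field(default_factory=dict)


def select(kernels: List[KernelRecord], selector: Dict[str, str]) -> List[KernelRecord]:
    """Return the kernels whose labels contain every name-value pair in selector."""
    return [
        k for k in kernels
        if all(k.labels.get(name) == value for name, value in selector.items())
    ]


kernels = [
    KernelRecord("k1", {"created-by": "alice"}),
    KernelRecord("k2", {"created-by": "bob"}),
    KernelRecord("k3", {"created-by": "alice", "project": "demo"}),
]

# "Stop all my kernels" becomes: select by label, then shut each match down.
mine = select(kernels, {"created-by": "alice"})
print([k.kernel_id for k in mine])
```

The server only stores and matches the labels; what they mean (user, project, team) is left to the client that sets them.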
OAuth doesn't require a session. The authorization token itself is sent with every request, and can be validated on its own. OIDC builds on OAuth, I'm pretty sure it doesn't require a session either. Those modern protocols are designed for scalability, and sessions are an antipattern for scalability. REST mandates stateless processes. 12 Factor mandates stateless processes.

I know that our situation is a bit different, because kernels are stateful by definition. And the Jupyter Server needs to keep track of the running kernels, and maybe kernel sessions. But this management information can be externalized, to keep the Jupyter Server process itself stateless and allow for crash recovery. Labels attached to the managed objects can be externalized in the same fashion. Open WebSockets are a kind of state too, but those can fail and the client will simply re-connect, possibly through a restarted Jupyter Server.

Client sessions in the Jupyter Server are a different matter. They add state that would need to be externalized separately, independent of the objects that the Jupyter Server has to manage already. Can you propose a use case that is more specific than "might be useful"?

I don't have a problem with an optional auth handler that implements a token cache for its purposes, and maybe even sets a cookie if that is needed on top of the token. Anyone would be free to configure a different auth handler and run a Jupyter Server that works without cookies. But you seem to be asking for a generic mechanism for session management, based on the assumption that "it will be needed at some point". On that generic level, my counter argument is YAGNI. Sessions are an antipattern, so we should strive to avoid them, rather than implement a generic mechanism that will tempt developers to use it just "because it's there". |
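The "validated on its own" property can be sketched with a self-contained signed token. This is not OAuth or OIDC, and real deployments would normally use an established format such as a signed JWT; the token layout, `SECRET`, `issue_token`, and `validate_token` are all assumptions made for illustration. The point is only that validation needs no per-client state on the server.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a signing key shared by every server instance.
SECRET = b"demo-secret-change-me"


def _sign(body: bytes) -> bytes:
    return base64.urlsafe_b64encode(hmac.new(SECRET, body, hashlib.sha256).digest())


def issue_token(user, ttl_seconds=3600, now=None):
    """Build '<payload>.<signature>' where the payload carries user and expiry."""
    issued_at = time.time() if now is None else now
    payload = {"sub": user, "exp": issued_at + ttl_seconds}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    return (body + b"." + _sign(body)).decode()


def validate_token(token, now=None):
    """Return the user name if the token is authentic and unexpired, else None."""
    try:
        body, sig = token.encode().split(b".")
    except ValueError:
        return None  # not exactly two dot-separated parts
    if not hmac.compare_digest(sig, _sign(body)):
        return None  # payload or signature was altered
    payload = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if payload["exp"] < current:
        return None  # expired
    return payload["sub"]


token = issue_token("alice", ttl_seconds=60, now=1000)
print(validate_token(token, now=1030))
```

Any server process that holds the key can validate the token, so a restarted or replicated server needs no session lookup to identify the caller.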
For security reasons, once a user is authenticated, you can store their profile in the server session (you may not want to send the complete profile to the client). You can also persist there the actions the profile allows, and filter on the server side according to the action the user requests. For performance reasons, you can pull a larger set of information from an external source (e.g. the 1000 latest comments a user has made) and work with it in memory to deliver the comments that fulfill the search criteria. For customizability reasons, users could have service instances created specifically for them and made available in their session object. For collaboration purposes, a session is ideal for adding users and keeping state on their connections (read-only, ...). I have read your proposal to annotate the kernel sessions, but a server can and should be able to live without kernels, e.g. for the content API. |
I'm not saying we should prevent extensions from managing sessions, if they have a case for it. And if it's a feature found generally useful for extensions, then adding it to Jupyter Server might be a good idea. So, are there extensions, existing or in development, that require sessions? In particular, sessions tracked by the server with a cookie? Managing user profiles doesn't sound like something Jupyter Server should be doing out of the box. And if it does, the profiles should be managed by user, not by session. Same for your comment search example - why should that be managed as part of a session, instead of a cache by user? If you're creating customized service instances, then keep the custom information with the service instance, not in a session. If you want to make a case for multi-user support, then maybe we should discuss that under the topic of multi-user support, and decide later what kind of session tracking addresses the requirements best. But as far as I know, Jupyter Server as yet has no means to isolate users. By default, it still starts kernels on the same node, with the same operating system user as the server itself. And then lets users send code for execution, which can connect to all ports on the node, and mess with all processes running as the same operating system user. If that is still the case, I see two ways of working with Jupyter Server in a multi-user scenario:
In the first case, Jupyter Server doesn't have to distinguish between users, because it's running for only one. User-specific information can be managed as global for each server instance. But that's just my opinion. I'll wait for others to share theirs. |
I have rephrased the title from |
👍 isolation is a tough nut to crack, but until we have the ability to distinguish activities tied to a client we can't really organize multi-tenant support. In either case, we should try to define what kinds of "services" are available in headless operations. If we wanted to expose Content Services, we'd probably need to have a "manager" introduced that spans Content Manager instances - similar to the |
I haven't thought fully through all these cases, but I agree that:
I feel the initial issue was a little thin on some of these (especially pt2). This might be obvious to some, but people like me might need these things spelled out 😉
|
Great discussion here. Thank you @echarles for opening up the conversation. I agree with @vidartf's breakdown. I (personally) need to reason through the various goals/use-cases a lot more before making any strong opinions about how to properly manage user identity, authorization, sessions, etc. in the jupyter_server. I mentioned this briefly in our last Jupyter Server meeting—I had previously been noodling around with a Jupyter Server implementation that includes authorization.
You can think of this server as a "shared drive" for users inside JupyterHub. I'm planning to expand this service to check authorization for other services (i.e. kernels, terminals, etc.). This provides a mechanism for multi-user access to a jupyter server. How this translates to RTC, I'm not sure. This is really experimental right now, but I could see this thin authorization layer making it into Jupyter Server in the future. It will probably require a JEP though. |
For various use cases (real-time collaboration, multi-user, Kernel Gateway HTTP Personality), a Session object bound to the HTTP client (browser or software) will be needed at some point.
In other languages, this is typically done with a session cookie.
Not sure how Tornado implements this.
https://pypi.org/project/torndsession/
https://github.com/cole/tornado-sessions
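For reference, the session-cookie pattern mentioned above can be sketched with the standard library alone. `SessionStore`, `COOKIE_NAME`, and the in-memory dict are hypothetical; this is not how Tornado or either linked package implements sessions. In a Tornado handler, the cookie would more likely be read and written through the signed-cookie helpers on `RequestHandler`.

```python
import secrets
from http.cookies import SimpleCookie

COOKIE_NAME = "session_id"  # hypothetical cookie name


class SessionStore:
    """In-memory map from an opaque session ID to a per-client dict."""

    def __init__(self):
        self._sessions = {}

    def load(self, cookie_header):
        """Return (session_id, data) for a request, creating a session if needed."""
        cookie = SimpleCookie(cookie_header or "")
        morsel = cookie.get(COOKIE_NAME)
        session_id = morsel.value if morsel is not None else None
        if session_id not in self._sessions:
            # Unknown or missing ID: start a fresh session with a random ID.
            session_id = secrets.token_urlsafe(32)
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]

    def set_cookie_header(self, session_id):
        """Build the value of a Set-Cookie header that binds the client to the session."""
        cookie = SimpleCookie()
        cookie[COOKIE_NAME] = session_id
        cookie[COOKIE_NAME]["httponly"] = True
        cookie[COOKIE_NAME]["path"] = "/"
        return cookie[COOKIE_NAME].OutputString()


store = SessionStore()
sid, data = store.load(None)            # first request: no cookie yet
data["user"] = "alice"
header = store.set_cookie_header(sid)   # sent back to the client

sid2, data2 = store.load(f"{COOKIE_NAME}={sid}")  # follow-up request
print(sid2 == sid, data2)
```

The dict inside `SessionStore` lives in the server process, so it is lost on restart and is not shared between replicas unless it is moved to an external store.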