Open discussion of various topics... #9
Hi Kevin. Sorry for the delay replying, but I am happy to use GitHub issues for questions.
I should also highlight that this machinery allowing kernels to be dynamically provided was meant to be the counterpart to a rethink of how notebooks are associated with kernels. Naming 'the' kernel for a notebook in its metadata makes sense when that's something like 'python', but not when it might be something like an environment or a container on a specific system, or a remote host that someone else may not have access to. The mechanisms to replace that never really got fleshed out. I think it would be a mistake to migrate to this machinery without also fixing that problem - the two were always meant to go together. But I don't have the time or the energy to push that kind of major redesign through. So I can't give any estimate of when this machinery would be ready for real use.
Hey all! Thanks for pinging me! cc @ivanov @mpacer @MSeal I realize we've each taken different approaches to explore the space for parametrized kernels. Since I wasn't able to attend the server workshop, I'll have to peruse what all you explored at some point. @SylvainCorlay mentioned you all exploring parametrized kernels there. The way I've been thinking about it is largely in two areas:
The major difference from what I'm about to outline is that I don't wish to prescribe how the jupyter server operates; I think this is a great space for you all to innovate. I primarily want to make sure it's interoperable and solves the needs of many operators and users. 😄

Purpose

In the deployment environment I work in, there are many versions of spark as well as clusters that can be connected to. When a user wants to launch a notebook, they either click through a large menu or are faced with a huge grid, when in reality that's more like 4 kernels that have a custom version of spark and cluster properties set. We currently have to generate these kernelspecs, which works yet is burdensome in our image (and possibly not up to date with the platform). What I'd love to expose for users is a way to select the version of spark available, from a list of valid values. On top of that, I'd love to be able to specify jars on start. Because of how spark is launched and started, it's paramount to do this in advance of the kernel starting.

Possible means

The approach I've discussed with some other folks (see hackpad for parametrized kernelspecs) involves adding another variable to the kernelspec's argv that will be a path to parameters for a kernel. Alternatively, the parameters could go into the connection file, possibly on a

To make this a bit more clear, here's an example
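As a rough sketch of the parameters-file idea (the kernelspec fields, the `--params` flag, and the `{parameters_file}` placeholder are all hypothetical, not part of any agreed spec), the server could write the chosen parameters to a file and substitute its path into the kernelspec's argv at launch:

```python
import json
import tempfile

# Hypothetical kernelspec with a {parameters_file} placeholder in argv.
kernelspec = {
    "display_name": "Spark - Python",
    "language": "python",
    "argv": [
        "python", "-m", "ipykernel_launcher",
        "-f", "{connection_file}",
        "--params", "{parameters_file}",
    ],
}

# Parameters a user selected at launch time (illustrative values).
params = {"spark_version": "2.4", "extra_jars": ["s3://bucket/my.jar"]}

# The server writes the parameters to a file...
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(params, f)
    params_path = f.name

# ...and substitutes both placeholders before exec'ing the kernel.
argv = [arg.format(connection_file="kernel-abc.json",
                   parameters_file=params_path)
        for arg in kernelspec["argv"]]
print(argv)
```

The kernel (or its launcher) would then read the parameters file before starting, which matters for things like Spark that must be configured before the process comes up.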
In order to fill out these parameters though, we need to know what's allowed! One way is by having a JSON Schema with allowed values. Here's an example
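A schema along these lines could describe the allowed values. The property names (`spark_version`, `memory_gb`) are illustrative, and the tiny validator below is a minimal stand-in; a real implementation would likely use the `jsonschema` package:

```python
# A hypothetical JSON Schema describing allowed kernel parameters.
schema = {
    "type": "object",
    "properties": {
        "spark_version": {
            "type": "string",
            "enum": ["2.2", "2.3", "2.4"],
            "default": "2.4",
        },
        "memory_gb": {
            "type": "integer",
            "minimum": 1,
            "maximum": 64,
            "default": 8,
        },
    },
    "additionalProperties": False,
}

def validate(params, schema):
    """Merge defaults and check enum/range constraints.

    A minimal stand-in for a real JSON Schema validator."""
    props = schema["properties"]
    merged = {name: spec["default"]
              for name, spec in props.items() if "default" in spec}
    merged.update(params)
    for name, value in merged.items():
        spec = props.get(name)
        if spec is None:
            raise ValueError(f"unknown parameter: {name}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: {value!r} not in {spec['enum']}")
        if spec["type"] == "integer" and not (
                spec.get("minimum", value) <= value <= spec.get("maximum", value)):
            raise ValueError(f"{name}: {value} out of range")
    return merged

print(validate({"spark_version": "2.3"}, schema))
```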
This allows for server-side validation before passing to the kernel. As noted in the comments, the jupyter server and the UI can use a more focused version of this spec, replacing the
(Sorry, I was afk the last 6 days.) I see a great discussion brewing here - good stuff. I'd like to first respond to @takluyver's comments, then @rgbkrk's in a separate comment...
That seems like a reasonable way to do things. So the KernelFinder would have access to the whitelist traitlet value and pass that, knowing it spans a set of providers where the allowed values may or may not apply. @takluyver: I'm curious why you felt the need to keep traitlets out of providers? I think that may be a difficult task, but it's probably too early to fully determine that.
We can talk more about this (see next comment). I agree that a separate argument is fine. This argument should be whatever we call it when it's provided in the start kernel REST API body.
Hmm. The entity that takes the two pieces of persisted information (connection-info and "provider state") must be the provider. It is up to the provider as to how the KernelManager is constructed. The caller of the provider has no idea what kind of KernelManager it needs to deal with - so this must be a provider thing and, I would argue, this alternate launch mechanism requires both pieces of information (connection-info and whatever the kernel-manager produces following the initial launch) - because the provider may want to produce a new set of connection info when loading from persisted state. I agree these methods should be standardized. I'll toss out

The thing that actually persists these two pieces of information (the connection info and the "provider state") will be responsible for associating them into a single entity. This will likely be done via the subclassing of
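To make the two-piece persistence idea concrete, here is a minimal sketch. The method names `save_state` and `load_state` are placeholders invented for illustration, not the standardized names under discussion:

```python
import json

# Hypothetical provider interface for persisting kernel sessions.
class KernelProvider:
    def save_state(self, connection_info, provider_state):
        """Bundle the two pieces of persisted information -
        connection info plus provider-specific state - into one entity."""
        return json.dumps({"connection_info": connection_info,
                           "provider_state": provider_state})

    def load_state(self, blob):
        """Reconstruct from persisted state. The provider decides which
        KernelManager class to build here, and may choose to regenerate
        the connection info rather than reuse the persisted one."""
        state = json.loads(blob)
        return state["connection_info"], state["provider_state"]

provider = KernelProvider()
blob = provider.save_state({"shell_port": 5555}, {"pod_name": "kernel-0"})
conn, pstate = provider.load_state(blob)
```

The caller (e.g., a high-availability extension) only ever handles the opaque blob; everything inside it is the provider's business.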
Seems like we should get on the same page here and figure this out. I think as long as jupyter is based on tornado that we should be using its async model. Then again, I don't know the entire evolutionary story here.
I guess I'm not really following the issue here. If the kernel metadata in the notebook is 'Python on Kubernetes with Spark 2.4' and that notebook is shared with someone in which 'Python on Kubernetes with Spark 2.4' doesn't apply, they will get prompted to change the kernel - and, in all likelihood, fail to be able to run that notebook. That seems orthogonal to kernel providers. I suspect I'm missing something though. Could you please shed more light on this? Thanks.
@rgbkrk - regarding parameterized kernels... I think we're all very close here. I definitely agree that a JSON spec that describes what parameters a given kernel provider supports is the way to go. I'd like to see that particular stanza embedded in the already existing

Given the proposal for kernel providers (which requires further ratification IMHO), disk-based kernelspecs as we know them cannot be assumed. However, what can be assumed is the REST API content, in which the structure of kernelspecs is defined. Just bringing this up since things can get a little ambiguous between these worlds.

I'd like to be careful about just how parameters are used, and I think this is a good thing relative to the kernel providers proposal. That is, it's up to the provider to determine how and when parameters are applied. The provider is responsible for producing the metadata corresponding to the parameters. Some providers may use a file, others may not. That said, we should agree on what meta-properties are supported, etc., so that client-side tooling can be achieved.

We should not assume that all parameters are interpreted solely by the target kernel. In fact, I believe the vast majority of parameters are used to seed the kernel's environment, rather than getting fed directly to the kernel. So, again, this is more of a launch thing and, thus, provider-specific. Here's what I'd like to see relative to how parameterization is accomplished...
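A sketch of what a kernelspec with an embedded parameters stanza might look like in the REST API payload. The `parameters` key, its placement under `metadata`, and the property names are assumptions pending the proposal:

```python
# Hypothetical kernelspec as returned by the kernelspecs REST API,
# with a parameter schema tucked into the existing metadata stanza.
kernelspec = {
    "name": "python-kubernetes-spark",
    "spec": {
        "display_name": "Python on Kubernetes with Spark",
        "language": "python",
        "metadata": {
            "parameters": {
                "type": "object",
                "properties": {
                    "cpus": {"type": "integer", "default": 2},
                    "memory_gb": {"type": "integer", "default": 8},
                },
            }
        },
    },
}

# A client that understands parameters looks for metadata.parameters;
# older clients simply never read the stanza and behave as they do today.
params_schema = kernelspec["spec"]["metadata"].get("parameters")
print(params_schema is not None)
```

Because the stanza rides inside `metadata`, no new top-level REST field is needed and backwards compatibility falls out naturally.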
It would be great to hop on a call with everyone at some point to hash this out! Again, I think we're close.
I guess one problem is that most people probably have a kernel called

Perhaps the language could be embedded and used to filter the suggested kernels, with the top suggestion being the best fuzzy match to the embedded kernel name - but require the user to actually confirm the kernel choice so they can at least see the possible options to choose from.
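One way to sketch the fuzzy-match idea is with the standard library's `difflib`; the kernel names below are made up for illustration:

```python
import difflib

def rank_kernels(embedded_name, available):
    """Rank available kernel names against the name embedded in a
    notebook's metadata, best match first, so the user can confirm
    from an ordered list rather than being auto-assigned a kernel."""
    return difflib.get_close_matches(embedded_name, available,
                                     n=len(available), cutoff=0.0)

available = ["python3", "python2", "ir", "spark-2.4-python"]
print(rank_kernels("python", available))
```

A real client would pre-filter by the notebook's embedded language first, then apply the fuzzy ranking only within that language.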
Since the vast majority of notebooks will tend to be used against the same configurations, I think storing the selected kernel, along with its parameterized values, should be the default behavior. However, client applications should probably provide the ability to enter new values - perhaps as an alternative means of starting the kernel? E.g., I just ran this notebook with 8 GB, now I want 32 GB and 2 GPUs. The generically named kernels, like
Definitely agree there.
I'm glad this conversation is advancing! We've been talking about this for a while as something we should do. I agree with the majority of the conclusions above. I did have one question I hadn't seen answered: how should existing clients behave if they encounter a parameterized kernel? Basically, what's the backwards or forwards compatibility story for this change, in particular for jupyter_client?

On the opinion side, I believe we should keep full dependency management out of the scope of the initial parameterized kernel spec. By this I mean that we should be clear that the notebook metadata won't store all of its dependencies in relation to the kernel, and that kernels aren't obligated to present an interface for controlling all of the dependencies within their environment. The examples above are mostly focused on configuring existing tooling in the environment rather than generating a completely new dependency chain for each kernel. I believe this distinction will help with keeping the proposal smaller and define a boundary of responsibility for this step forward.

On the dependency topic, I'm going to submit a proposal to nbformat in June for a notebook requirements/dependency spec, based on ongoing conversations with a mix of Jupyter devs and notebook teams at various companies. It's possible we might want to move that into a specialization of parameterized kernels as people review, but when we initially discussed this topic in depth at NES with folks, it seemed like combining these topics was not a good path forward at the time.
Do you mean - encounter a parameterized kernel spec or an actual kernel? In the former case, the applications (e.g., Notebook or Lab or a REST client) would ignore the parameter metadata of the kernel spec - which is the case today. It should be the responsibility of the kernel provider to provide reasonable defaults for required parameters and, if those cannot be provided, to fail to launch the kernel. I suspect the same is true in the latter case, and I hope kernels that require parameters have reasonable defaults or, again, fail the request.

I believe the majority of parameters will only apply to the launch environment in which the kernel will run (cpus, memory, volume to mount, image to launch, etc.) and not necessarily be acted upon by the kernel. I may be off on that, but we should make sure that any parameters that are indeed acted on by the kernel itself are well-documented so that different kernel providers can use those same kernel implementations and provide the necessary parameter values.

If your compatibility question is more in regard to the kernel provider architecture, I think any applications that currently subclass
@takluyver - it's not clear to me where to open these discussions. I believe a Jupyter Server discourse category will be created soon and I'd like to see a Kernel Providers sub-category within it. Until then, I thought I'd open (for now) a single issue with multiple topics that, I believe, are worthy of discussion.
I'm happy to start working on these, but wanted to have the discussion first. I'm also copying @minrk, @rgbkrk, and @Carreau in case they have opinions. Anyone else, please feel free as well.
Whitelisting
Whitelisting is important in scenarios where many (dozens) of "kernelspecs" are configured. It provides a mechanism for administrators to isolate/limit which kernel types are available to the end user. I believe we need to preserve this functionality. Coupled with this PR in traitlets, admins could modify the whitelist without having to restart the server.
The current implementation doesn't consider whitelisting, nor is there a location with which to associate the trait. Was its omission intentional? If so, could you explain why?
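For illustration, the whitelist check itself could be as simple as the following sketch; the function name and the `provider/name` kernel-type naming are assumptions, not the actual KernelFinder API:

```python
def filter_kernel_types(kernel_types, whitelist):
    """Return only the kernel type names an admin has whitelisted.

    An empty whitelist means no restriction, mirroring the notebook
    server's existing kernel_spec_manager whitelist behavior. Because
    the whitelist spans all providers, some entries may simply never
    match a given provider's kernel types - that's fine.
    """
    if not whitelist:
        return list(kernel_types)
    return [kt for kt in kernel_types if kt in whitelist]

found = ["spec/python3", "spec/spark-2.4", "k8s/spark-2.4-gpu"]
print(filter_kernel_types(found, {"spec/python3", "k8s/spark-2.4-gpu"}))
```

If the whitelist trait is observable (per the traitlets PR mentioned above), the filter would be re-evaluated on each find rather than cached, so admins can change it without a server restart.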
Parameters
This came up in the Jupyter Server Design workshop. Each provider that supports parameterization will need to return the parameter metadata corresponding to its provided kernel. Rather than expose a new field returned from the REST API, I propose we make a `parameters` stanza within the already supported `metadata` stanza. The format of the parameters will be JSON which adheres to a schema - the idea being that consuming applications will have what they need to appropriately prompt for values. Frontend applications that are interested in presenting parameters and gathering their values would look for the `parameters` stanza in `metadata`. Those that aren't will continue as they do today.

This implies that `metadata` should be (optionally) returned from `find_kernels`, depending on whether the provider supports parameters or needs to convey other information in that stanza.

I don't think there's any need to return `argv`. That strikes me as purely server- and, for that matter, provider-specific.

Besides the parameter values, other information will need to be fed to the provider methods - so we probably want `**kwargs` on `init` or `launch`. Examples of this are configuration-related items and other items included in the JSON body of the kernel start request.

Kernel Session Persistence
We need the ability to persist relevant information associated with a running kernel in order to reconnect to it from another server instance. The use-case here is high availability. While HA functionality may be provided as an extension to Jupyter Server, methods to save and load that information are necessary, as is the ability to "launch" from the persisted state.
Async Kernels
I think we should ONLY support async kernels in this new implementation. This implies that Jupyter Server will need to support async kernels at its `MappingKernelManager` level. We can leverage the async kernel management work in the PRs already open on jupyter_client and notebook.

If this isn't the right mechanism to use for discussion, please let me know where that can take place.
Thank you.