
[PROPOSAL] Parameterized Kernel Launch #38

Closed
kevin-bates opened this issue Dec 3, 2019 · 4 comments

kevin-bates commented Dec 3, 2019

This proposal formalizes the changes that introduced launch parameters by defining kernel launch parameter metadata and how it is to be returned from kernel providers and interpreted by client applications. This feature is known as Parameterized Kernel Launch (a.k.a. Parameterized Kernels). The name includes 'launch' because many of the parameters really apply to the environment in which the kernel will run and are not actual parameters to the kernel itself. Memory, CPUs, and GPUs are examples of such "environmental" parameters.

I'm using this repository as the primary location because this proposal relies on the Kernel Provider model introduced in this library. That said, the proposal affects other repositories, namely jupyter_server, jupyterlab, and notebook once jupyter_server is adopted as the primary backend server.

Launch Parameter Schema

The set of available launch parameters for a given kernel will be conveyed from the server to the client application via the kernel type information (formerly known as the kernelspec) as JSON returned from the /api/kernelspecs REST endpoint. When available, launch parameter metadata will be included within the existing metadata stanza under launch_parameter_schema, and will consist of JSON schema that describes each available parameter. Because this is pure JSON schema, this information can convey required values, default values, choice lists, etc. and be easily consumed by applications. (Although I'd prefer to avoid this, we could introduce a custom schema if we find the generic schema metadata is not sufficient.)

   "metadata": {
       "launch_parameter_schema": {
           "$schema": "http://json-schema.org/draft-07/schema#",
           "title": "Available parameters for kernel type 'Spark - Scala (Kubernetes)'",
           "properties": {
               "cpus": {"type": "number", "minimum": 0.5, "maximum": 8.0, "default": 4.0, "description": "The number of CPUs to use for this kernel"},
               "memory": {"type": "integer", "minimum": 2, "maximum": 1024, "default": 8, "description": "The number of GB to reserve for memory for this kernel"}
           },
           "required": ["cpus"]
       }
   }

Because the population of the metadata.launch_parameter_schema entry is a function of the provider, how the provider determines what to include is an implementation detail. The requirement is that metadata.launch_parameter_schema contain valid JSON schema. However, since nearly 100% of kernels today are based on kernelspec information located in kernel.json, this proposal will also address how the KernelSpecProvider goes about composing metadata.launch_parameter_schema and acting on the returned parameter values.

KernelSpecProvider Schema Population

I believe we should support two forms of population: referential and embedded.

Referential Schema Population

Referential schema population is intended for launch parameters that are shared across kernel configurations, typically the aforementioned "environmental" parameters. When the KernelSpecProvider loads the kernel.json file, it will look for a key under metadata named launch_parameter_schema_file. If the key exists and its value is an existing file, that file's contents will be loaded into a dictionary object.

Embedded Schema Population

Once the referential population step has taken place, the KernelSpecProvider will check if metadata.launch_parameter_schema exists and contains a value. If so, the KernelSpecProvider will load that value, then update the dictionary resulting from the referential population step. This allows per-kernel parameter information to override the shared parameter information. For example, some kernel types may require more CPUs than are generally available to all kernel types.

KernelSpecProvider will then use the merged dictionaries from the two population steps as the value for metadata.launch_parameter_schema that is returned from its find_kernels() method and, ultimately, the /api/kernelspecs REST API. Any entry for metadata.launch_parameter_schema_file will not appear in the returned payload.
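The two population steps above can be sketched roughly as follows. This is an illustrative sketch only: the helper name `compose_launch_parameter_schema` and the depth of the merge are assumptions, not part of any real API (the proposal says "update" without specifying merge depth; a one-level merge of `properties` is used here so shared parameters the kernel does not redefine are preserved).

```python
# Hypothetical sketch of the referential + embedded population steps.
import json
import os

def compose_launch_parameter_schema(kernel_json: dict, kernel_dir: str) -> dict:
    metadata = kernel_json.get("metadata", {})
    schema = {}

    # Referential step: load the shared schema from the referenced file, if present.
    schema_file = metadata.get("launch_parameter_schema_file")
    if schema_file:
        path = os.path.join(kernel_dir, schema_file)
        if os.path.isfile(path):
            with open(path) as f:
                schema = json.load(f)

    # Embedded step: per-kernel entries override the shared entries.  A
    # one-level merge of "properties" keeps shared parameters that the
    # kernel.json does not redefine (this depth is an interpretation).
    embedded = metadata.get("launch_parameter_schema")
    if embedded:
        merged_props = {**schema.get("properties", {}),
                        **embedded.get("properties", {})}
        schema.update(embedded)
        if merged_props:
            schema["properties"] = merged_props

    return schema
```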

Client Applications

Parameter-aware applications that retrieve kernel type information from /api/kernelspecs will recognize the existence of any metadata.launch_parameter_schema values. When a kernel type is selected and contains launch parameter schema information, the application should construct a dialog from the schema that prompts for parameter values. Required values should be noted and default values should be pre-filled. (We will need to emphasize that all required values have reasonable defaults, but how that is handled is more a function of the kernel provider.)
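As a sketch of how an application might derive its dialog fields from the schema, the snippet below pulls the `required` list, `default` values, and `description` text out of a `launch_parameter_schema` value. The function name and field shape are hypothetical; a real client would feed something like this into its own form widgets.

```python
# Hypothetical helper: derive pre-filled dialog fields from the schema.
def dialog_fields(schema: dict) -> list:
    required = set(schema.get("required", []))
    fields = []
    for name, spec in schema.get("properties", {}).items():
        fields.append({
            "name": name,
            "label": spec.get("description", name),  # fall back to the name
            "default": spec.get("default"),          # pre-fill value, if any
            "required": name in required,            # mark required inputs
        })
    return fields
```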

Once the application has obtained the desired set of parameters, it will create an entry in the JSON body of the /api/kernels POST request that is a dictionary of name/value pairs. The key under which this set of pairs resides will be named launch_params. The kernels handler will then pass this dictionary to the framework, where the kernel provider launch method will act on it.

   "launch_params": {
       "cpus": 4,
       "memory": 512
   }
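From the client side, building such a request might look like the sketch below. The base URL and kernel type name are assumptions for illustration; only the `launch_params` key in the POST body is what this proposal specifies.

```python
# Hypothetical client-side sketch: composing a POST to /api/kernels that
# carries launch_params.  The request is built but not sent here.
import json
from urllib import request

def build_start_kernel_request(base_url: str, kernel_name: str,
                               launch_params: dict) -> request.Request:
    body = json.dumps({"name": kernel_name, "launch_params": launch_params})
    return request.Request(
        base_url + "/api/kernels",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_start_kernel_request(
    "http://localhost:8888",          # assumed server URL
    "spark_scala_k8s",                # hypothetical kernel type name
    {"cpus": 4, "memory": 512},
)
```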

Note that applications that are unaware of launch_parameter_schema will still behave in a reasonable manner provided the kernel provider applies reasonable default values to any required parameters.

In addition, it would be beneficial if the set of parameter name/value pairs could be added into the notebook metadata so that subsequent launch attempts could use those values in the pre-filled dialog.

Kernel Provider Launch

Once the kernel provider launch method is called, the provider should validate the parameters and their values against the schema. Any validation errors should result in a failure to launch - although the decision to fail the launch will be a function of the provider. The provider will need to differentiate between "environmental" parameters and actual kernel parameters and apply the values appropriately. jupyter_kernel_mgmt will likely provide a helper method for validation.
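Since the validation helper mentioned above does not exist yet, the sketch below shows the kind of checks involved (required keys, types, and bounds) using only the standard library. In practice a provider would more likely delegate to a full JSON-schema validator; this minimal version covers just the constraints used in the example schema and is an assumption, not a proposed API.

```python
# Hypothetical minimal validation of launch_params against the schema.
# Covers only "required", "type" (number/integer), "minimum", "maximum".
def validate_launch_params(schema: dict, params: dict) -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in params:
            errors.append(f"missing required parameter: {name}")
    for name, value in params.items():
        spec = props.get(name)
        if spec is None:
            continue  # provider may ignore or reject unknown parameters
        if spec.get("type") == "integer" and not isinstance(value, int):
            errors.append(f"{name}: expected integer")
        elif spec.get("type") == "number" and not isinstance(value, (int, float)):
            errors.append(f"{name}: expected number")
        if isinstance(value, (int, float)):
            if "minimum" in spec and value < spec["minimum"]:
                errors.append(f"{name}: below minimum {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors
```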

Note: Since KernelSpecProvider will be the primary provider, at least initially, applications that wish to take advantage of kernel launch parameters may want to create their own providers. Fortunately, we've provided a mechanism whereby KernelSpecProvider can be extended such that much of the discovery and launch machinery can be reused. In these cases, the kernel.json file would need to be prefixed with the new provider id so that KernelSpecProvider doesn't include those same kernel types in its set.

Virtual Kernel Types

One of the advantages of kernel launch parameters is that one could conceivably have a single kernel configured, yet allow for a plethora of configuration options based on the parameter values, as @rgbkrk points out here. This facility essentially fabricates kernel types that, today, would each require a separately configured type for every set of options.

References

#22
jupyter/jupyter_client#434
jupyter-server/enterprise_gateway#640
https://paper.dropbox.com/doc/Day-1-Kernels-jupyter_client-IPython-Notebook-server--ApyJEjYtqrjfoPg1QpbxZfcpAg-MyS7d8X4wkkhRQy7wClXY
#9

cc (based on inclusion in related threads): @takluyver @SylvainCorlay @Zsailer @lresende @rolweber @jasongrout @blink1073 @echarles @minrk @rgbkrk @MSeal @Carreau


MSeal commented Dec 9, 2019

I'm not ignoring this issue, but it'll take me a few days with the holidays going on to get some time to thoroughly review it.

Glad there's been progress on this front and a well organized write-up. I will give some deeper thoughts later.

@takluyver
Owner

Should this be a JEP by itself? As you mention, it affects things beyond this repository.

@blink1073
Contributor

👍 for a JEP

@kevin-bates
Collaborator Author

I agree - it seems to fit the definition of a JEP. I've gone ahead and created a JEP for it. Please note that content has been added to the JEP that doesn't exist in this issue.

Closing issue in favor of JEP.
