
[DRAFT] Parameterized Kernel Launch #46

Closed
wants to merge 2 commits

Conversation

kevin-bates
Member

@kevin-bates commented Dec 10, 2019

Posted as a formal JEP as suggested in the original issue.

This proposal depends on the acceptance of JEP #45.

Commit 4c0727f rewrites the proposal in terms of Kernel Provisioners available in jupyter_client 7.0.

If necessary, I'm happy to open an entirely different proposal but felt much of this discussion was still germane to the rewrite.

@MSeal (Contributor) left a comment


Thanks for putting this together. I had a few comments we might want to hash out some, but overall I think this would be a good step forward.

A common mechanism in use today for varying a kernel's launch behavior is environment variables. These variables are conveyed to the launch mechanism and set into the kernel's environment at launch. Since environment variables are commonly used in containerized contexts, we may want to support specifying them within this mechanism. There are a few options for distinguishing these kinds of parameters from "contextual" and kernel-specific parameters, if we distinguish them at all (option 4).

1. Use a custom schema that defines an `is_env` meta-property. Schema entries with `is_env=True` will be set into the kernel's environment. I'd prefer to avoid a custom schema since it would require access to its definition and introduces more deployment/configuration issues.
2. Create an explicit sub-section in `launch_parameter_schema` named `env_variables` that defines the metadata corresponding to environment variables. The payload of the subsequent POST request (to start a kernel) would then also include an `env_variables` sub-section consisting of the name/value pairs that the kernel provider ensures are placed into the target kernel's environment.
Contributor

I like this one -- explicit but capable enough to expand to many use cases without accidentally leaking information into kernels you wouldn't want to expose (e.g. secrets).

Contributor

That could also come along in later implementations
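To make option 2 concrete, here is a minimal sketch of what a `launch_parameter_schema` with a separate `env_variables` sub-section could look like. All names and values here are illustrative assumptions, not an adopted layout.

```python
# Sketch of option 2: launch_parameter_schema carries an explicit
# env_variables sub-section, kept separate from kernel-specific
# parameters. Illustrative only -- one of the options under discussion.
launch_parameter_schema = {
    "title": "Example Kernel Parameters",
    "type": "object",
    "properties": {
        # kernel-specific parameter (hypothetical name)
        "cpus": {"type": "integer", "default": 2},
    },
    "env_variables": {
        # metadata for environment variables (hypothetical name)
        "SPARK_HOME": {"type": "string", "default": "/opt/spark"},
    },
}

# With this layout a provider can tell env vars apart from
# kernel-specific parameters without a custom is_env meta-property.
env_names = set(launch_parameter_schema["env_variables"])
```

The advantage of this shape is that only entries under `env_variables` ever reach the kernel's environment, so other parameters cannot leak in accidentally.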

Once the application has obtained the desired set of parameters, it will create an entry in the JSON body of the `/api/kernels` POST request that is a dictionary of name/value pairs. The key under which this set of pairs resides will be named `launch_params`. The kernels handler will then pass this dictionary to the framework, where the kernel provider launch method will act on it.

```json
"launch_params": {
    ...
}
```

Contributor

nice and simple
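For illustration, a hypothetical POST body for `/api/kernels` carrying `launch_params` could look like the following sketch. The parameter names are made up for this example.

```python
import json

# Hypothetical JSON body for POST /api/kernels including the
# launch_params dictionary described above (parameter names invented).
body = json.dumps({
    "name": "python3",
    "launch_params": {
        "cache_size": 2000,
        "matplotlib": "inline",
    },
})

# The kernels handler would pass the parsed dictionary through to the
# kernel provider's launch method unchanged.
launch_params = json.loads(body)["launch_params"]
```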


Note that applications that are unaware of `launch_parameter_schema` will still behave in a reasonable manner provided the kernel provider applies reasonable default values to any required parameters.

In addition, it would be beneficial if the set of parameter name/value pairs could be added into the notebook metadata so that subsequent launch attempts could use _those_ values in the pre-filled dialog.
Contributor

Might be worth outlining how interfaces might respect notebook metadata or application metadata to fulfill execution needs. For example, it would help a lot if papermill and our hosted notebook server could both set the same Spark configuration that gets set when a notebook is scheduled vs. run in a dev environment. If we had a convention for where this data was saved, we could make the tooling respect mapping it into the kernel parameters on launch.
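One possible convention for persisting launch values, sketched below. Nothing here is an adopted standard; the metadata key is a placeholder.

```python
# Hypothetical convention: persist the values used at launch under the
# notebook's metadata so a later launch can pre-fill the dialog with
# those same values. The "launch_params" key is an assumption.
nb = {
    "metadata": {
        "launch_params": {"cache_size": 2000, "matplotlib": "inline"},
    },
    "cells": [],
}

# A parameter-aware frontend (or papermill engine) could read these
# back as dialog defaults or scheduled-run inputs.
prefill = nb["metadata"].get("launch_params", {})
```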



## Launch Parameter Schema
The set of available launch parameters for a given kernel will be conveyed from the server to the client application via the _kernel type_ information (formerly known as the kernelspec) as JSON returned from the `/api/kernelspecs` REST endpoint. When available, launch parameter metadata will be included within the existing `metadata` stanza under `launch_parameter_schema`, and will consist of JSON schema that describes each available parameter. Because this is pure JSON schema, this information can convey required values, default values, choice lists, etc. and be easily consumed by applications. (Although I'd prefer to avoid this, we could introduce a custom schema if we find the generic schema metadata is not sufficient.)
Contributor

I agree we should try to keep to the generic schema as described. Given how this works, I think people could write their own extensions of the spec for specialized use cases without deviating dramatically. It wouldn't need to be formally adopted in the Jupyter spec, per se.
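Because the schema is generic JSON Schema, a client can walk it without any Jupyter-specific logic. A minimal sketch, using only standard dictionary access over an illustrative schema:

```python
# Walk a plain JSON Schema to derive launch-dialog fields generically.
# The schema content is illustrative, not taken from any real kernelspec.
schema = {
    "title": "Python Kernel Parameters",
    "type": "object",
    "required": ["matplotlib"],
    "properties": {
        "cache_size": {"type": "integer", "default": 1000,
                       "minimum": 0, "maximum": 50000},
        "matplotlib": {"type": "string", "default": "auto",
                       "enum": ["auto", "inline", "notebook"]},
    },
}

# Derive dialog fields: name, whether required, default value, choices.
fields = [
    {
        "name": name,
        "required": name in schema.get("required", []),
        "default": spec.get("default"),
        "choices": spec.get("enum"),  # None when the field is free-form
    }
    for name, spec in schema["properties"].items()
]
```

This is why sticking to generic JSON Schema matters: the same traversal works for any kernel type, and choice lists, defaults, and required flags fall out for free.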



## Virtual Kernel Types
One of the advantages of kernel launch parameters is that one could conceivably have a single kernel configured, yet allow for a plethora of configuration options based on the parameter values - as @rgbkrk points out [here](https://github.com/takluyver/jupyter_kernel_mgmt/issues/9#issuecomment-496434455) - since this facility essentially _fabricates_ kernel types that, today, would require a separate type for each set of options.
Contributor

Do we want to describe how this could be represented in current UIs? I think it might make sense to provide an easy way to register a virtual kernel in the most common interfaces.

Member Author

Good point. I view a virtual kernel as simply a highly parameterized kernel. Its name may imply that it is virtual, but I don't think the notion of 'virtual' needs to be expressed in the kernel-type's metadata. It does make sense, though, that such a kernel-type should have some indication that it is parameterized. Parameterized kernels will incur an additional dialog at startup, and users will likely want to know there are "options" when starting kernel A, for example.

I think it might make sense to provide an easy way to register a virtual kernel in the most common interfaces.

I'm not clear on what you mean by 'register' and 'the most common interfaces'. Could you please clarify the former and provide examples of the latter?

Contributor

Sure!

To be more specific, we should make it clear how a user would find, select, and execute against a virtual kernel in nbconvert/papermill, the classic UI, and the Lab UI. Some of those might need a supporting team to write an extension for Lab; maybe the UI would provide a JavaScript component that exposes known virtualization options from the kernel manager API. Papermill might need a custom engine for a team's internal tooling that maps specific kwargs inputs to internal kernel virtualization options, etc.

Does that help clarify my statement?

I was trying to think through the intended direction for how parameterized kernels would be adoptable by existing setups.

Member Author

I see - yes that's helpful - thank you. I agree that how the UI applications interpret and present parameters is something they'll need to work through - and a primary reason for pure JSON schema. But, again, I don't view virtual kernels as anything more than kernels with parameters - most of which would be used to create/pick the environment in which the kernel will run.

## Backwards Compatibility
Parameter-aware applications that are receiving results from the `/api/kernelspecs` REST API must be able to tolerate the _non-existence_ of `metadata.launch_parameter_schema` within the kernelspec results. Likewise, parameter-unaware applications will need to ignore the parameter stanza - which is the case today.

Kernel providers that support parameterized launches must also handle required parameters by providing reasonable defaults. In addition, they must not assume that the application will provide those defaults - despite the fact that the schema for those required parameters define default values - since the application that is requesting the kernel start (via the POST on `/api/kernels`) may be unaware of this parameter mechanism.
Contributor

It might be OK to have kernels with required fields fail. Maybe emphasize that kernels with required inputs will fail with a specific message format explaining why they can't launch?

@kevin-bates (Member Author) commented Dec 18, 2019

I think this decision should be up to the kernel provider. Yes, if the provider deems that failure should happen if required values are not provided, then so be it. I don't mean to imply required values must have defaults, but I think it would be a good best practice, particularly if the application is parameter-unaware.
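The defaulting best practice described above could be sketched as follows. This is one possible provider-side behavior, not a prescribed implementation; the function name is hypothetical.

```python
# Sketch: a kernel provider applies schema defaults for any parameter a
# (possibly parameter-unaware) client omitted from the POST request.
# apply_defaults is a hypothetical helper, not part of any real API.
def apply_defaults(schema: dict, launch_params: dict) -> dict:
    params = dict(launch_params)  # don't mutate the caller's dict
    for name, spec in schema.get("properties", {}).items():
        if name not in params and "default" in spec:
            params[name] = spec["default"]
    return params

schema = {"properties": {"cache_size": {"type": "integer",
                                        "default": 1000}}}
# A parameter-unaware application sends no launch_params at all.
resolved = apply_defaults(schema, {})
```

A provider that instead chooses to fail on missing required values (as suggested above) would raise here rather than substitute a default; the point is that the choice lives with the provider.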

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/how-to-limit-memory-in-a-standalone-jupyterlab/4311/3

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/jupyter-and-github-alternative-file-formant/4972/21

@ellisonbg
Contributor

An overall comment: in situations where we find ourselves needing parameterized kernels, we also need parameterized terminals, as both typically depend on the same set of environment parameters (environment, env vars, etc.).

@kevin-bates
Member Author

Hi @ellisonbg.

we also need parametrized terminals

  • The idea with parameterized kernels is that the kernel specification is the vehicle by which available parameters are expressed. Since terminals have no concept of "terminal specs", how would the available parameters be conveyed?

  • If parameters could be conveyed, what are the plans to act on those parameters within the terminal subsystem (which only applies to the notebook or server applications)? I'm assuming that some parameters will dramatically influence how and where the terminal session is started, which is fixed (other than shell_cmd) and local today.

I can see kernel parameterization being adopted in the use of "console" sessions (since those are just kernels themselves), but I'm struggling with how parameterization would be introduced in the contexts of terminals (i.e., terminado).

@echarles
Member

echarles commented Aug 13, 2021

I have a POC implementation, shown in the following GIF. User env vars are not implemented, but that should not be a big deal. Hope this is useful to the discussion.

(demo GIF)

The needed branches to make this work:

The kernelspec with the JSON schema:

```json
{
    "display_name": "Python with parameters",
    "language": "python",
    "interrupt_mode": "signal",
    "argv": [
        "python",
        "-m",
        "ipykernel_launcher",
        "-f",
        "{connection_file}",
        "--cache-size={cache_size}",
        "--matplotlib={matplotlib}"
    ],
    "env": {},
    "parameters": {},
    "metadata": {
        "kernel_provisioner": {
            "provisioner_name": "local-provisioner",
            "config": {
                "title": "Python Kernel Parameters",
                "type": "object",
                "properties": {
                    "cache_size": {
                        "type": "integer",
                        "description": "Set the size of the output cache",
                        "default": 1000,
                        "minimum": 0,
                        "maximum": 50000
                    },
                    "matplotlib": {
                        "description": "Configure matplotlib for interactive use with the default matplotlib",
                        "type": "string",
                        "default": "auto",
                        "enum": ["auto", "agg", "gtk", "gtk3", "inline", "ipympl", "nbagg", "notebook", "osx", "pdf", "ps", "qt", "qt4", "qt5", "svg", "tk", "widget", "wx"]
                    }
                }
            }
        }
    }
}
```
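A sketch of how a provisioner might substitute the launch parameter values into the `argv` template above, alongside the usual `{connection_file}` substitution. The substitution mechanism shown (`str.format`) is an assumption about the POC, not a documented behavior.

```python
# Substitute launch parameter values into the argv template from the
# kernelspec above. str.format-based templating is assumed here.
argv = [
    "python", "-m", "ipykernel_launcher", "-f", "{connection_file}",
    "--cache-size={cache_size}", "--matplotlib={matplotlib}",
]
params = {
    "connection_file": "/tmp/kernel.json",  # hypothetical path
    "cache_size": 1000,                     # schema default
    "matplotlib": "auto",                   # schema default
}
resolved = [arg.format(**params) for arg in argv]
```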

@kevin-bates
Member Author

This proposal was originally written against the Kernel Providers JEP (which was closed), then "adjusted" to be relative to Kernel Provisioners. In both instances, there was no traction from the community other than a few (greatly appreciated) comments.

I strongly believe such a parameterization proposal is warranted and highly desired, and that this PR serves as a good basis for how something like this can work. I also believe this should be a community effort and one that follows the prescribed direction for JEPs - which this draft did not - so am closing this pull request.

I would be happy to participate in a future proposal. If others are interested, perhaps we can use the tail-end of this PR to put together a working group?

5 participants