Headless/Scheduled Notebooks #957
Codecov Report
@@ Coverage Diff @@
## main #957 +/- ##
===========================================
- Coverage 72.35% 33.80% -38.55%
===========================================
Files 64 75 +11
Lines 8131 8845 +714
Branches 1355 1457 +102
===========================================
- Hits 5883 2990 -2893
- Misses 1838 5672 +3834
+ Partials 410 183 -227
There were some questions raised in the server meeting about the extension. Surfacing them here for community members to elaborate and provide suggestions.
@vidartf @kevin-bates @blink1073
I would suggest adding an API to list the available scheduler backends, where the model contains a JSON schema for how to configure the scheduler.
I need to read more carefully. I understand now! Sorry for the already-answered question.
Closing now that work is taking place in https://github.com/jupyter-server/jupyter-scheduler. Thanks all!
Summary
Headless/Scheduled notebooks will enable end users to run and schedule notebooks as jobs anywhere they are running JupyterLab (laptop, on-prem, JupyterHub etc.). This capability will be offered to JupyterLab users as two primary components:
This PR includes only the relevant components of the server extension. The UI extension will be presented in a separate PR once the community has agreed to set up the server extension repo. We will not be merging this code into jupyter-server/jupyter-server; the PR is a mechanism to get feedback from the community and to help set up a separate repo under the jupyter-server org.
Concepts
Job definition
A job definition is a notebook and the metadata required to run it in an unattended fashion. A scheduled job definition contains additional metadata about a recurring schedule for running it.
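As a rough illustration of the distinction above, a job definition could be modeled as a small record whose optional schedule field marks it as a scheduled job definition. This is only a sketch: the class name, every field name, and the use of a cron-style string for the schedule are assumptions made for illustration, not the model used by the extension.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class JobDefinitionSketch:
    """Hypothetical model: a notebook plus the metadata needed to run it unattended."""

    input_path: str  # path to the notebook (assumed field name)
    parameters: Dict[str, object] = field(default_factory=dict)
    output_formats: List[str] = field(default_factory=lambda: ["notebook"])
    # Assumption: a recurring schedule is stored as a cron-style string.
    # None means the definition has no recurring schedule.
    schedule: Optional[str] = None

    @property
    def is_scheduled(self) -> bool:
        # A scheduled job definition is one that carries schedule metadata.
        return self.schedule is not None


one_off = JobDefinitionSketch(input_path="report.ipynb")
nightly = JobDefinitionSketch(input_path="report.ipynb", schedule="0 2 * * *")
```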
Job
A job is a single instance of a notebook being run. There are two categories of jobs:
Components
REST API
Provides the REST API endpoints for managing jobs and job definitions.
GET /jobs/<job_id>
Returns job details for a specific job, calls get_job in Scheduler API
GET /jobs
Returns job details for jobs filtered by query, calls list_jobs in Scheduler API
POST /jobs
Creates a new run-now job, calls create_job in Scheduler API
PATCH /jobs/<job_id>
Updates a job, calls update_job in Scheduler API
DELETE /jobs/<job_id>
Deletes a job, calls delete_job in Scheduler API
POST /job_definitions
Creates a new job definition, calls create_job_definition in Scheduler API
GET /job_definitions/<job_def_id>
Returns job definition details for a specific definition, calls get_job_definition in Scheduler API
POST /job_definitions/<job_def_id>/jobs
Creates a new job from the given job definition, calls create_job in Scheduler API
GET /job_definitions
Returns list of definitions filtered by query, calls list_job_definitions in Scheduler API
PATCH /job_definitions/<job_def_id>
Updates a specific job definition, calls update_job_definition in Scheduler API
DELETE /job_definitions/<job_def_id>
Deletes job definition, calls delete_job_definition in Scheduler API
GET /runtime_environments
Returns list of details for all environments, calls list_environments in EnvironmentsManager
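The mapping from endpoints to backend calls listed above can be sketched as a small routing table. The HTTP methods, paths, and backend method names below are taken from the list; the `ROUTES` and `resolve` names and the regex-based matching are illustrative assumptions and do not reflect how the extension's handlers are actually registered.

```python
import re
from typing import Dict, Optional, Tuple

# (HTTP method, path template, backend component, backend method), as listed above.
ROUTES = [
    ("GET", "/jobs/<job_id>", "Scheduler", "get_job"),
    ("GET", "/jobs", "Scheduler", "list_jobs"),
    ("POST", "/jobs", "Scheduler", "create_job"),
    ("PATCH", "/jobs/<job_id>", "Scheduler", "update_job"),
    ("DELETE", "/jobs/<job_id>", "Scheduler", "delete_job"),
    ("POST", "/job_definitions", "Scheduler", "create_job_definition"),
    ("GET", "/job_definitions/<job_def_id>", "Scheduler", "get_job_definition"),
    ("POST", "/job_definitions/<job_def_id>/jobs", "Scheduler", "create_job"),
    ("GET", "/job_definitions", "Scheduler", "list_job_definitions"),
    ("PATCH", "/job_definitions/<job_def_id>", "Scheduler", "update_job_definition"),
    ("DELETE", "/job_definitions/<job_def_id>", "Scheduler", "delete_job_definition"),
    ("GET", "/runtime_environments", "EnvironmentsManager", "list_environments"),
]


def _compile(template: str) -> "re.Pattern[str]":
    # Turn "/jobs/<job_id>" into the anchored regex "^/jobs/(?P<job_id>[^/]+)$".
    pattern = re.sub(r"<(\w+)>", r"(?P<\1>[^/]+)", template)
    return re.compile("^" + pattern + "$")


_COMPILED = [(m, _compile(t), comp, name) for m, t, comp, name in ROUTES]


def resolve(method: str, path: str) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (component, method name, path parameters), or None if nothing matches."""
    for route_method, pattern, component, name in _COMPILED:
        if route_method != method.upper():
            continue
        match = pattern.match(path)
        if match:
            return component, name, match.groupdict()
    return None
```

For instance, `resolve("POST", "/job_definitions/d1/jobs")` resolves to `create_job` on the Scheduler with `job_def_id` set to `"d1"`, which reflects that the same backend call serves both run-now jobs and jobs created from a definition.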
Extension points
ExecutionManager
Execution manager is a configurable trait class that provides a wrapper around the execution engine that will execute the notebook. The default implementation provided by the server extension uses nbconvert to execute the notebooks, supports html and notebook outputs, and allows Papermill-like job parameters.
Execution manager provides a second API that should return the list of features supported by the execution engine. This is useful for the UI to only show options that are supported by the specific backend. For example, here is the list of features supported by the default execution manager at this time.
This is an appropriate extension point for backends that only want to replace the execution engine that executes the notebook, but want to keep the rest of the backend and persistence layer intact.
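To make the two responsibilities concrete, here is a minimal sketch of what a replacement execution manager might look like. Everything in it is an assumption for illustration: the class names, the `execute` and `supported_features` method names, the shape of the features dictionary, and the toy engine that runs code cells with Python's `exec` instead of nbconvert. It is not the extension's actual interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ExecutionManagerSketch(ABC):
    """Hypothetical wrapper around an execution engine (names are illustrative)."""

    @abstractmethod
    def execute(self, notebook: dict, parameters: Dict[str, object]) -> dict:
        """Run the notebook with the given parameters."""

    @abstractmethod
    def supported_features(self) -> Dict[str, object]:
        """Describe what this engine supports so a UI can hide other options."""


class ToyExecutionManager(ExecutionManagerSketch):
    """Toy engine: runs code cells in-process; assumes each cell source is a string."""

    def execute(self, notebook: dict, parameters: Dict[str, object]) -> dict:
        # Papermill-like behavior: parameters become variables visible to the cells.
        namespace = dict(parameters)
        for cell in notebook["cells"]:
            if cell.get("cell_type") == "code":
                exec(cell["source"], namespace)
        namespace.pop("__builtins__", None)  # added by exec, not part of the result
        return namespace  # the final variables stand in for an executed notebook

    def supported_features(self) -> Dict[str, object]:
        return {"output_formats": ["notebook"], "parameters": True}


def visible_output_formats(manager: ExecutionManagerSketch, requested: List[str]) -> List[str]:
    # A UI could use the features API to offer only the formats the backend supports.
    supported = manager.supported_features()["output_formats"]
    return [fmt for fmt in requested if fmt in supported]
```

Because only the engine is swapped, the rest of the backend (REST handlers, persistence) would not need to know which implementation is configured.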
Scheduler
Scheduler is a configurable trait class that can be used to replace the backend service API to support a different backend than the default provided by the server extension.
This is the central API that is used by the REST handlers to run and schedule jobs. The BaseScheduler class is expected to be implemented by different backends to attach their own persistence store and task runner. A default implementation, Scheduler, is provided that uses SQLite as the persistence store and a Python process to run jobs.
EnvironmentsManager
Environments provide a mechanism to switch runtime context while submitting a job execution. For example, the default environments manager bundled with this extension provides a list of locally installed conda environments. The user can select one of these environments during job submission, and the backend is expected to run the notebook in the selected conda environment.
EnvironmentsManager is a configurable trait class that can be used to replace the list of environments presented to the user. Note that how the runtime context will change during execution of the notebook job relies on the specific backend implementation.
Environments manager also provides a second API to provide a command id which, if non-empty, allows the UI to launch the environment management UI. Note that the actual UI to manage environments is outside the scope of the scheduler lab extension; the server extension is merely providing this extension point so other backends can accommodate management of environments for their users. We expect that most users will not find the need to manage environments from within the job creation process, and will mostly select one of the pre-configured named environments.
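The two APIs of the environments manager can be sketched as follows. Only `list_environments` is named in this PR (in the REST API list); the class names, the `RuntimeEnvironment` fields, the `manage_environments_command` method name, and the example command id are hypothetical, and the static list stands in for the conda discovery done by the default manager.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class RuntimeEnvironment:
    """Hypothetical description of one selectable environment."""

    name: str
    label: str


class EnvironmentsManagerSketch(ABC):
    @abstractmethod
    def list_environments(self) -> List[RuntimeEnvironment]:
        """Return the environments the user may pick at job submission."""

    def manage_environments_command(self) -> str:
        # Assumed name for the second API. An empty string means the UI
        # should not offer an environment management entry point.
        return ""


class StaticEnvironmentsManager(EnvironmentsManagerSketch):
    """Serves a fixed, pre-configured list of named environments."""

    def __init__(self, names: List[str], command_id: str = "") -> None:
        self._names = list(names)
        self._command_id = command_id

    def list_environments(self) -> List[RuntimeEnvironment]:
        return [RuntimeEnvironment(name=n, label=n) for n in self._names]

    def manage_environments_command(self) -> str:
        return self._command_id


def can_manage_environments(manager: EnvironmentsManagerSketch) -> bool:
    # The UI would show a "manage environments" action only for a non-empty command id.
    return bool(manager.manage_environments_command())
```

A backend that wants to expose its own management UI would return a command id of its choosing, for example `StaticEnvironmentsManager(["base"], command_id="my-backend:manage-envs")`, where the id is purely illustrative.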
Current features
The current extension supports these features:
Planned features
Lab extension example