Headless/Scheduled Notebooks #957

Closed
wants to merge 2 commits into from

Conversation

3coins
Contributor

@3coins 3coins commented Sep 1, 2022

Summary

Headless/Scheduled notebooks will enable end users to run and schedule notebooks as jobs anywhere they are running JupyterLab (laptop, on-prem, JupyterHub etc.). This capability will be offered to JupyterLab users as two primary components:

  1. A REST API, which runs as a Jupyter Server extension and has extension points for different backends.
  2. A UI, which is a JupyterLab extension.

This PR includes only the relevant components of the server extension. The UI extension will be presented in a separate PR once the community has accepted setting up the server extension repo. We will not be merging this code into jupyter-server/jupyter-server; we are using this PR as a mechanism to get feedback from the community and to help set up a separate repo under the jupyter-server org.

Concepts

Job definition

A job definition is a notebook and the metadata required to run it in an unattended fashion. A scheduled job definition contains additional metadata about a recurring schedule for running it.

Job

A job is a single instance of a notebook being run. There are two categories of jobs:

  1. Run now: Run a notebook immediately, not on a schedule
  2. Scheduled job: A notebook job that should run on a regular basis or run once based on a job definition

Components

REST API

Provides the REST API endpoints for managing jobs

GET /jobs/<job_id>

Returns job details for a specific job, calls get_job in Scheduler API

GET /jobs

Returns job details for jobs filtered by query, calls list_jobs in Scheduler API

POST /jobs

Creates a new run now job, calls create_job in Scheduler API

PATCH /jobs/<job_id>

Updates a job, calls update_job in Scheduler API

DELETE /jobs/<job_id>

Deletes a job, calls delete_job in Scheduler API

POST /job_definitions

Creates a new job definition, calls create_job_definition in Scheduler API

GET /job_definitions/<job_def_id>

Returns job definition details for a specific definition, calls get_job_definition in Scheduler API

POST /job_definitions/<job_def_id>/jobs

Creates a new job with job definition, calls create_job in Scheduler API

GET /job_definitions

Returns list of definitions filtered by query, calls list_job_definitions in Scheduler API

PATCH /job_definitions/<job_def_id>

Updates a specific job definition, calls update_job_definition in Scheduler API

DELETE /job_definitions/<job_def_id>

Deletes job definition, calls delete_job_definition in Scheduler API

GET /runtime_environments

Returns list of details for all environments, calls list_environments in EnvironmentsManager
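
To make the mapping above easier to scan, here is a minimal sketch of the endpoints as a route table. This is not the handler code from this PR: the `ROUTES` list and the `resolve` helper are hypothetical names used only for illustration, and the regular expressions are an assumption about how the `<job_id>` and `<job_def_id>` path segments would be matched.

```python
import re
from typing import Optional, Tuple

# (HTTP method, path pattern, target method) for each endpoint listed above.
# The target lives on the Scheduler API, except list_environments, which
# lives on the EnvironmentsManager.
ROUTES = [
    ("GET", r"/jobs/(?P<job_id>[^/]+)", "get_job"),
    ("GET", r"/jobs", "list_jobs"),
    ("POST", r"/jobs", "create_job"),
    ("PATCH", r"/jobs/(?P<job_id>[^/]+)", "update_job"),
    ("DELETE", r"/jobs/(?P<job_id>[^/]+)", "delete_job"),
    ("POST", r"/job_definitions", "create_job_definition"),
    ("GET", r"/job_definitions/(?P<job_def_id>[^/]+)", "get_job_definition"),
    ("POST", r"/job_definitions/(?P<job_def_id>[^/]+)/jobs", "create_job"),
    ("GET", r"/job_definitions", "list_job_definitions"),
    ("PATCH", r"/job_definitions/(?P<job_def_id>[^/]+)", "update_job_definition"),
    ("DELETE", r"/job_definitions/(?P<job_def_id>[^/]+)", "delete_job_definition"),
    ("GET", r"/runtime_environments", "list_environments"),
]


def resolve(method: str, path: str) -> Optional[Tuple[str, dict]]:
    """Return (target method name, path parameters), or None if no route matches."""
    for route_method, pattern, target in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method.upper() and match:
            return target, match.groupdict()
    return None
```

For example, `resolve("PATCH", "/jobs/abc")` yields `("update_job", {"job_id": "abc"})`, while a method that is not listed for a path, such as `PUT`, yields `None`.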

Extension points

ExecutionManager

Execution manager is a configurable trait class that provides a wrapper around the execution engine that will execute the notebook. The default implementation provided by the server extension uses nbconvert to execute the notebooks, supports html and notebook outputs, and allows Papermill-like job parameters.

execution_manager_class = TypeFromClasses(
    default_value=DefaultExecutionManager,
    klasses=[
        "jupyter_scheduling.executors.ExecutionManager"
    ],
    config=True,
    help=_i18n("The execution manager class to use.")
)

class ExecutionManager(ABC):
    """ Base execution manager.
    Clients are expected to override this class
    to provide concrete implementations of the
    execution manager. At the minimum, subclasses
    should provide implementation of the 
    execute, and supported_features methods.
    """

    def process(self):
        """The template method called by the 
        Scheduler, backend implementations 
        should not override this method. 
        """
        self.before_start()
        try:
            self.execute()
        except Exception as e:
            self.on_failure(e)
        else:
            self.on_complete()

    @abstractmethod
    def execute(self):
        """Performs notebook execution,
        custom backends are expected to
        add notebook execution logic within
        this method
        """
        pass

    @classmethod
    @abstractmethod
    def supported_features(cls) -> Dict[JobFeature, bool]:
        """Returns a configuration of supported features
        by the execution engine. Implementors are expected
        to override this to return a dictionary of supported
        job creation features.
        """
        pass

    def before_start(self):
        """Called before start of execute"""
        ...

    def on_failure(self, e: Exception):
        """Called after failure of execute"""
        ...    

    def on_complete(self):
        """Called after job is completed"""
        ...

Execution manager provides a second API that returns the features supported by the execution engine. The UI can use it to show only the options that the specific backend supports. For example, here is the list of features supported by the default execution manager at this time.

def supported_features(cls) -> Dict[JobFeature, bool]:
    return {
        JobFeature.job_name: True,
        JobFeature.output_formats: True,
        JobFeature.job_definition: False,
        JobFeature.idempotency_token: False,
        JobFeature.tags: True,
        JobFeature.email_notifications: False,
        JobFeature.timeout_seconds: False,
        JobFeature.retry_on_timeout: False,
        JobFeature.max_retries: False,
        JobFeature.min_retry_interval_millis: False,
        JobFeature.output_filename_template: False,
        JobFeature.stop_job: True,
        JobFeature.delete_job: True
    } 

This is an appropriate extension point for backends that only want to replace the execution engine that executes the notebook, but want to keep the rest of the backend and persistence layer intact.
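
As a rough illustration of this extension point, the sketch below subclasses a stand-in copy of the base class and records the order in which `process` calls the hooks. Everything here is an assumption for illustration: `JobFeature` is reduced to three members, `RecordingExecutionManager` and `enabled_features` are hypothetical names, and no notebook is actually executed.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import Dict, List


class JobFeature(str, Enum):
    # Stand-in with only three of the members shown above.
    job_name = "job_name"
    output_formats = "output_formats"
    stop_job = "stop_job"


class ExecutionManager(ABC):
    """Stand-in that mirrors the base class quoted above."""

    def process(self):
        self.before_start()
        try:
            self.execute()
        except Exception as e:
            self.on_failure(e)
        else:
            self.on_complete()

    @abstractmethod
    def execute(self): ...

    @classmethod
    @abstractmethod
    def supported_features(cls) -> Dict[JobFeature, bool]: ...

    def before_start(self): ...

    def on_failure(self, e: Exception): ...

    def on_complete(self): ...


class RecordingExecutionManager(ExecutionManager):
    """Hypothetical backend that records each lifecycle hook it receives."""

    def __init__(self, fail: bool = False):
        self.fail = fail
        self.events: List[str] = []

    def before_start(self):
        self.events.append("before_start")

    def execute(self):
        # A real backend would run the notebook here.
        self.events.append("execute")
        if self.fail:
            raise RuntimeError("notebook failed")

    def on_failure(self, e: Exception):
        self.events.append(f"on_failure: {e}")

    def on_complete(self):
        self.events.append("on_complete")

    @classmethod
    def supported_features(cls) -> Dict[JobFeature, bool]:
        return {
            JobFeature.job_name: True,
            JobFeature.output_formats: False,
            JobFeature.stop_job: True,
        }


def enabled_features(manager_cls) -> List[str]:
    """What a UI might do: keep only the features the backend reports as supported."""
    return sorted(f.value for f, on in manager_cls.supported_features().items() if on)
```

Because `process` is a template method, the subclass only fills in the hooks; a failing `execute` reaches `on_failure` and skips `on_complete`.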

Scheduler

Scheduler is a configurable trait class that can be used to replace the backend service API in order to support a backend other than the default provided by the server extension.

scheduler_class = TypeFromClasses(
    default_value=Scheduler,
    klasses=[
        "jupyter_scheduling.scheduler.BaseScheduler"
    ],
    config=True,
    help=_i18n("The scheduler class to use.")
)

This is the central API that is used by the REST handlers to run and schedule jobs. The BaseScheduler class is expected to be implemented by different backends to attach their own persistence store and task runner. A default implementation, Scheduler, is provided that uses SQLite as the persistence store and a Python process to run jobs.

class BaseScheduler(ABC):
    """Base class for schedulers. A default implementation 
    is provided in the `Scheduler` class, but extension creators
    can provide their own scheduler by subclassing this class.
    By implementing this class, you will replace both the service
    API and the persistence layer for the scheduler.
    """

    @abstractmethod
    def create_job(self, model: CreateJob) -> str:
        """Creates a new job record, may trigger execution of the job.
        In case a task runner is actually handling execution of the jobs,
        this method should just create the job record.
        """
        pass

    @abstractmethod
    def update_job(self, model: UpdateJob):
        """Updates job metadata in the persistence store,
        for example name, status etc. In case of status
        change to STOPPED, should call stop_job
        """
        pass

    @abstractmethod
    def list_jobs(self, query: ListJobsQuery) -> ListJobsResponse:
        """Returns list of all jobs filtered by query"""
        pass

    @abstractmethod
    def count_jobs(self, query: CountJobsQuery) -> int:
        """Returns number of jobs filtered by query"""
        pass

    @abstractmethod
    def get_job(self, job_id: str) -> DescribeJob:
        """Returns job record for a single job"""
        pass

    @abstractmethod
    def delete_job(self, job_id: str):
        """Deletes the job record, stops the job if running"""
        pass

    @abstractmethod
    def stop_job(self, job_id: str):
        """Stops the job. This method is not the REST
        endpoint used to stop a job: the front end calls
        the PATCH API with a status update to STOPPED,
        and update_job is then expected to call this
        method. When a task runner handles execution,
        this should ask the task runner to suspend the job.
        """
        pass

    @abstractmethod
    def create_job_definition(self, model: CreateJobDefinition) -> str:
        """Creates a new job definition record,
        consider this as the template for creating
        recurring/scheduled jobs.
        """
        pass

    @abstractmethod
    def update_job_definition(self, model: UpdateJob):
        """Updates job definition metadata in the persistence store,
        should only impact future jobs.
        """
        pass

    @abstractmethod
    def delete_job_definition(self, job_definition_id: str):
        """Deletes the job definition record,
        implementors can optionally stop
        jobs with this job definition
        """
        pass

    @abstractmethod
    def list_job_definitions(self, query: ListJobDefinitionsQuery) -> List[DescribeJobDefinition]:
        """Returns list of all job definitions filtered by query"""
        pass

    @abstractmethod
    def pause_jobs(self, job_definition_id: str):
        """Pauses all future jobs for a job definition"""
        pass
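
To show how the job methods relate to each other, here is a minimal in-memory sketch. It is not the default Scheduler and does not subclass BaseScheduler: `InMemoryScheduler` is a hypothetical name, plain dicts stand in for the CreateJob, UpdateJob and DescribeJob models, `update_job` takes a job id and a dict of changes instead of a model, and the `CREATED` and `IN_PROGRESS` status values are assumptions (only `STOPPED` appears in the docstrings above).

```python
import uuid
from typing import Dict, List, Optional


class InMemoryScheduler:
    """Hypothetical, partial scheduler that keeps job records in a dict."""

    def __init__(self):
        self._jobs: Dict[str, dict] = {}
        self.stopped: List[str] = []  # ids passed to stop_job, for inspection

    def create_job(self, model: dict) -> str:
        # Only creates the record; a task runner would pick it up separately.
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {**model, "job_id": job_id, "status": "CREATED"}
        return job_id

    def get_job(self, job_id: str) -> dict:
        return dict(self._jobs[job_id])

    def update_job(self, job_id: str, changes: dict) -> None:
        # A status change to STOPPED is routed through stop_job.
        if changes.get("status") == "STOPPED":
            self.stop_job(job_id)
        self._jobs[job_id].update(changes)

    def stop_job(self, job_id: str) -> None:
        # A real backend would ask its task runner to suspend the job here.
        self.stopped.append(job_id)

    def delete_job(self, job_id: str) -> None:
        # Stops the job first if it is still running, then drops the record.
        if self._jobs[job_id]["status"] == "IN_PROGRESS":
            self.stop_job(job_id)
        del self._jobs[job_id]

    def count_jobs(self, status: Optional[str] = None) -> int:
        return sum(
            1 for job in self._jobs.values()
            if status is None or job["status"] == status
        )
```

A backend with a real persistence store would replace the dict with its own storage and keep the same call pattern, with `update_job` delegating to `stop_job` on a `STOPPED` status.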

EnvironmentsManager

Environments provide a mechanism to switch runtime context while submitting a job execution. For example, the default environments manager bundled with this extension provides a list of locally installed conda environments. The user can select one of these environments during job submission, and the backend is expected to run the notebook in the selected conda environment.

def list_environments(self) -> List[RuntimeEnvironment]:
    pass

EnvironmentsManager is a configurable trait class that can be used to replace the list of environments presented to the user. Note that how the runtime context will change during execution of the notebook job relies on the specific backend implementation.

environment_manager_class = TypeFromClasses(
    default_value=CondaEnvironmentManager,
    klasses=[
        "jupyter_scheduling.environments.EnvironmentManager"
    ],
    config=True,
    help=_i18n("The runtime environment manager class to use.")
)

Environments manager also provides a second API to provide a command id which, if non-empty, allows the UI to launch the environment management UI. Note that the actual UI to manage environments is outside the scope of the scheduler lab extension; the server extension is merely providing this extension point so other backends can accommodate management of environments for their users. We expect that most users will not find the need to manage environments from within the job creation process, and will mostly select one of the pre-configured named environments.

def manage_environments_command(self) -> str:
    pass
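
A backend that does not use conda could expose a fixed list of named environments. The sketch below assumes that shape: `StaticEnvironmentsManager` is a hypothetical class, and the `RuntimeEnvironment` fields shown are placeholders, not the fields of the model in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RuntimeEnvironment:
    # Placeholder fields; the real model may differ.
    name: str
    label: str
    description: str = ""


class StaticEnvironmentsManager:
    """Hypothetical manager that serves a preconfigured list of environments."""

    def __init__(self, environments: Iterable[RuntimeEnvironment]):
        self._environments = list(environments)

    def list_environments(self) -> List[RuntimeEnvironment]:
        # Return a copy so callers cannot mutate the configured list.
        return list(self._environments)

    def manage_environments_command(self) -> str:
        # Empty string: no environment management UI is offered.
        return ""
```

Returning an empty command id matches the expectation above that most users will pick a preconfigured named environment rather than manage environments during job creation.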

Current features

The current extension supports these features:

  • Multiple backends: The REST API can be retrofitted to work with different backends, for example JupyterHub
  • Run now jobs: Submit a notebook job to run now
  • Job parameters: Enables parameterization of jobs at runtime. This supports both notebook and non-notebook artifacts, for example python files.
  • Stop Job: Allows stopping a job once it has been started
  • Delete Job: Allows deletion of a job, stopping it if started
  • Sort: Multi-field sort job attributes inside list API
  • Filter: Filter jobs list by status, name, start_time, and tags
  • Pagination: Token based pagination of jobs
  • Multiple outputs: HTML and notebook outputs supported
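
The sort, filter and pagination features of the list API can be pictured with a small standalone sketch. None of this is the extension's code: the function and parameter names (`list_jobs`, `sort_by`, `max_items`, `next_token`) are assumptions, jobs are plain dicts, and the token is simply an encoded offset, which is one simple way to implement token-based pagination.

```python
import base64
import json
from typing import List, Optional, Sequence, Tuple


def encode_token(offset: int) -> str:
    """Wrap the next offset in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"offset": offset}).encode()).decode()


def decode_token(token: Optional[str]) -> int:
    """Recover the offset from a token; no token means start from the first row."""
    if not token:
        return 0
    return json.loads(base64.urlsafe_b64decode(token.encode()))["offset"]


def list_jobs(
    jobs: List[dict],
    status: Optional[str] = None,
    tags: Optional[Sequence[str]] = None,
    sort_by: Sequence[Tuple[str, str]] = (("start_time", "desc"),),
    max_items: int = 2,
    next_token: Optional[str] = None,
) -> Tuple[List[dict], Optional[str]]:
    # Filter: keep jobs with the requested status that carry all requested tags.
    rows = [
        job for job in jobs
        if (status is None or job["status"] == status)
        and (not tags or set(tags) <= set(job.get("tags", [])))
    ]
    # Multi-field sort: sort by the least significant field first.
    # Python's sort is stable, so earlier orderings survive among ties.
    for field, direction in reversed(sort_by):
        rows.sort(key=lambda job: job[field], reverse=(direction == "desc"))
    # Pagination: slice one page and hand back a token if more rows remain.
    start = decode_token(next_token)
    page = rows[start:start + max_items]
    end = start + len(page)
    return page, (encode_token(end) if end < len(rows) else None)
```

A client would call `list_jobs` repeatedly, passing back the returned token until it comes back as `None`.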

Planned features

  • Scheduled jobs: Will allow submitting jobs with a schedule
  • Multiple files: Will allow selection of multiple files, for example selecting other files needed by the target notebook
  • Offline runs: Handle notebook runs when the JupyterLab instance is shut down
  • File ID: Integration with File ID service to track document moves
  • Multi-task jobs: enable jobs that consist of multiple steps with dependencies

Lab extension example

headless-notebooks-ui

@codecov-commenter

Codecov Report

Merging #957 (cedd624) into main (644540b) will decrease coverage by 38.54%.
The diff coverage is 0.00%.

@@             Coverage Diff             @@
##             main     #957       +/-   ##
===========================================
- Coverage   72.35%   33.80%   -38.55%     
===========================================
  Files          64       75       +11     
  Lines        8131     8845      +714     
  Branches     1355     1457      +102     
===========================================
- Hits         5883     2990     -2893     
- Misses       1838     5672     +3834     
+ Partials      410      183      -227     
Impacted Files Coverage Δ
jupyter_server/services/scheduling/__init__.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/config.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/environments.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/executors.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/extension.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/handlers.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/models.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/orm.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/parameterize.py 0.00% <0.00%> (ø)
jupyter_server/services/scheduling/scheduler.py 0.00% <0.00%> (ø)
... and 46 more


@3coins
Contributor Author

3coins commented Sep 1, 2022

There were some questions raised in the server meeting about the extension. Surfacing them here for community members to elaborate and provide suggestions.

  • Switching between multiple backends
    Allowing users to select between backends is currently not supported by the API, but we agree that this is a useful feature to add to our roadmap.

  • Flexibility in the CreateJob model for future use cases, for example a job that requires approval
    The guiding principle we started with was a minimal set of inputs for Jupyter users to submit jobs, optimizing for ease of submission without having to select a bunch of options. We think that most users can preconfigure their environments and submit jobs using these named environments without selecting every option during job creation. The API also provides a mechanism to override the environment attributes through runtime_environment_parameters. Given that, we are open to reviewing what additional attributes are needed during job creation to make this work for the wider Jupyter community.

@vidartf @kevin-bates @blink1073
You all have some great ideas about making this extension better for Jupyter users. Please feel free to add to this discussion/provide suggestions.

@dlqqq dlqqq mentioned this pull request Sep 1, 2022
@blink1073
Contributor

I would suggest adding an API to list the available scheduler backends, where the model contains a JSON schema for how to configure the scheduler.
We had discussed making the scheduler name/id part of the job request model. It could also contain suitable configuration for the scheduler.

@minrk

This comment was marked as resolved.

@minrk
Contributor

minrk commented Sep 5, 2022

We will not be merging this code into jupyter-server/jupyter-server, but using this as a mechanism to get feedback from the community

I need to read more carefully. I understand now! Sorry for the already-answered question.

@blink1073
Contributor

Closing now that work is taking place in https://github.com/jupyter-server/jupyter-scheduler. Thanks all!

@blink1073 blink1073 closed this Sep 13, 2022
@3coins 3coins deleted the ft/notebook-scheduling branch September 30, 2022 03:07