LSP Evaluation

Zarquan edited this page Jul 3, 2020 · 2 revisions

Evaluation of the LSST Science Platform (including the DASK panel)

See issue #131

The LSST Science Platform consists of the following components:

  • Landing page: A simple web page to help users navigate to the different aspects of the Science Platform
  • Nublado: A customized JupyterHub system that allows users to run notebooks next to the data
  • Firefly: An instance of the IPAC portal for in-browser data visualization
  • CADC’s TAP service: An IVOA TAP service for QServ
  • Fileserver: An NFS fileserver for persistent storage of notebooks and user data


User Experience

The following describes the user experience when using the LSP, with a focus on the Nublado component, which is most closely related to the service we are building.

A user first reaches the landing page, a welcome page with links to both the Notebook component and the Portal component.

The portal in this case is Firefly, a user interface for querying TAP services and visualizing the results. It is not covered in detail here as it is not directly relevant, but a tutorial on its use can be found here: https://github.com/lsst-uk/lsp-uk/wiki/LSP-Firefly-UI-Tutorial

Once the user clicks the notebook link, they are redirected to a JupyterHub login page, where there is an option to "Sign in with Github".

  • If the user is logged in to their Github account, they are asked to authorize a Github OAuth application, which allows JupyterHub to log them in.
  • If they are not logged in, they are redirected to a Github login page, after which the above takes place.
  • If they have previously logged in, the service forwards them directly to the next page in the process.

In the initial prototype any Github user can access the service; in the future access will not be as open, and will be limited to registered users.
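JupyterHub typically enables this kind of Github sign-in through the oauthenticator package. The fragment below is only a sketch of what such a `jupyterhub_config.py` might look like: the credential values, callback URL, and the commented-out allow list are illustrative placeholders, not the actual Nublado configuration.

```python
# jupyterhub_config.py -- illustrative sketch only; all values below are
# placeholders, not the real Nublado deployment settings.
from oauthenticator.github import GitHubOAuthenticator

c.JupyterHub.authenticator_class = GitHubOAuthenticator

# OAuth application credentials registered with Github (placeholders).
c.GitHubOAuthenticator.client_id = "GITHUB_CLIENT_ID"
c.GitHubOAuthenticator.client_secret = "GITHUB_CLIENT_SECRET"
c.GitHubOAuthenticator.oauth_callback_url = "https://lsp.example.org/hub/oauth_callback"

# The open prototype applies no restriction; limiting access to registered
# users could later be done with an allow list like this (hypothetical):
# c.GitHubOAuthenticator.allowed_users = {"alice", "bob"}
```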

In the next stage of the process, the user is provided with a list of Images and Sizes to select from for the Notebook server.

The sizes available in the current LSP prototype are:

  • Tiny (0.50 CPU, 1024M RAM)
  • Small (1.00 CPU, 2048M RAM)
  • Medium (2.00 CPU, 4096M RAM)
  • Large (4.00 CPU, 8192M RAM)
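In a JupyterHub deployment on Kubernetes, size options like these are commonly expressed as a KubeSpawner `profile_list`. The sketch below mirrors the four sizes above; the use of KubeSpawner and these exact field names are assumptions about how Nublado might be configured, not details confirmed from the deployment.

```python
# Hypothetical KubeSpawner-style profile list mirroring the prototype's sizes.
# The structure (display_name + kubespawner_override) follows KubeSpawner's
# profile_list convention; the actual Nublado configuration is not confirmed.
profile_list = [
    {"display_name": "Tiny",
     "kubespawner_override": {"cpu_limit": 0.5, "mem_limit": "1024M"}},
    {"display_name": "Small",
     "kubespawner_override": {"cpu_limit": 1.0, "mem_limit": "2048M"}},
    {"display_name": "Medium",
     "kubespawner_override": {"cpu_limit": 2.0, "mem_limit": "4096M"}},
    {"display_name": "Large",
     "kubespawner_override": {"cpu_limit": 4.0, "mem_limit": "8192M"}},
]

# Each size doubles the CPU allocation of the previous one.
for smaller, larger in zip(profile_list, profile_list[1:]):
    assert (larger["kubespawner_override"]["cpu_limit"]
            == 2 * smaller["kubespawner_override"]["cpu_limit"])
```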

Once a size is selected, the user clicks the Start button and, after a few seconds, is taken to the JupyterLab environment; behind the scenes the image is fetched if not already available and deployed as a Pod on the Kubernetes cluster. Each user notebook container is a separate Pod so that users are not competing for resources. Additionally, if the user is logging in for the first time, we create an NFS folder named after their Github username, where notebooks will be persistently stored.
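The per-user storage step could look something like the sketch below. The mount point and the sanitization rule are illustrative assumptions, not the actual Nublado implementation; the point is that the Github username maps onto a dedicated folder under the NFS root.

```python
import re
from pathlib import PurePosixPath

NFS_ROOT = PurePosixPath("/nfs/home")  # assumed mount point, not confirmed


def user_storage_path(github_username: str) -> PurePosixPath:
    """Map a Github username onto a per-user NFS folder.

    Github usernames contain only letters, digits and hyphens; anything
    else is rejected so a malformed name cannot escape the NFS root.
    """
    if not re.fullmatch(r"[A-Za-z0-9-]+", github_username):
        raise ValueError(f"unexpected username: {github_username!r}")
    return NFS_ROOT / github_username


print(user_storage_path("Zarquan"))  # /nfs/home/Zarquan
```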

In the JupyterLab environment, a user selects an interpreter, either plain Python or LSST (Python with additional libraries), to start a notebook, or they can choose from a set of existing example Notebooks.

The JupyterLab environment also provides a few additional features, such as a file explorer where they can upload files and move notebooks between folders.

One of the features currently being evaluated is the Dask tab. Here a user can create a new small Dask cluster by simply clicking a "New" button. The default Dask cluster size is 2 worker nodes, 8GB RAM, with 1 core per node. There is also a "Scale" option that allows them either to scale the cluster "Manually" by adding additional workers, or to use "Adaptive Scaling", where they are asked to set a minimum and maximum number of workers.
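Conceptually, adaptive scaling clamps the worker count the scheduler asks for between the user-supplied bounds. The toy function below illustrates that idea only; it is not Dask's actual implementation.

```python
def adaptive_target(requested_workers: int, minimum: int, maximum: int) -> int:
    """Clamp the scheduler's requested worker count to the user's bounds.

    Toy illustration of adaptive scaling: the cluster grows toward the
    load-derived request, but never shrinks below `minimum` or grows
    above `maximum`.
    """
    return max(minimum, min(requested_workers, maximum))


print(adaptive_target(0, 2, 8))   # idle cluster still keeps 2 workers
print(adaptive_target(5, 2, 8))   # mid-range request honoured: 5
print(adaptive_target(20, 2, 8))  # demand spike capped at the maximum: 8
```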

When the cluster has started, a user can use the Dask Python library to connect to the local cluster and run parallel processing jobs. There are also a number of pages for monitoring the cluster's memory, CPU, and bandwidth usage, along with graphs and statistics for general resource usage and worker availability.
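Dask's distributed Client follows the same submit/result pattern as the standard library's concurrent.futures, so the workflow can be sketched without a running cluster. In a real LSP session the executor below would be replaced by a `dask.distributed.Client` pointed at the user's cluster (an assumption about typical Dask usage, not a recipe taken from this deployment), and the toy function would be real per-object analysis.

```python
from concurrent.futures import ThreadPoolExecutor


def brightness(magnitude: float) -> float:
    # Toy per-item computation standing in for real per-object analysis.
    return 10 ** (-0.4 * magnitude)


magnitudes = [15.0, 17.5, 20.0, 22.5]

# With Dask this would be: client = Client("<scheduler address>") and then
# client.submit(...) / future.result() in place of the executor calls.
with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(brightness, m) for m in magnitudes]
    results = [f.result() for f in futures]

print(results)
```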

From my understanding so far, it looks like the Dask cluster is created as part of the JupyterHub Pod, since we can see the Pod's CPU & memory usage go up as soon as we create our clusters.

For a tutorial of what this looks like see here: https://www.youtube.com/watch?v=EX_voquHdk0

As we are currently early in the development and deployment of our own instance of the LSP, we cannot yet say how the system behaves under concurrent use by multiple users, or what happens when resources are exceeded.