How to guide on Dask-Nebari config #79

Closed
wants to merge 3 commits into from
82 changes: 82 additions & 0 deletions docs/docs/how-tos/dask-nebari-config.md
@@ -0,0 +1,82 @@
---
id: dask-nebari-config
title: Configure Dask on Nebari
---

# Introduction

In this tutorial we will dive into the `Nebari-Dask` configuration details. Nebari config is essentially
a `yaml` file which is at the heart of all things (most of them) related to configurations.
Our main focus in this tutorial will be the `profiles` & `dask_worker` section of the config file.
Comment on lines +8 to +10
Suggested change:

In this tutorial we will dive into the configuration requirements for running Dask on Nebari. The Nebari config (`qhub_config.yml`) is at the heart of all things (most of them) related to configurations.
Our main focus in this tutorial will be the `profiles` & `dask_worker` section of the config file.


### Basic infrastructure details
Suggested change:

## Basic infrastructure details


Before we dive deeper into the configuration details, let's first understand the core configuration
components.

### Core components:

- Dask-gateway
- dask workers
- Dask scheduler
Comment on lines +14 to +21
Suggested change:

There are three core configuration components necessary for setting up Dask on Nebari.
These are:

- [Dask-gateway](https://gateway.dask.org/): provides a secure, multi-tenant server for managing [Dask](https://dask.org/) clusters
- [Dask workers](https://distributed.dask.org/en/stable/worker.html): compute tasks as directed by the scheduler and store/serve results
- [Dask scheduler](https://docs.dask.org/en/stable/scheduler-overview.html): executes the task graph by coordinating task distribution
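For orientation, here is a minimal sketch of how these three components are reached from a notebook. It uses the standard `dask_gateway` client and assumes, as is typical on Nebari, that the gateway address and authentication are already configured in the user environment.

```python
from dask_gateway import Gateway

# Assumes the gateway address and credentials are pre-configured in the
# environment, which Nebari deployments typically handle for the user.
gateway = Gateway()

cluster = gateway.new_cluster()  # Dask-gateway starts a scheduler pod
cluster.scale(2)                 # request two worker pods
client = cluster.get_client()    # submit work via the distributed client
```

Each call maps onto one of the components above: the gateway provisions the cluster, the scheduler pod coordinates the work, and the worker pods execute it.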


### How to configure dask gateway profiles?
Suggested change:

## Configuring Dask Gateway profiles


Nebari allows for the configuration of the Dask workers. Similar to the JupyterLab instances, you only need to configure the cores and memory.

When configuring the memory and CPUs for profiles, some important considerations exist. Two essential terms to understand are:

- `limit` is the absolute maximum memory a given pod can consume. If a process within the pod consumes more than the `limit` memory, the Linux OS will kill the process. `limit` is not used for scheduling purposes with Kubernetes.

- `guarantee`: is the amount of memory the Kubernetes scheduler uses to place a given pod. In general, the `guarantee` will be less than the limit. Often the node itself has less available memory than the node specification. See this [guide from digital ocean](https://docs.digitalocean.com/products/kubernetes/#allocatable-memory), which generally applies to other clouds.
Suggested change:

- `guarantee`: is the amount of memory the Kubernetes scheduler uses to place a given pod. In general, the `guarantee` will be less than the limit. Often the node itself has less available memory than the node specification. You may want to check out this [guide from digital ocean](https://docs.digitalocean.com/products/kubernetes/#allocatable-memory) which also generally applies to other clouds.


Suggested change:

The following is an example configuration:

```yaml
jupyterlab:
  - display_name: Small Instance
    description: Stable environment with 1 cpu / 1 GB ram
    access: all
    default: true
    kubespawner_override:
      cpu_limit: 1
      cpu_guarantee: 1
      mem_limit: 1G
      mem_guarantee: 1G
  - display_name: Medium Instance
    description: Stable environment with 1.5 cpu / 2 GB ram
    access: yaml
    groups:
      - admin
      - developers
    users:
      - bob
    kubespawner_override:
      cpu_limit: 1.5
      cpu_guarantee: 1.25
      mem_limit: 2G
      mem_guarantee: 2G
  - display_name: Large Instance
    description: Stable environment with 2 cpu / 4 GB ram
    access: keycloak
    kubespawner_override:
      cpu_limit: 2
      cpu_guarantee: 2
      mem_limit: 4G
      mem_guarantee: 4G
```
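To sanity-check the `mem_limit` a pod actually received, one rough option is to read the container's cgroup limit from inside the running instance. This is only a sketch: it assumes a cgroup v1 node, and cgroup v2 nodes expose the limit at a different path.

```python
from pathlib import Path

# cgroup v1 path; on cgroup v2 nodes the limit lives in /sys/fs/cgroup/memory.max
limit_file = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")
if limit_file.exists():
    limit_bytes = int(limit_file.read_text())
    print(f"container memory limit: {limit_bytes / 1024**3:.1f} GiB")
else:
    print("cgroup v1 memory limit file not found on this node")
```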

### How to configure dask scheduler?
Suggested change:

## Configuring the Dask Scheduler


In a few instances, the dask worker node-group might be running on quite a large instance, perhaps with 8 CPUs and 32 GB of memory (or more). In this case, you might also want to increase the resource levels associated with the dask scheduler.
Suggested change:

For analyses requiring heavy compute, there may be some situations where the Dask worker node-group might be running on quite a large cluster, perhaps with 8 CPUs and 32 GB of memory (or more). In this case, you might also want to increase the resource levels associated with the Dask scheduler.


Suggested change:

The following is an example of a Dask worker configuration with the Scheduler resources modified.

```yaml
dask_worker:
  "Huge Worker":
    worker_cores_limit: 7
    worker_cores: 6
    worker_memory_limit: 30G
    worker_memory: 28G
    scheduler_cores_limit: 7
    scheduler_cores: 6
    scheduler_memory_limit: 30G
    scheduler_memory: 28G
```
Comment on lines +72 to +81
Is this a jupyterlab profile?
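As a rough usage sketch, once this profile is deployed a user could request it from a notebook along these lines. The `profile` option name is an assumption here; which options are actually exposed depends on how the deployment configures Dask Gateway, so inspect `gateway.cluster_options()` before relying on it.

```python
from dask_gateway import Gateway

gateway = Gateway()
options = gateway.cluster_options()
# "profile" is an assumed option name; print(options) to see what your
# deployment actually exposes.
options.profile = "Huge Worker"

cluster = gateway.new_cluster(options)
cluster.scale(2)  # two workers sized per the dask_worker profile above
client = cluster.get_client()
print(client.scheduler_info()["workers"])  # inspect what the workers report
```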
Suggested change (a "Next Steps" section added after the closing code fence):

## Next Steps

Now that you have your Nebari instance all set up to run Dask, check out our user guide on Working with Big Data using Dask!

Reviewer comment: Just guessing at that link to the page, you should double check that works.

1 change: 1 addition & 0 deletions docs/sidebars.js
@@ -53,6 +53,7 @@ module.exports = {
label: "How-to Guides",
link: { type: "doc", id: "how-tos/index" },
items: [
"how-tos/dask-nebari-config",
"how-tos/kbatch-howto",
"how-tos/nebari-gcp",
"how-tos/nebari-aws",