
0.18.7

Released by peterschmidt85 on 29 Jul, 12:54
Commit: 02decfe

Fleets

With fleets, you can now describe clusters declaratively and create them, both in the cloud and on-prem, with a single command. Once a fleet is created, it can be used to run dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB
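As a sketch of those optional parameters, the same fleet could be limited to a single backend and allowed to use spot instances. The backends and spot_policy properties are assumed here from dstack's profile options, and aws is only an example backend; check the fleet reference for the exact set of supported fields.

```yaml
type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB
# Assumed optional parameter: provision only in this backend (example value)
backends: [aws]
# Assumed optional parameter: use spot instances when available, on-demand otherwise
spot_policy: auto
```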

On-prem fleets

To create a fleet from on-prem servers, list their hosts along with the SSH user, port, and private key that dstack should use to connect to them.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52
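The example above omits the port, so the default SSH port is used. As a minimal sketch, assuming ssh_config accepts a port field next to user and identity_file (as the paragraph above implies), a fleet whose servers accept SSH on a non-standard port might look like this. The fleet name and the port value 2222 are placeholders.

```yaml
type: fleet
# Hypothetical fleet name
name: my-onprem-fleet
placement: cluster
ssh_config:
  user: ubuntu
  # Assumed field: SSH port used for all hosts below (placeholder value)
  port: 2222
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52
```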

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.

Deprecating dstack run

Since dstack apply already supports gateways, volumes, and fleets, we have extended it to dev environments, tasks, and services as well, and dstack run is now deprecated. Instead of dstack run WORKING_DIR -f CONFIG_FILE, use dstack apply -f CONFIG_FILE.

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as the run name, which is more convenient than a randomly generated one. If you don't specify a name, dstack assigns a random one as before.
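Tasks and services can be named the same way. A minimal sketch of a named task, assuming a hypothetical train.py script and requirements.txt file in the repository; the name my-training-task is a placeholder.

```yaml
type: task
# Used as the run name instead of a randomly generated one
name: my-training-task

python: "3.11"

# Hypothetical commands: replace with your own setup and entry point
commands:
  - pip install -r requirements.txt
  - python train.py

resources:
  gpu: 24GB
```

Applying this file with dstack apply -f starts a run named my-training-task.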

RunPod Volumes

In other news, we've added support for volumes in the runpod backend. Previously, they were only supported in the aws backend.

type: volume
name: my-new-volume

backend: runpod
region: ca-mtl-3
size: 100GB

A great feature of RunPod volumes is that they can be attached to multiple instances simultaneously. This makes it possible to share a persistent cache across multiple service replicas or across the nodes of a distributed training task.
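As an illustration, a task could mount the volume defined above by its name. This sketch assumes the volumes property with name and path keys as described in dstack's volumes documentation; the task name, the mount path /volume_data, and the command are placeholders. Because the volume is created in the runpod backend's ca-mtl-3 region, the run that mounts it presumably has to be provisioned in that same backend and region.

```yaml
type: task
# Hypothetical task name
name: my-volume-task

python: "3.11"

# Mount the volume created above; /volume_data is an arbitrary mount path
volumes:
  - name: my-new-volume
    path: /volume_data

# Placeholder command that writes a file into the mounted volume
commands:
  - echo "hello" > /volume_data/hello.txt

resources:
  gpu: 24GB
```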

Major bugfixes

Important

This update fixes the kubernetes backend, which had been broken for the past few releases.

Other

New contributors

Full changelog: 0.18.6...0.18.7