Magic Castle recipes for various workshops
The directory `common` contains most of the Magic Castle configuration in the form of a `common.tf` file, which defines the typical `openstack` and `dns` modules. It does so, however, in a way that is customizable through Terraform locals. The directory also contains a `config.yaml` file with useful defaults that don't need to be redefined in specific workshop directories. Finally, it contains a `sshkeys.pub` file with public SSH keys that will be injected into the resulting clusters.
The directory `example` contains an example cluster configuration in which many of the parameters are redefined.
Other directories, named after a course code, are for specific workshops. Each of these directories must contain a symbolic link to `../common/common.tf` (for example, created with `ln -s ../common/common.tf common.tf` from within the course directory), which pulls in the configuration as defined in that file. Make sure that this is a symbolic link, and not a copy, or your configuration will diverge.
The `main.tf` file that you create for the course should have the following structure:
```hcl
locals {
  name = "course_code"
  custom = {
    # parameters which you want to customize
  }
}
```
In most cases, the only Terraform locals you will want to customize are the following:
Parameter | Description | Default value |
---|---|---|
`nnode_cpu` | Number of static CPU nodes in the cluster | 2 |
`nnode_cpupool` | Maximum number of CPU nodes that can be booted dynamically by Slurm | 0 |
`nnode_gpu` | Number of static GPU nodes in the cluster | 0 |
`nnode_gpupool` | Maximum number of GPU nodes that can be booted dynamically by Slurm | 0 |
`home_size` | Size of the /home filesystem in GB | 20 |
`project_size` | Size of the /project filesystem in GB | 20 |
`scratch_size` | Size of the /scratch filesystem in GB | 20 |
A typical configuration might therefore be as simple as:
```hcl
locals {
  name = "example"
  custom = {
    nnode_cpu     = 5
    nnode_gpupool = 2
    home_size     = 50
  }
}
```
In some rare cases, you may want to customize the following local variables:
Parameter | Description | Default value |
---|---|---|
`image` | Default image to use for login and management VMs | Rocky-8 |
`image_cpu` | Default image to use for CPU node VMs | snapshot-cpunode-2024-R810.4 |
`image_gpu` | Default image to use for GPU node VMs | snapshot-gpunode-2024-R810.4 |
`nnode_compute` | Number of CPU nodes of a second type | 0 |
`config_version` | Version of puppet-magic_castle to use | see `common.tf` |
`gpu_mig_config` | MIG configuration used for static GPU nodes (for Béluga-cloud only) | `{ "3g.20gb" = 1, "2g.10gb" = 1, "1g.5gb" = 2 }` |
`gpupool_mig_config` | MIG configuration used for dynamically booted GPU nodes (for Béluga-cloud only) | `{ "1g.5gb" = 7 }` |
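As an illustration, a sketch of such a customization might look like this (the image name and MIG layout below are placeholders for this example, not values taken from `common.tf`):

```hcl
locals {
  name = "course_code"
  custom = {
    image          = "Rocky-9"        # placeholder image name, substitute one available in your cloud project
    nnode_gpu      = 1
    gpu_mig_config = { "1g.5gb" = 7 } # illustrative MIG layout (Béluga-cloud only)
  }
}
```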
The `common.tf` file also defines a map of flavours of virtual machines to be used, for Arbutus and Béluga. This map can be completely or partially redefined. The default values are:
```hcl
instances_type_map = {
  arbutus = {
    mgmt         = "p8-12gb"
    login        = "p4-6gb"
    cpu          = "c8-30gb-186-avx2"
    cpupool      = "c8-30gb-186-avx2"
    compute_node = "p8-12gb"
    gpu          = "g1-8gb-c4-22gb"
    gpupool      = "g1-8gb-c4-22gb"
  }
  beluga = {
    mgmt         = "p4-7.5gb"
    login        = "p4-7.5gb"
    cpu          = "c8-60gb"
    cpupool      = "c8-60gb"
    compute_node = "p8-15gb"
    gpu          = "gpu32-240-3450gb-a100x1"
    gpupool      = "gpu32-240-3450gb-a100x1"
  }
}
```
For example, if you simply want to change the type of CPU node for Béluga, you could define:
```hcl
locals {
  ...
  custom = {
    instances_type_map = {
      beluga = {
        cpu = "c4-30gb"
      }
    }
  }
}
```
If the above customizations are not enough, you can also completely redefine the `instances` or the `volumes` variable. See here for an example.
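For reference, the `instances` map follows the upstream Magic Castle format. The sketch below is adapted from the Magic Castle documentation; the types, tags, and counts are illustrative rather than values used in this repository:

```hcl
instances = {
  mgmt  = { type = "p4-6gb", tags = ["puppet", "mgmt", "nfs"], count = 1 }
  login = { type = "p2-3gb", tags = ["login", "public", "proxy"], count = 1 }
  node  = { type = "p2-3gb", tags = ["node"], count = 2 }
}
```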
Any parameter specified in the `config.yaml` file inside the course directory will override the default value that may be defined in the `common` directory. For a sampling of some useful Puppet parameters, see the `config.yaml` in the `example` folder. For an exhaustive list, see the Magic Castle documentation.
If you use a dynamic pool of nodes, you will want to configure a Terraform variable named `pool`, of HCL type, with the value `[]`.
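If you set the variable in code rather than through a web interface, a minimal declaration could look like the following (the description text is an assumption):

```hcl
variable "pool" {
  description = "Slurm pool of compute nodes" # assumed wording
  default     = []
}
```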
The name of the cluster will be based on the `name` variable as defined in the `locals`. If you want, you can define a Terraform variable named `suffix`, which will be appended to the `name`. This may be useful if you want to start two clusters, one on Arbutus and one on Béluga-cloud, using the same configuration.
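A minimal declaration might look like this (the description and the empty default are assumptions; `common.tf` is expected to handle the actual appending):

```hcl
variable "suffix" {
  description = "Suffix appended to the cluster name" # assumed wording
  default     = ""
}
```

You could then set it per run, for instance with `terraform apply -var suffix=-beluga`.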
One can validate and lint the YAML configuration files locally using `yamllint` with the provided configuration and hook. The hook runs before a commit operation and checks the files that are staged for commit.
- Install `yamllint` locally with your preferred method (`pip`, system package manager, ...). For installation instructions, see the yamllint quickstart.
- Copy the hook and ensure it is executable:

  ```bash
  cd $(git rev-parse --show-toplevel)
  cp -v .templates/hooks/pre-commit .git/hooks/
  ```

  or create a symbolic link:

  ```bash
  cd $(git rev-parse --show-toplevel)
  ln -rs .templates/hooks/pre-commit .git/hooks/pre-commit
  ```
A GitHub Action runs on every push or pull request that modifies a YAML file. The different runs and their details can be viewed for a particular commit or under the Actions tab.