You can also attach an Arc cluster and create a KubernetesCompute target via the Azure ML 2.0 CLI.
- Refer to Install, set up, and use the 2.0 CLI (preview) to install the ML 2.0 CLI. Compute attach support requires the ml extension >= 2.0.1a4.
- Attach the Arc-enabled Kubernetes cluster:
```azurecli
az ml compute attach --resource-group
                     --workspace-name
                     --name
                     --resource-id
                     --type
                     [--file]
                     [--no-wait]
```
Required Parameters
- `--resource-group -g`: Name of the resource group. You can configure the default group using `az configure --defaults group=<name>`.
- `--workspace-name -w`: Name of the Azure ML workspace. You can configure the default workspace using `az configure --defaults workspace=<name>`.
- `--name -n`: Name of the compute target.
- `--resource-id`: The fully qualified ID of the resource, including the resource name and resource type.
- `--type -t`: The type of compute target. Allowed values: kubernetes, AKS, virtualmachine. Specify `kubernetes` to attach an Arc-enabled Kubernetes cluster.
Optional Parameters
- `--file`: Local path to the YAML file containing the compute specification. Omit this parameter to use the default compute configuration for a simple attach scenario, or specify a YAML file with a customized compute definition for an advanced attach scenario.
- `--no-wait`: Do not wait for the long-running operation to finish.
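Putting the required parameters together, a simple attach using the default compute configuration might look like the following. The resource group, workspace, compute name, subscription ID, and cluster name below are placeholders, not values from this article:

```azurecli
az ml compute attach --resource-group my-rg \
                     --workspace-name my-ws \
                     --name amlarc-compute \
                     --resource-id "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.Kubernetes/connectedClusters/my-arc-cluster" \
                     --type kubernetes
```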
An AzureML Kubernetes compute target lets you specify an attach configuration file to enable some advanced compute target capabilities. The following is a full example of an attach configuration YAML file:
```yaml
default_instance_type: gpu_instance
namespace: amlarc-testing
instance_types:
- name: gpu_instance
  node_selector:
    accelerator: nvidia-tesla-k80
  resources:
    requests:
      cpu: 1
      memory: 4Gi
      "nvidia.com/gpu": 1
    limits:
      cpu: 1
      memory: 4Gi
      "nvidia.com/gpu": 1
- name: big_cpu_sku
  node_selector: null
  resources:
    requests:
      cpu: 4
      memory: 16Gi
      "nvidia.com/gpu": 0
    limits:
      cpu: 4
      memory: 16Gi
      "nvidia.com/gpu": 0
```
The attach configuration YAML file lets you specify three kinds of custom properties for a compute target:

- `namespace`: Defaults to the `default` namespace if not specified. This is the namespace all training jobs use, and their pods run under it. Note that the namespace specified in the compute target must already exist; it is usually created with cluster admin privilege.
- `defaultInstanceType`: You must specify a `defaultInstanceType` if you specify the `instanceTypes` property, and the value of `defaultInstanceType` must be one of the values from the `instanceTypes` property.
- `instanceTypes`: The list of instance types to be used for running training jobs. Each instance type is defined by `nodeSelector` and `resources` requests/limits properties:
  - `nodeSelector`: One or more node labels. Cluster admin privilege is needed to create labels for cluster nodes. If this is specified, training jobs are scheduled to run only on nodes with the specified labels. You can use `nodeSelector` to target a subset of nodes for training workload placement. This is handy when a cluster has different SKUs or different types of nodes, such as CPU and GPU nodes, and you want to target a certain node pool for training workloads. For example, you could create node labels for all GPU nodes and define an instance type for the GPU node pool; you would then be able to submit training jobs to that GPU node pool.
  - `resources` requests/limits: Specifies the resource requests and limits for a training job pod.
Note: You can specify the compute target and instance type at job submission. If no instance type is specified, `defaultInstanceType` is used.
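The constraints above can be sketched as a small validation helper. This is plain illustrative Python, not part of the AzureML SDK or CLI; the function name and error messages are my own, based only on the rules just described:

```python
def validate_attach_config(namespace, default_instance_type, instance_types):
    """Check the attach-configuration rules described above.

    - If instance_types is given, default_instance_type is required and
      must name one of its entries.
    - The namespace must be non-empty (whether it actually exists can
      only be verified against the cluster itself).
    """
    if instance_types:
        if default_instance_type is None:
            raise ValueError("default_instance_type is required when "
                             "instance_types is specified")
        if default_instance_type not in instance_types:
            raise ValueError("default_instance_type must be one of the "
                             "instance_types entries")
    if not namespace:
        raise ValueError("namespace must be a non-empty string")
    return True

# The two instance types from the example YAML above:
instance_types = {
    "gpu_instance": {"nodeSelector": {"accelerator": "nvidia-tesla-k80"}},
    "big_cpu_sku": {"nodeSelector": None},
}
validate_attach_config("amlarc-testing", "gpu_instance", instance_types)
```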
Note: For a simple compute attach without a compute configuration file, AzureML uses the following configuration for training jobs. To ensure jobs complete successfully, we recommend always specifying resource requests/limits according to the training job's needs.
```yaml
default_instance_type: defaultInstanceType
namespace: default
instance_types:
- name: defaultInstanceType
  node_selector: null
  resources:
    requests:
      cpu: 1
      memory: 4Gi
      "nvidia.com/gpu": 0
    limits:
      cpu: 1
      memory: 4Gi
      "nvidia.com/gpu": 0
```
It is easy to attach an Azure Arc-enabled Kubernetes cluster to an AML workspace from the AML Studio portal:
- In the AML Studio portal, go to Compute > Attached compute, click the "+New" button, and select "Kubernetes (Preview)".
- Enter a compute name, and select your Azure Arc-enabled Kubernetes cluster from the dropdown list.
- (Optional) Browse and upload an attach configuration file. Skip this step to use the default compute configuration for a simple attach scenario, or specify a YAML file with a customized compute definition for an advanced attach scenario.
- Click the "Attach" button. The provisioning state shows "Creating", followed by "Succeeded" on success or "Failed" otherwise.
You can also attach an Arc cluster and create a KubernetesCompute target via the AML Python SDK 1.30 or above.
The following Python code snippet shows how to attach an Arc cluster and create a compute target to be used for training jobs.
```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, KubernetesCompute
import os

ws = Workspace.from_config()

# choose a name for your Azure Arc-enabled Kubernetes compute
amlarc_compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "amlarc-ml")

# resource ID for your Azure Arc-enabled Kubernetes cluster
resource_id = "/subscriptions/123/resourceGroups/rg/providers/Microsoft.Kubernetes/connectedClusters/amlarc-cluster"

if amlarc_compute_name in ws.compute_targets:
    amlarc_compute = ws.compute_targets[amlarc_compute_name]
    if amlarc_compute and type(amlarc_compute) is KubernetesCompute:
        print("found compute target: " + amlarc_compute_name)
else:
    print("creating new compute target...")
    amlarc_attach_configuration = KubernetesCompute.attach_configuration(resource_id)
    amlarc_compute = ComputeTarget.attach(ws, amlarc_compute_name, amlarc_attach_configuration)
    amlarc_compute.wait_for_completion(show_output=True)

    # For a more detailed view of current KubernetesCompute status, use get_status()
    print(amlarc_compute.get_status().serialize())
```
You can also create a compute target with a list of instance types, including custom properties such as namespace, nodeSelector, or resources requests/limits. The following Python code snippet shows how to accomplish this.
```python
from azureml.core import Workspace
from azureml.core.compute import ComputeTarget, KubernetesCompute
import os

ws = Workspace.from_config()

# choose a name for your Azure Arc-enabled Kubernetes compute
amlarc_compute_name = os.environ.get("AML_COMPUTE_CLUSTER_NAME", "amlarc-ml")

# resource ID for your Azure Arc-enabled Kubernetes cluster
resource_id = "/subscriptions/123/resourceGroups/rg/providers/Microsoft.Kubernetes/connectedClusters/amlarc-cluster"

if amlarc_compute_name in ws.compute_targets:
    amlarc_compute = ws.compute_targets[amlarc_compute_name]
    if amlarc_compute and type(amlarc_compute) is KubernetesCompute:
        print("found compute target: " + amlarc_compute_name)
else:
    print("creating new compute target...")
    ns = "amlarc-testing"
    instance_types = {
        "gpu_instance": {
            "nodeSelector": {
                "accelerator": "nvidia-tesla-k80"
            },
            "resources": {
                "requests": {
                    "cpu": "2",
                    "memory": "16Gi",
                    "nvidia.com/gpu": "1"
                },
                "limits": {
                    "cpu": "2",
                    "memory": "16Gi",
                    "nvidia.com/gpu": "1"
                }
            }
        },
        "big_cpu_sku": {
            "nodeSelector": {
                "VMSizes": "VM-64vCPU-256GB"
            }
        }
    }

    amlarc_attach_configuration = KubernetesCompute.attach_configuration(
        resource_id=resource_id,
        namespace=ns,
        default_instance_type="gpu_instance",
        instance_types=instance_types,
    )
    amlarc_compute = ComputeTarget.attach(ws, amlarc_compute_name, amlarc_attach_configuration)
    amlarc_compute.wait_for_completion(show_output=True)

    # For a more detailed view of current KubernetesCompute status, use get_status()
    print(amlarc_compute.get_status().serialize())
```