As part of Azure Machine Learning (AML) service capabilities, Azure Arc-enabled Machine Learning (ML) brings AML to any infrastructure across multi-cloud, on-premises, and the edge, using Kubernetes on the hardware of the customer's choice. The design of Azure Arc-enabled ML lets IT operators use native Kubernetes concepts such as namespaces, node selectors, and resource requests/limits to manage and optimize ML compute utilization. Because the IT operator manages the ML compute setup, data scientists get a seamless AML experience and do not need to learn or use Kubernetes directly.
This repository serves as an information hub for customers and partners who are interested in the Azure Arc-enabled AML training public preview. Use this repository for onboarding and testing instructions, to provide feedback, report issues, and request enhancements, and to stay up to date as the preview progresses. To deploy a trained model using Azure Arc-enabled Machine Learning, please sign up for the Inference Preview. Please note that the preview release is subject to the Supplemental Terms of Use for Microsoft Azure Previews.
- An Azure subscription. If you don't have an Azure subscription, create a free account before you begin.
- You have a Kubernetes cluster up and running. The cluster must have a minimum of 4 vCPU cores and 8 GB of memory; around 2 vCPU cores and 3 GB of memory are used by the Arc and ML extension components.
- Your Kubernetes cluster is connected to Azure Arc (not a prerequisite for AKS in Azure cloud)
- You've met the prerequisites listed under the generic cluster extensions documentation.
- Azure CLI version >=2.24.0
- Azure CLI extension k8s-extension version >=0.4.3.
- Create an AML workspace if you don't have one already.
- AML Python SDK version >= 1.30.
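The version and sizing requirements above can be checked mechanically. The following is a minimal sketch, assuming you collect the installed version strings yourself (for example from `az version` output); the helper names `parse_version`, `meets_minimum`, and `usable_capacity` are hypothetical and not part of any Azure tooling. The overhead figures are the approximate values quoted in the list, so treat the result as an estimate.

```python
import re

# Minimum versions taken from the prerequisites list above.
MINIMUM_VERSIONS = {
    "azure-cli": "2.24.0",
    "k8s-extension": "0.4.3",
    "azureml-sdk": "1.30",
}

# Approximate share of the cluster used by the Arc and ML extension components.
EXTENSION_OVERHEAD = {"vcpu": 2, "memory_gb": 3}


def parse_version(text):
    """Turn '2.24.0' into (2, 24, 0), stopping at the first non-numeric part."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def usable_capacity(total_vcpu, total_memory_gb):
    """Estimate what is left for training jobs after the extension overhead."""
    return {
        "vcpu": total_vcpu - EXTENSION_OVERHEAD["vcpu"],
        "memory_gb": total_memory_gb - EXTENSION_OVERHEAD["memory_gb"],
    }


# Hypothetical installed versions, used only to illustrate the check.
installed = {"azure-cli": "2.30.0", "k8s-extension": "0.4.3", "azureml-sdk": "1.29.0"}
too_old = [name for name, minimum in MINIMUM_VERSIONS.items()
           if not meets_minimum(installed[name], minimum)]
print(too_old)
print(usable_capacity(4, 8))
```

On a cluster that only just meets the minimum size, this leaves roughly 2 vCPU cores and 5 GB of memory for training workloads, which is worth keeping in mind when setting resource requests.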
Getting started with the Training Public Preview is easy with the following steps:
- Deploy AzureML extension to your Azure Arc-enabled Kubernetes cluster
- Create a compute target - Attach Azure Arc-enabled Kubernetes cluster to AML Workspace
- Train image classification model with AML 2.0 CLI
- Train image classification model with Python SDK
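As a rough sketch of the first step above, the snippet below assembles an `az k8s-extension create` invocation in Python. The extension type `Microsoft.AzureML.Kubernetes` and the `enableTraining=True` setting reflect the preview documentation as recalled at the time of writing and are assumptions here, not something this README states. The extension name, cluster name, and resource group are hypothetical placeholders. Confirm the flags with `az k8s-extension create --help` and the linked deployment guide before running anything.

```python
import shlex


def build_extension_command(cluster_name, resource_group,
                            extension_name="amlarc-compute"):
    """Return the argument list for deploying the AzureML extension.

    Assumption: flag names and values follow the preview documentation and
    may change while the feature is in preview.
    """
    return [
        "az", "k8s-extension", "create",
        "--name", extension_name,
        "--extension-type", "Microsoft.AzureML.Kubernetes",
        "--configuration-settings", "enableTraining=True",
        "--cluster-type", "connectedClusters",
        "--cluster-name", cluster_name,
        "--resource-group", resource_group,
        "--scope", "cluster",
    ]


# Placeholder names; substitute your own cluster and resource group.
command = build_extension_command("my-arc-cluster", "my-resource-group")

# Print the command for review instead of executing it.
print(shlex.join(command))
```

Building the command as a list keeps each argument separate, so it can later be passed to `subprocess.run(command, check=True)` without shell quoting issues once the values have been reviewed.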
As another compute target in AML, Azure Arc-enabled ML preview supports the following built-in AML training features seamlessly:
- Train models with AML 2.0 CLI
- Train models with AML Python SDK
- Build and use ML pipelines including designer pipeline support
- Batch Inference
In addition to the built-in AML training features above, the public preview also supports the following on-premises training scenarios:
- Model deployment for real-time inference. Sign up here to get access to its GitHub repo.
- Interactive jobs to access your training compute using VS Code, Jupyter Notebook, and Jupyter Lab, and to summarize metrics with TensorBoard. Sign up here to get access to its GitHub repo.
- Deploy AzureML extension to your AKS cluster without connecting via Azure Arc. Sign up here to allowlist your subscription.
- Arc agent and ML extension deployment from an on-prem container registry
Azure Arc-enabled Machine Learning is currently supported in these regions where Azure Arc is available:
- East US
- East US 2
- South Central US
- West US 2
- Australia East
- Southeast Asia
- North Europe
- UK South
- West Europe
- West Central US
- Central US
- North Central US
- West US
- Korea Central
- France Central
- Azure Kubernetes Services
- AKS Engine
- AKS on Azure Stack HCI
- GKE (Google Kubernetes Engine)
- Canonical Kubernetes Distribution
- Deploy AzureML extension on OpenShift Container Platform (OCP)
- K3S-Lightweight Kubernetes
- Kubernetes 1.18.x, 1.19.x and 1.20.x
New features are released at a biweekly cadence.
July 2, 2021 Release
- New Kubernetes distribution support: OpenShift Kubernetes and GKE (Google Kubernetes Engine).
- Autoscale support. If autoscaling is enabled on the user-managed Kubernetes cluster, the cluster is automatically scaled out or in according to the volume of active runs and deployments.
- Performance improvement in the job launcher, which significantly shortens job execution time.
August 10, 2021 Release
- New Kubernetes distribution support, K3S - Lightweight Kubernetes.
- Deploy AzureML extension to your AKS cluster without connecting via Azure Arc.
- Automated Machine Learning (AutoML) via Python SDK
- Use 2.0 CLI to attach the Kubernetes cluster to AML Workspace
- Optimized CPU/memory resource utilization of AzureML extension components.
August 24, 2021 Release
Sept 16, 2021 Release
- New regions available, WestUS, CentralUS, NorthCentralUS, KoreaCentral.
- Job queue explainability. See job queue details in AML Workspace Studio.
- Auto-killing policy. Support for `max_run_duration_seconds` in `ScriptRunConfig`. The system will attempt to automatically cancel the run if it takes longer than the configured value.
- Performance improvement on cluster autoscale support.
- Arc agent and ML extension deployment from on-prem container registry
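To illustrate the auto-killing policy from the list above, here is a minimal sketch of passing `max_run_duration_seconds` to `ScriptRunConfig` from the `azureml-core` package. The source directory, script name, and compute target name are hypothetical placeholders, and the helper functions are illustrative rather than part of the SDK. The SDK import is deferred so the argument-building helper can be tried without `azureml-core` installed.

```python
def run_config_arguments(source_directory, script, compute_target_name, max_minutes):
    """Collect ScriptRunConfig keyword arguments with a run-time limit.

    max_minutes is converted to seconds, which is the unit that
    max_run_duration_seconds expects.
    """
    if max_minutes <= 0:
        raise ValueError("max_minutes must be positive")
    return {
        "source_directory": source_directory,
        "script": script,
        "compute_target": compute_target_name,
        "max_run_duration_seconds": int(max_minutes * 60),
    }


def make_script_run_config(arguments):
    """Build the SDK object; requires the azureml-core package."""
    from azureml.core import ScriptRunConfig  # imported lazily on purpose
    return ScriptRunConfig(**arguments)


# Placeholder values: "./src", "train.py", and "arc-compute" are assumptions.
arguments = run_config_arguments("./src", "train.py", "arc-compute", max_minutes=30)
print(arguments["max_run_duration_seconds"])  # 1800
```

With the SDK installed, `make_script_run_config(arguments)` returns a configuration that can be submitted through an `Experiment`; the system then attempts to cancel the run once it exceeds the 30-minute limit.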
Oct 14, 2021 Release
We are always looking for feedback on our current experiences and what we should work on next. If there is anything you would like us to prioritize, please feel free to suggest it via our GitHub Issue Tracker. You can submit a bug report, suggest a feature, or participate in discussions.
Or reach out to us at [email protected] if you have any questions or feedback.
- Check for known issues
- View general feedback
- Browse roadmap items
- Open a bug, provide feedback, or suggest an improvement
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
The lifecycle management (health, Kubernetes version upgrades, security updates to nodes, scaling, etc.) of the AKS or Arc Kubernetes cluster is the responsibility of the customer.
For AKS, read what is managed and what is shared responsibility here.
All preview features are available on a self-service, opt-in basis and are subject to breaking design and API changes. Previews are provided "as is" and "as available," and they're excluded from the service-level agreements and limited warranty. As such, these features aren't meant for production use.
Azure Arc-enabled ML supports targeting ML training on both Azure Kubernetes Service (AKS) clusters and any cluster that is registered in Azure using Arc.
Kubernetes version support is in accordance with what AKS supports; see here for details.