Feature/covalent #196

Closed
wants to merge 22 commits into from
141 changes: 141 additions & 0 deletions .github/workflows/cloud-ci.yml
@@ -0,0 +1,141 @@
name: tests

on:
  # Runs for pull requests
  pull_request:
    branches:
      - master

permissions:
  id-token: write

jobs:
  cloud-tests:
    strategy:
      fail-fast: true
      matrix:
        include:
          - arch: cuda
            exclude: "no-cuda"
            run_on: azure__a100
          # - arch: rocm
          #   exclude : "no-rocm"

    runs-on: ubuntu-latest
    environment: cloud-ci

    # Cancel previous jobs if a new version was pushed
    concurrency:
      group: "${{ github.ref }}-${{ matrix.arch }}-${{ matrix.run_on }}"
      cancel-in-progress: true

    defaults:
      run:
        shell: bash -el {0}

    env:
      MILABENCH_CONFIG: "config/standard.yaml"
      MILABENCH_SYSTEM: "config/cloud-system.yaml"
      MILABENCH_BASE: "output"
      MILABENCH_ARGS: ""
      MILABENCH_GPU_ARCH: "${{ matrix.arch }}"
      MILABENCH_DASH: "no"
      ARM_TENANT_ID: "${{ secrets.ARM_TENANT_ID }}"
      ARM_SUBSCRIPTION_ID: "${{ secrets.ARM_SUBSCRIPTION_ID }}"
      AZURE_CORE_OUTPUT: none

    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ github.token }}

      - uses: actions/setup-python@v2
        with:
          python-version: 3.9

      # Follow
      # https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/service_principal_client_secret
      # to generate a clientId as well as a clientSecret
      - name: Azure login
        uses: azure/login@v2
        with:
          creds: |
            {
              "clientId": "${{ secrets.ARM_CLIENT_ID }}",
              "clientSecret": "${{ secrets.ARM_CLIENT_SECRET }}",
              "subscriptionId": "${{ secrets.ARM_SUBSCRIPTION_ID }}",
              "tenantId": "${{ secrets.ARM_TENANT_ID }}"
            }

      - name: dependencies
        run: |
          python -m pip install -U pip
          python -m pip install -U poetry
          poetry lock --no-update
          poetry install

      - name: setup cloud credentials
        run: |
          mkdir -p ~/.aws
          mkdir -p ~/.ssh/covalent
          echo "${{ secrets.COVALENT_EC2_EXECUTOR_KEYPAIR }}" >~/.ssh/covalent/covalent-ec2-executor-keypair.pem
          echo "[default]" >~/.aws/credentials
          echo "aws_access_key_id=${{ secrets.AWS_ACCESS_KEY_ID }}" >>~/.aws/credentials
          echo "aws_secret_access_key=${{ secrets.AWS_SECRET_ACCESS_KEY }}" >>~/.aws/credentials
          chmod -R a-rwx,u+rwX ~/.aws ~/.ssh

      - name: setup cloud
        run: |
          _system=$(
            poetry run milabench cloud \
              --setup \
              --run-on ${{ matrix.run_on }}
          )
          { read _hash ; }< <(
            echo -n "$_system" | while read l
            do
              if [[ "$l" == "# hash::>"* ]]
              then
                echo -n "${l#*::>}"
              fi
            done
            echo
          )
          if [[ -z "${_hash}" ]]
          then
            >&2 echo "Failed to fetch system config hash"
            exit 1
          fi
          echo -n "$_system" >$MILABENCH_SYSTEM.$_hash
          echo "MILABENCH_SYSTEM=$MILABENCH_SYSTEM.$_hash" >>$GITHUB_ENV

      - name: install benchmarks
        run: |
          poetry run milabench install --variant ${{ matrix.arch }}

      - name: prepare benchmarks
        run: |
          poetry run milabench prepare

      - name: run benchmarks
        run: |
          poetry run milabench run

      - name: Summary
        run: |
          # git remote set-url origin "https://${{ vars.REPORTS_USERNAME }}:${{ secrets.REPORTS_PAT }}@$(git remote get-url origin | cut -d'/' -f3-)"
          git config --global user.email "[email protected]"
          git config --global user.name "GitHub CI"
          poetry run milabench report --push

      - name: teardown cloud
        if: always()
        run: |
          if [[ -f "${MILABENCH_SYSTEM%.*}" ]]
          then
            export MILABENCH_SYSTEM=${MILABENCH_SYSTEM%.*}
          fi
          poetry run milabench cloud \
            --teardown \
            --run-on ${{ matrix.run_on }} \
            --all
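
The `setup cloud` step in the workflow above looks for a `# hash::>` marker line in the output of `milabench cloud --setup`, then appends that hash to the system config file name; the `teardown cloud` step strips the suffix again with `${MILABENCH_SYSTEM%.*}`. The same logic might look roughly like this in Python. This is a sketch only: the function names are hypothetical and not part of milabench, the sample output is made up, and it assumes a single marker line in the output.

```python
from typing import Optional

HASH_MARKER = "# hash::>"


def extract_hash(setup_output: str) -> Optional[str]:
    """Return the text following the first '# hash::>' marker line, or None."""
    # splitlines() also yields a last line that has no trailing newline.
    for line in setup_output.splitlines():
        line = line.strip()  # the shell `read` trims surrounding whitespace too
        if line.startswith(HASH_MARKER):
            return line[len(HASH_MARKER):] or None
    return None


def hashed_system_path(system_path: str, digest: str) -> str:
    """Mirror `$MILABENCH_SYSTEM.$_hash`: append the hash as a suffix."""
    return f"{system_path}.{digest}"


def base_system_path(hashed_path: str) -> str:
    """Mirror `${MILABENCH_SYSTEM%.*}`: drop everything from the last dot on."""
    head, sep, _ = hashed_path.rpartition(".")
    return head if sep else hashed_path


if __name__ == "__main__":
    # Made-up output, for illustration only.
    sample_output = "system:\n  nodes: []\n# hash::>abc123"
    digest = extract_hash(sample_output)
    if digest is None:
        raise SystemExit("Failed to fetch system config hash")
    path = hashed_system_path("config/cloud-system.yaml", digest)
    print(path)
    print(base_system_path(path))
```

Note that the teardown step only applies the suffix stripping when the stripped path exists as a file. If setup failed before `MILABENCH_SYSTEM` was rewritten, stripping `.yaml` would yield a path that does not exist, so the original value is kept.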
46 changes: 46 additions & 0 deletions benchmarks/_template/requirements.cpu.txt
@@ -0,0 +1,46 @@
#
# This file is autogenerated by pip-compile with Python 3.10
# by the following command:
#
# pip-compile --output-file=benchmarks/_template/requirements.cpu.txt benchmarks/_template/requirements.in
#
antlr4-python3-runtime==4.9.3
# via omegaconf
asttokens==2.4.1
# via giving
codefind==0.1.3
# via ptera
executing==1.2.0
# via varname
giving==0.4.2
# via
# ptera
# voir
markdown-it-py==3.0.0
# via rich
mdurl==0.1.2
# via markdown-it-py
omegaconf==2.3.0
# via voir
ovld==0.3.2
# via voir
ptera==1.4.1
# via voir
pygments==2.17.2
# via rich
pynvml==11.5.0
# via voir
pyyaml==6.0.1
# via omegaconf
reactivex==4.0.4
# via giving
rich==13.7.0
# via voir
six==1.16.0
# via asttokens
typing-extensions==4.10.0
# via reactivex
varname==0.10.0
# via giving
voir==0.2.12
# via -r benchmarks/_template/requirements.in
31 changes: 31 additions & 0 deletions config/cloud-multinodes-system.yaml
@@ -0,0 +1,31 @@
system:
  # Nodes list
  nodes:
    # Alias used to reference the node
    - name: manager
      # Use 1.1.1.1 as an ip placeholder
      ip: 1.1.1.1
      # Use this node as the master node or not
      main: true
      # User to use in remote milabench operations
      user: user

    - name: node1
      ip: 1.1.1.1
      main: false
      user: username

  # Cloud instances profiles
  cloud_profiles:
    azure__a100:
      username: ubuntu
      size: Standard_NC24ads_A100_v4
      location: eastus2
    azure__a100_x2:
      username: ubuntu
      size: Standard_NC48ads_A100_v4
      location: eastus2
    azure__a10_x2:
      username: ubuntu
      size: Standard_NV72ads_A10_v5
      location: eastus2
26 changes: 26 additions & 0 deletions config/cloud-system.yaml
@@ -0,0 +1,26 @@
system:
  # Nodes list
  nodes:
    # Alias used to reference the node
    - name: manager
      # Use 1.1.1.1 as an ip placeholder
      ip: 1.1.1.1
      # Use this node as the master node or not
      main: true
      # User to use in remote milabench operations
      user: user

  # Cloud instances profiles
  cloud_profiles:
    azure__a100:
      username: ubuntu
      size: Standard_NC24ads_A100_v4
      location: eastus2
    azure__a100_x2:
      username: ubuntu
      size: Standard_NC48ads_A100_v4
      location: eastus2
    azure__a10_x2:
      username: ubuntu
      size: Standard_NV72ads_A10_v5
      location: eastus2
37 changes: 37 additions & 0 deletions config/examples/cloud-multinodes-system.yaml
@@ -0,0 +1,37 @@
system:
  # Nodes list
  nodes:
    # Alias used to reference the node
    - name: manager
      # Use 1.1.1.1 as an ip placeholder
      ip: 1.1.1.1
      # Use this node as the master node or not
      main: true
      # User to use in remote milabench operations
      user: user

    - name: node1
      ip: 1.1.1.1
      main: false
      user: username

  # Cloud instances profiles
  cloud_profiles:
    # The cloud platform to use in the form of {PLATFORM} or
    # {PLATFORM}__{PROFILE_NAME}
    azure:
      # covalent-azure-plugin args
      username: ubuntu
      size: Standard_B1s
      location: eastus2
    azure__free:
      username: ubuntu
      size: Standard_B2ats_v2
      location: eastus2
    ec2:
      # covalent-ec2-plugin args
      username: ubuntu
      instance_type: t2.micro
      volume_size: 8
      region: us-east-2
      state_id: 71669879043a3864225aabb94f91a2d4
30 changes: 30 additions & 0 deletions config/examples/cloud-system.yaml
@@ -0,0 +1,30 @@
system:
  # Nodes list
  nodes:
    # Alias used to reference the node
    - name: manager
      # Use 1.1.1.1 as an ip placeholder
      ip: 1.1.1.1
      # Use this node as the master node or not
      main: true
      # User to use in remote milabench operations
      user: user

  # Cloud instances profiles
  cloud_profiles:
    # The cloud platform to use in the form of {PLATFORM}__{PROFILE_NAME}
    azure:
      # covalent-azure-plugin args
      username: ubuntu
      size: Standard_B1s
      location: eastus2
    azure__free:
      username: ubuntu
      size: Standard_B2ats_v2
      location: eastus2
    ec2:
      # covalent-ec2-plugin args
      username: ubuntu
      instance_type: t2.micro
      volume_size: 8
      region: us-east-2
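
The comments in the example system files describe profile keys as `{PLATFORM}` or `{PLATFORM}__{PROFILE_NAME}` (for example `azure`, `azure__free`, `ec2`). As an illustration only, such a key could be split as sketched below. `split_profile_key` is a hypothetical helper, not milabench's API; how milabench itself parses the `--run-on` value is not shown in this diff.

```python
from typing import Optional, Tuple


def split_profile_key(key: str) -> Tuple[str, Optional[str]]:
    """Split '{PLATFORM}__{PROFILE_NAME}' into (platform, profile).

    The profile is None when the key names only a platform.
    """
    # partition() splits on the first '__' only, so profile names may
    # themselves contain single underscores (e.g. 'a100_x2').
    platform, _sep, profile = key.partition("__")
    return platform, (profile or None)


if __name__ == "__main__":
    for key in ("azure", "azure__free", "azure__a100_x2", "ec2"):
        print(key, "->", split_profile_key(key))
```

Splitting on the first double underscore keeps the platform name unambiguous while leaving the rest of the key free-form.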
24 changes: 24 additions & 0 deletions config/examples/test.yaml
@@ -0,0 +1,24 @@
_defaults:
  max_duration: 600
  voir:
    options:
      stop: 60
      interval: "1s"

test:
  inherits: _defaults
  group: test_remote
  install_group: test_remote
  definition: ../../benchmarks/_template
  plan:
    method: njobs
    n: 1

testing:
  inherits: _defaults
  definition: ../../benchmarks/_template
  group: test_remote_2
  install_group: test_remote_2
  plan:
    method: njobs
    n: 1
13 changes: 13 additions & 0 deletions docs/dev-usage.rst
@@ -97,3 +97,16 @@ milabench compare
~~~~~~~~~~~~~~~~~

TODO.

Using milabench on the cloud
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Milabench uses `Terraform <https://developer.hashicorp.com/terraform>`_ through
`Covalent <https://docs.covalent.xyz/>`_. To add support for a new cloud
platform, you will need to develop a new Covalent plugin with its Terraform
config. An example is the
`covalent-azure-plugin <https://github.com/satyaog/covalent-azure-plugin/tree/feature/milabench>`_.
The interesting parts are:

* `Terraform provider's related plugin arguments <https://github.com/satyaog/covalent-azure-plugin/blob/feature/milabench/covalent_azure_plugin/azure.py>`_
* `Terraform provider's configuration <https://github.com/satyaog/covalent-azure-plugin/blob/feature/milabench/covalent_azure_plugin/infra/main.tf>`_