GitHub - emattia/metaflow-gpu-cfn

Metaflow GPU CloudFormation

A modified version of the blessed Metaflow CloudFormation template that enables a GPU job queue by default.

There are three changes to pay attention to as you set up your cluster.

1. Choosing instance type

The GPUComputeEnvInstanceTypes entry in the Parameters section sets the instance types the GPU compute environment can launch:

  GPUComputeEnvInstanceTypes:
    Type: CommaDelimitedList
    Default: "p3.2xlarge,p3.8xlarge,p3.16xlarge"
    Description: "The instance types for the p3 GPU compute environment as a comma-separated list"
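If you override this default when creating the stack programmatically, note that a CommaDelimitedList parameter is passed as one comma-separated string. The sketch below builds such an override in the shape that boto3's CloudFormation calls accept for their Parameters argument. The helper name is hypothetical and not part of this repository.

```python
def gpu_instance_types_parameter(instance_types):
    """Build a CloudFormation parameter override for GPUComputeEnvInstanceTypes.

    A CommaDelimitedList parameter is supplied as a single comma-separated
    string; CloudFormation splits it into a list when it processes the stack.
    """
    cleaned = [t.strip() for t in instance_types if t.strip()]
    if not cleaned:
        raise ValueError("at least one instance type is required")
    return {
        "ParameterKey": "GPUComputeEnvInstanceTypes",
        "ParameterValue": ",".join(cleaned),
    }


# Illustrative override: restrict the environment to the two smaller p3 sizes.
print(gpu_instance_types_parameter(["p3.2xlarge", "p3.8xlarge"]))
```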

2. Creating a second compute environment

The ComputeEnvironmentGPU entry in the Resources section:

  ComputeEnvironmentGPU:
    Type: AWS::Batch::ComputeEnvironment
    Properties:
      Type: MANAGED
      ServiceRole: !GetAtt 'BatchExecutionRole.Arn'
      ComputeResources:
        MaxvCpus: !Ref MaxVCPUBatchGPUEnv
        SecurityGroupIds:
          - !GetAtt VPC.DefaultSecurityGroup
        Type: !If [EnableFargateOnBatch, 'FARGATE', 'EC2']
        Subnets:
          - !Ref Subnet1
          - !Ref Subnet2
        MinvCpus: !If [EnableFargateOnBatch, !Ref AWS::NoValue, !Ref MinVCPUBatch]
        InstanceRole: !If [EnableFargateOnBatch, !Ref AWS::NoValue, !GetAtt 'ECSInstanceProfile.Arn']
        InstanceTypes: !If [EnableFargateOnBatch, !Ref AWS::NoValue, !Ref GPUComputeEnvInstanceTypes]
        DesiredvCpus: !If [EnableFargateOnBatch, !Ref AWS::NoValue, !Ref DesiredVCPUBatch]
      State: ENABLED

3. Creating a second job queue

The JobQueueGPU section in the Resources:

  JobQueueGPU:
    DependsOn: ComputeEnvironmentGPU
    Type: AWS::Batch::JobQueue
    Properties:
      ComputeEnvironmentOrder:
        - Order: 1
          ComputeEnvironment: !Ref ComputeEnvironmentGPU
      State: ENABLED
      Priority: 1
      JobQueueName: !Join [ '-', [ 'job-queue-gpu', !Ref AWS::StackName ] ]
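The queue name produced by the !Join above depends on the stack name, and you need the resulting name when you point a flow at the GPU queue. A minimal sketch of the same string construction in Python (the function name is hypothetical):

```python
def gpu_job_queue_name(stack_name):
    """Mirror !Join ['-', ['job-queue-gpu', !Ref AWS::StackName]] from the template."""
    return "-".join(["job-queue-gpu", stack_name])


# With the stack name used later in this README:
print(gpu_job_queue_name("metaflow"))
```

For a stack named metaflow this yields job-queue-gpu-metaflow, the queue name referenced in the GPU test below.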

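Create the CloudFormation deployment

In the AWS Console, open CloudFormation, create a new stack, and upload the cfn.yaml file. The template is too large to pass inline through the AWS CLI, which limits the size of an inline template body.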
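If you prefer a scripted deployment over the console, one option is to host the template in S3 and reference it by URL, since CloudFormation accepts larger templates that way. The following is a sketch under assumptions, not something this repository provides: the function names and the bucket URL are placeholders, and the capabilities listed are a guess based on the template creating IAM roles.

```python
def build_create_stack_kwargs(stack_name, template_url, parameters=None):
    """Assemble keyword arguments for boto3's CloudFormation create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted((parameters or {}).items())
        ],
        # Assumption: the template creates IAM roles, so the request must
        # acknowledge IAM capabilities. Adjust to what your stack requires.
        "Capabilities": ["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    }


def create_stack(stack_name, template_url, parameters=None):
    """Create the stack. Requires boto3 and configured AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    client = boto3.client("cloudformation")
    return client.create_stack(
        **build_create_stack_kwargs(stack_name, template_url, parameters)
    )


# Placeholder URL; upload cfn.yaml to a bucket you own and use its URL instead.
kwargs = build_create_stack_kwargs(
    "metaflow",
    "https://example-bucket.s3.amazonaws.com/cfn.yaml",
    {"GPUComputeEnvInstanceTypes": "p3.2xlarge"},
)
print(kwargs["StackName"], kwargs["Parameters"])
```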

Testing the Metaflow deployment

Set up virtual environment

pip install metaflow boto3

Configure Metaflow

You can either do this manually, by filling in the Metaflow config file with the values from the Outputs tab in the CloudFormation console, or run this script (it assumes your AWS credentials are already configured).

STACK_NAME=metaflow
python mf_configure.py -s $STACK_NAME
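The contents of mf_configure.py are not shown here, so the following is only a sketch of the general idea: read the stack's Outputs and map them onto Metaflow configuration keys, which Metaflow reads from ~/.metaflowconfig/config.json. The output names in KEY_MAP and the example values are assumptions; check them against the Outputs tab of your own stack.

```python
import json


def outputs_to_dict(stack_description):
    """Flatten the Outputs list of one describe_stacks entry into a dict."""
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack_description.get("Outputs", [])
    }


def build_metaflow_config(outputs, key_map):
    """Map stack output keys to Metaflow config keys, skipping missing outputs."""
    return {
        metaflow_key: outputs[output_key]
        for output_key, metaflow_key in key_map.items()
        if output_key in outputs
    }


# Assumed output names; verify against the Outputs tab of your stack.
KEY_MAP = {
    "BatchJobQueueArn": "METAFLOW_BATCH_JOB_QUEUE",
    "MetaflowDataStoreS3Url": "METAFLOW_DATASTORE_SYSROOT_S3",
}

# Stand-in for boto3's describe_stacks(StackName=...)["Stacks"][0],
# filled with placeholder values.
example_stack = {
    "Outputs": [
        {
            "OutputKey": "BatchJobQueueArn",
            "OutputValue": "arn:aws:batch:us-east-1:123456789012:job-queue/job-queue-metaflow",
        },
        {"OutputKey": "MetaflowDataStoreS3Url", "OutputValue": "s3://example-bucket/metaflow"},
    ]
}

config = build_metaflow_config(outputs_to_dict(example_stack), KEY_MAP)
print(json.dumps(config, indent=2))
```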

Run a flow in your CPU queue

This will run in the job-queue-metaflow queue (unless you changed the name).

python flow.py run

Run a flow in your GPU queue, verifying the hardware is working

Note: A fresh AWS account that I created in April 2024 had a default quota of 0 for on-demand P instances. To run this code, request a quota increase in the AWS Console under Service Quotas / AWS services / Amazon Elastic Compute Cloud (Amazon EC2) / Running On-Demand P instances.
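EC2 on-demand quotas are expressed in vCPUs rather than instance counts, so it helps to work out the number to request before opening the quota page. A small sketch for the template's default p3 instance types (the function name is hypothetical):

```python
# vCPU counts for the template's default p3 instance types.
P3_VCPUS = {"p3.2xlarge": 8, "p3.8xlarge": 32, "p3.16xlarge": 64}


def required_p_quota(instance_counts):
    """Return the vCPU quota needed to run the given number of each instance type."""
    return sum(P3_VCPUS[name] * count for name, count in instance_counts.items())


print(required_p_quota({"p3.2xlarge": 1}))                   # 8
print(required_p_quota({"p3.2xlarge": 2, "p3.8xlarge": 1}))  # 48
```

A quota of 8 vCPUs is therefore the minimum that lets a single p3.2xlarge start.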

This will run in the job-queue-gpu-metaflow queue (see the @batch decorator on the start step).

python gpu_flow.py run
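The contents of gpu_flow.py are not reproduced here. As an illustration of how a step might verify that the hardware is visible, the helper below queries nvidia-smi and returns what it reports. It is a sketch that assumes the container image ships nvidia-smi; the function name is hypothetical.

```python
import shutil
import subprocess


def list_gpus():
    """Return one 'name, memory' string per GPU reported by nvidia-smi.

    Returns an empty list when nvidia-smi is missing or exits with an error,
    which is what you would expect on a CPU-only machine.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(f"{len(gpus)} GPU(s) visible: {gpus}")
```

Called from a step decorated with something like @batch(gpu=1, queue="job-queue-gpu-metaflow"), a non-empty result would indicate that the job landed on a GPU instance and that the container can see the device.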
