From 8f751058a0e2c677b7f126b9077ddf7427a721c2 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Fri, 29 Apr 2022 16:12:56 -0700 Subject: [PATCH 01/30] RFC 431: SageMaker Model Hosting L2 Constructs Co-authored-by: Matt McClean Co-authored-by: Long Yao Co-authored-by: Drew Jetter <60628154+jetterdj@users.noreply.github.com> Co-authored-by: Murali Ganesh <59461079+foxpro24@users.noreply.github.com> Co-authored-by: Abilash Rangoju <988529+rangoju@users.noreply.github.com> --- text/0431-sagemaker-l2-endpoint.md | 1150 ++++++++++++++++++++++++++++ 1 file changed, 1150 insertions(+) create mode 100644 text/0431-sagemaker-l2-endpoint.md diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md new file mode 100644 index 000000000..d2a874ca0 --- /dev/null +++ b/text/0431-sagemaker-l2-endpoint.md @@ -0,0 +1,1150 @@ +SageMaker Model Hosting L2 Constructs + +* **Original Author(s):**: @pvanlund +* **Tracking Issue**: #431 +* **API Bar Raiser**: *TBD* + +This feature supports the creation of Amazon SageMaker real-time inference hosted endpoints using a +new set of L2 constructs for the `Endpoint`, `EndpointConfig`, and `Model` CloudFormation resources. + +## Working Backwards + +### CHANGELOG + +`feat(sagemaker): add model hosting L2 constructs` + +### README + +--- + +# Amazon SageMaker Construct Library + + +--- + +![cfn-resources: Stable](https://img.shields.io/badge/cfn--resources-stable-success.svg?style=for-the-badge) + +> All classes with the `Cfn` prefix in this module ([CFN Resources]) are always stable and safe to use. +> +> [CFN Resources]: https://docs.aws.amazon.com/cdk/latest/guide/constructs.html#constructs_lib + +![cdk-constructs: Experimental](https://img.shields.io/badge/cdk--constructs-experimental-important.svg?style=for-the-badge) + +> The APIs of higher level constructs in this module are experimental and under active development. 
> They are subject to non-backward compatible changes or removal in any future version. These are
> not subject to the [Semantic Versioning](https://semver.org/) model and breaking changes will be
> announced in the release notes. This means that while you may use them, you may need to update
> your source code when upgrading to a newer version of this package.

---

Amazon SageMaker provides every developer and data scientist with the ability to build, train, and
deploy machine learning models quickly. Amazon SageMaker is a fully managed service that covers the
entire machine learning workflow: label and prepare your data, choose an algorithm, train the
model, tune and optimize it for deployment, make predictions, and take action. Your models get to
production faster with much less effort and lower cost.

## Installation

Install the module:

```console
$ npm i @aws-cdk/aws-sagemaker
```

Import it into your code:

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';
```

## Model

By creating a model, you tell Amazon SageMaker where it can find the model components. This includes
the S3 path where the model artifacts are stored and the Docker registry path for the image that
contains the inference code. The `ContainerDefinition` interface encapsulates both the specification
of model inference code as a `ContainerImage` and an optional set of artifacts as `ModelData`.

### Container Images

Inference code can be stored in Amazon Elastic Container Registry (Amazon ECR) and is specified via
the `image` property of `ContainerDefinition`, which accepts a class that extends the
`ContainerImage` abstract base class.
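As context for the sources below, an ECR-hosted image is ultimately referenced by a
registry/repository:tag name of the form `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`
(see `ContainerImageConfig` later in this RFC). A minimal illustrative sketch of that naming scheme
(the helper below is hypothetical and not part of the proposed API):

```typescript
// Hypothetical helper, for illustration only: shows the registry/repository:tag
// form that an ECR-hosted image resolves to, e.g.
// 012345678910.dkr.ecr.us-west-2.amazonaws.com/repo:tag
function ecrImageName(account: string, region: string, repository: string, tag: string = 'latest'): string {
  return `${account}.dkr.ecr.${region}.amazonaws.com/${repository}:${tag}`;
}
```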
+ +#### `EcrImage` + +Reference an image available within ECR: + +```typescript +import * as ecr from '@aws-cdk/aws-ecr'; +import * as sagemaker from '@aws-cdk/aws-sagemaker'; + +const repository = ecr.Repository.fromRepositoryName(this, 'Repository', 'repo'); +const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag'); +``` + +#### `AssetImage` + +Reference a local directory containing a Dockerfile: + +```typescript +import * as sagemaker from '@aws-cdk/aws-sagemaker'; +import * as path from 'path'; + +const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { + directory: path.join('path', 'to', 'Dockerfile', 'directory') +}); +``` + +### Model Artifacts + +Models are often associated with model artifacts, which are specified via the `modelData` property +which accepts a class that extends the `ModelData` abstract base class. The default is to have no +model artifacts associated with a model. + +#### `S3ModelData` + +Reference an S3 bucket and object key as the artifacts for a model: + +```typescript +import * as s3 from '@aws-cdk/aws-s3'; +import * as sagemaker from '@aws-cdk/aws-sagemaker'; + +const bucket = new s3.Bucket(this, 'MyBucket'); +const modelData = sagemaker.ModelData.fromBucket(bucket, 'path/to/artifact/file.tar.gz'); +``` + +#### `AssetModelData` + +Reference local model data: + +```typescript +import * as sagemaker from '@aws-cdk/aws-sagemaker'; +import * as path from 'path'; + +const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', + path.join('path', 'to', 'artifact', 'file.tar.gz')); +``` + +### `Model` + +The `Model` construct associates container images with their optional model data. 
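In addition to defining a new model, an existing model can be referenced by name, which is useful
when the model is managed outside of the current CDK stack (a minimal sketch using the
`Model.fromModelName` import helper proposed in this RFC; `'my-model'` is a hypothetical model
name):

```typescript
import * as sagemaker from '@aws-cdk/aws-sagemaker';

// 'my-model' is a hypothetical name of a model created outside this stack.
const importedModel = sagemaker.Model.fromModelName(this, 'ImportedModel', 'my-model');
```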
+ +#### Single Container Model + +In the event that a single container is sufficient for your inference use-case, you can define a +single-container model: + +```typescript fixture=with-assets +import * as sagemaker from '@aws-cdk/aws-sagemaker'; + +const model = new sagemaker.Model(this, 'PrimaryContainerModel', { + container: { + image: image, + modelData: modelData, + } +}); +``` + +#### Inference Pipeline Model + +An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to +five containers that process requests for inferences on data. You use an inference pipeline to +define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own +custom algorithms packaged in Docker containers. You can use an inference pipeline to combine +preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully +managed. To define an inference pipeline, you can provide additional containers for your model via +the `extraContainers` property: + +```typescript fixture=with-assets +import * as sagemaker from '@aws-cdk/aws-sagemaker'; + +const model = new sagemaker.Model(this, 'InferencePipelineModel', { + container: { + image: image1, modelData: modelData1 + }, + extraContainers: [ + { image: image2, modelData: modelData2 }, + { image: image3, modelData: modelData3 } + ], +}); +``` + +## Model Hosting + +Amazon SageMaker provides model hosting services for model deployment. Amazon SageMaker provides an +HTTPS endpoint where your machine learning model is available to provide inferences. + +### Endpoint Configuration + +In this configuration, you identify one or more models to deploy and the resources that you want +Amazon SageMaker to provision. You define one or more production variants, each of which identifies +a model. Each production variant also describes the resources that you want Amazon SageMaker to +provision. 
This includes the number and type of ML compute instances to deploy. If you are hosting
multiple models, you also assign a variant weight to specify how much traffic you want to allocate
to each model. For example, suppose that you want to host two models, A and B, and you assign
traffic weight 2 for model A and 1 for model B. Amazon SageMaker distributes two-thirds of the
traffic to model A and one-third to model B:

```typescript fixture=with-assets
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', {
  productionVariant: {
    model: modelA,
    variantName: 'variantA',
    initialVariantWeight: 2.0,
  },
  extraProductionVariants: [{
    model: modelB,
    variantName: 'variantB',
    initialVariantWeight: 1.0,
  }]
});
```

### Endpoint

If you create an endpoint from an `EndpointConfig`, Amazon SageMaker launches the ML compute
instances and deploys the model or models as specified in the configuration. To get inferences from
the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint. For
more information about the API, see the
[InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html)
API.
Defining an endpoint requires, at minimum, the associated endpoint configuration:

```typescript fixture=with-endpoint-config
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
```

### AutoScaling

The `autoScaleInstanceCount` method on the `IEndpointProductionVariant` interface can be used to
enable Application Auto Scaling for the production variant:

```typescript fixture=with-endpoint-config
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
const productionVariant = endpoint.findProductionVariant('variantName');
const instanceCount = productionVariant.autoScaleInstanceCount({
  maxCapacity: 3
});
instanceCount.scaleOnInvocations('LimitRPS', {
  maxRequestsPerSecond: 30,
});
```

For load testing guidance on determining the maximum requests per second per instance, please see
this [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-scaling-loadtest.html).

### Metrics

The `IEndpointProductionVariant` interface also provides a set of APIs for referencing CloudWatch
metrics associated with a production variant of an endpoint:

```typescript fixture=with-endpoint-config
import * as sagemaker from '@aws-cdk/aws-sagemaker';

const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig });
const productionVariant = endpoint.findProductionVariant('variantName');
productionVariant.metricModelLatency().createAlarm(this, 'ModelLatencyAlarm', {
  threshold: 100000,
  evaluationPeriods: 3,
});
```

---

Ticking the box below indicates that the public API of this RFC has been
signed-off by the API bar raiser (the `api-approved` label was applied to the
RFC pull request):

```
[ ] Signed-off by API Bar Raiser @xxxxx
```

## Public FAQ

### What are we launching today?
+ +We are launching the first set of L2 constructs for an existing module (`@aws-cdk/aws-sagemaker`), +introducing the `Endpoint` construct alongside its dependencies `EndpointConfig` and `Model`. +Together, these constructs enable customers to deploy a machine learning model to an Amazon +SageMaker-hosted endpoint which can be used for real-time inference via SageMaker's `InvokeEndpoint` +API. + +### Why should I use this feature? + +SageMaker hosting for real-time inference provides a fully-managed, auto-scalable solution to +customers wishing to deploy machine learning models behind an interactive endpoint. + +## Internal FAQ + +### Why are we doing this? + +The [tracking GitHub issue for the module](https://github.com/aws/aws-cdk/issues/6870) has 48 +1s, +so there appears to be sufficient public demand for higher-level constructs above the existing L1s. + +As SageMaker models are composed of an algorithm (expressed as a Docker image) and data (expressed +as an S3 object), the CDK's support for image and file assets would allow a customer to fully +specify their endpoints' AWS infrastructure and resource dependencies solely using the CDK. + +Assets aside, a multi-variant, auto-scalable, CloudWatch-monitored endpoint within a VPC can be +specified in just under 100 lines of code [using the proposed L2 CDK constructs][endpoint-cdk] which +[generates a 1000+ line CloudFormation template][endpoint-cfn]. Producing an equivalent template +using the existing SageMaker L1 constructs can prove challenging for customers as they have to +stitch together the L1 SageMaker attributes (e.g., production variant names) to L2 constructs from +other modules (e.g., CloudWatch, Application Auto Scaling) leaving room for manual error. 
+ +[endpoint-cdk]: https://github.com/petermeansrock/aws-cdk/blob/43afc4259954c4b3708cf2b867cec6690e744423/packages/@aws-cdk/aws-sagemaker/test/integ.endpoint.ts +[endpoint-cfn]: https://github.com/petermeansrock/aws-cdk/blob/43afc4259954c4b3708cf2b867cec6690e744423/packages/@aws-cdk/aws-sagemaker/test/endpoint.integ.snapshot/aws-cdk-sagemaker-endpoint.template.json + +### Why should we _not_ do this? + +In the time since the original PR for these constructs was authored in 2020, SageMaker has expanded +its feature set to include [Amazon SageMaker Pipelines][sagemaker-pipelines], a CI/CD offering for +training and deploying models. This offering directs customers to SageMaker Studio for interacting +with their pipeline, which itself can be programmatically manipulated using the SageMaker Python +SDK. Given the user experience difference between these new SageMaker products and other AWS +infrastructure-as-code solutions (e.g., CloudFormation and the CDK), it's unclear how broader +adoption of SageMaker CDK constructs aligns with the SageMaker product vision. + +[sagemaker-pipelines]: https://aws.amazon.com/sagemaker/pipelines/ + +### What is the technical solution (design) of this feature? + +The proposed design has been fully implemented in +[CDK PR #20113](https://github.com/aws/aws-cdk/pull/20113). Each of the following sections lays out +the proposed interfaces needed for each L2 construct along with any supporting classes. + +#### Model + +- `IModel` -- interface for defined and imported models + + ```ts + export interface IModel extends cdk.IResource, iam.IGrantable, ec2.IConnectable { + /** + * Returns the ARN of this model. + * + * @attribute + */ + readonly modelArn: string; + + /** + * Returns the name of this model. + * + * @attribute + */ + readonly modelName: string; + + /** + * The IAM role associated with this Model. + */ + readonly role?: iam.IRole; + + /** + * Adds a statement to the IAM role assumed by the instance. 
+ */ + addToRolePolicy(statement: iam.PolicyStatement): void; + } + ``` + +- `ModelProps` -- configuration for defining a `Model` + + ```ts + export interface ModelProps { + /** + * The IAM role that the Amazon SageMaker service assumes. + * + * @default a new IAM role will be created. + */ + readonly role?: iam.IRole; + + /** + * Name of the SageMaker Model. + * + * @default AWS CloudFormation generates a unique physical ID and uses that ID for the model's + * name. + */ + readonly modelName?: string; + + /** + * The VPC to deploy the endpoint to. + * + * @default none + */ + readonly vpc?: ec2.IVpc; + + /** + * The VPC subnets to deploy the endpoints. + * + * @default none + */ + readonly vpcSubnets?: ec2.SubnetSelection; + + /** + * The security groups to associate to the Model. If no security groups are provided and 'vpc' is + * configured, one security group will be created automatically. + * + * @default A security group will be automatically created if 'vpc' is supplied + */ + readonly securityGroups?: ec2.ISecurityGroup[]; + + /** + * Specifies the primary container or the first container in an inference pipeline. Additional + * containers for an inference pipeline can be provided using the "extraContainers" property. + * + */ + readonly container: ContainerDefinition; + + /** + * Specifies additional containers for an inference pipeline. + * + * @default none + */ + readonly extraContainers?: ContainerDefinition[]; + + /** + * Whether to allow the SageMaker Model to send all network traffic + * + * If set to false, you must individually add traffic rules to allow the + * SageMaker Model to connect to network targets. + * + * Only used if 'vpc' is supplied. + * + * @default true + */ + readonly allowAllOutbound?: boolean; + } + ``` + +- `ModelBase` -- abstract base definition class shared by defined and imported models + + ```ts + abstract class ModelBase extends cdk.Resource implements IModel { + /** + * Returns the ARN of this model. 
+ * @attribute + */ + public abstract readonly modelArn: string; + /** + * Returns the name of the model. + * @attribute + */ + public abstract readonly modelName: string; + /** + * Execution role for SageMaker Model + */ + public abstract readonly role?: iam.IRole; + /** + * The principal this Model is running as + */ + public abstract readonly grantPrincipal: iam.IPrincipal; + /** + * An accessor for the Connections object that will fail if this Model does not have a VPC + * configured. + */ + public get connections(): ec2.Connections { ... } + /** + * The actual Connections object for this Model. This may be unset in the event that a VPC has not + * been configured. + * @internal + */ + protected _connections: ec2.Connections | undefined; + + /** + * Adds a statement to the IAM role assumed by the instance. + */ + public addToRolePolicy(statement: iam.PolicyStatement) { ... } + } + ``` + +- `Model` -- defines a SageMaker model (with helper methods for importing a model) + + ```ts + export class Model extends ModelBase { + /** + * Imports a Model defined either outside the CDK or in a different CDK stack. + * @param scope the Construct scope. + * @param id the resource id. + * @param modelName the name of the model. + */ + public static fromModelName(scope: Construct, id: string, modelName: string): IModel { ... } + + /** + * Imports a Model defined either outside the CDK or in a different CDK stack. + * @param scope the Construct scope. + * @param id the resource id. + * @param attrs the attributes of the model to import. + */ + public static fromModelAttributes(scope: Construct, id: string, attrs: ModelAttributes): IModel { ... } + + /** + * Returns the ARN of this model. + * @attribute + */ + public readonly modelArn: string; + /** + * Returns the name of the model. 
+ * @attribute + */ + public readonly modelName: string; + /** + * Execution role for SageMaker Model + */ + public readonly role?: iam.IRole; + /** + * The principal this Model is running as + */ + public readonly grantPrincipal: iam.IPrincipal; + private readonly subnets: ec2.SelectedSubnets | undefined; + + constructor(scope: Construct, id: string, props: ModelProps) { ... } + } + ``` + +##### Container Definition + +When defining a model above, the `ContainerDefinition` interface encapsulates both the specification +of model inference code as a `ContainerImage` and an optional set of artifacts as `ModelData`. The +image is specified as a Docker registry path while the model artifacts must be stored in S3. + +- `ContainerDefinition` -- describes the container, as part of model definition above + + ```ts + export interface ContainerDefinition { + /** + * The image used to start a container. + */ + readonly image: ContainerImage; + + /** + * A map of environment variables to pass into the container. + * + * @default none + */ + readonly environment?: {[key: string]: string}; + + /** + * Hostname of the container. + * + * @default none + */ + readonly containerHostname?: string; + + /** + * S3 path to the model artifacts. + * + * @default none + */ + readonly modelData?: ModelData; + } + ``` + +###### Container Image + +The following interface and abstract class provide mechanisms for configuring a container image. +These closely mirror [analogous entities from the `@aws-cdk/ecs` module][ecs-image] but, rather than +`bind`-ing upon an ECS task definition, instead operate upon a SageMaker model. + +[ecs-image]: https://github.com/aws/aws-cdk/blob/572b52c45a9eb08b62a0f9cc6520c1722089bae6/packages/@aws-cdk/aws-ecs/lib/container-image.ts + +- `ContainerImageConfig` -- the configuration for creating a container image + + ```ts + export interface ContainerImageConfig { + /** + * The image name. 
Images in Amazon ECR repositories can be specified by either using the full registry/repository:tag or + * registry/repository@digest. + * + * For example, 012345678910.dkr.ecr..amazonaws.com/:latest or + * 012345678910.dkr.ecr..amazonaws.com/@sha256:94afd1f2e64d908bc90dbca0035a5b567EXAMPLE. + */ + readonly imageName: string; + } + ``` + +- `ContainerImage` -- abstract class defining `bind` contract for images alongside static factory + methods to enable different sources (e.g., image in ECR repository, local Dockerfile) + + ```ts + export abstract class ContainerImage { + /** + * Reference an image in an ECR repository + */ + public static fromEcrRepository(repository: ecr.IRepository, tag: string = 'latest'): ContainerImage { ... } + + /** + * Reference an image that's constructed directly from sources on disk + * + * @param scope The scope within which to create the image asset + * @param id The id to assign to the image asset + * @param props The properties of a Docker image asset + */ + public static fromAsset(scope: Construct, id: string, props: assets.DockerImageAssetProps): ContainerImage { ... } + + /** + * Called when the image is used by a Model + */ + public abstract bind(scope: Construct, model: Model): ContainerImageConfig; + } + ``` + +###### Model Data + +Analogous to the above pairing of `ContainerImageConfig` and `ContainerImage`, the following +interface and abstract class provide mechanisms for customers to specify the source of their model +artifacts, either in an S3 bucket or a local file asset. + +- `ModelDataConfig` -- the configuration needed to reference model artifacts + + ```ts + export interface ModelDataConfig { + /** + * The S3 path where the model artifacts, which result from model training, are stored. This path + * must point to a single gzip compressed tar archive (.tar.gz suffix). 
+ */ + readonly uri: string; + } + ``` + +- `ModelData` -- model data represents the source of model artifacts, which will ultimately be + loaded from an S3 location + + ```ts + export abstract class ModelData { + /** + * Constructs model data which is already available within S3. + * @param bucket The S3 bucket within which the model artifacts are stored + * @param objectKey The S3 object key at which the model artifacts are stored + */ + public static fromBucket(bucket: s3.IBucket, objectKey: string): ModelData { ... } + + /** + * Constructs model data that will be uploaded to S3 as part of the CDK app deployment. + * @param scope The scope within which to create a new asset + * @param id The id to associate with the new asset + * @param path The local path to a model artifact file as a gzipped tar file + */ + public static fromAsset(scope: Construct, id: string, path: string): ModelData { ... } + + /** + * This method is invoked by the SageMaker Model construct when it needs to resolve the model + * data to a URI. + * @param scope The scope within which the model data is resolved + * @param model The Model construct performing the URI resolution + */ + public abstract bind(scope: Construct, model: IModel): ModelDataConfig; + } + ``` + +#### Endpoint Configuration + +- `IEndpointConfig` -- the interface for a SageMaker EndpointConfig resource + + ```ts + export interface IEndpointConfig extends cdk.IResource { + /** + * The ARN of the endpoint configuration. + * + * @attribute + */ + readonly endpointConfigArn: string; + /** + * The name of the endpoint configuration. + * + * @attribute + */ + readonly endpointConfigName: string; + } + ``` + +- `EndpointConfigProps` -- construction properties for a SageMaker EndpointConfig + + ```ts + export interface EndpointConfigProps { + /** + * Name of the endpoint configuration. + * + * @default AWS CloudFormation generates a unique physical ID and uses that ID for the endpoint + * configuration's name. 
+ */ + readonly endpointConfigName?: string; + + /** + * Optional KMS encryption key associated with this stream. + * + * @default none + */ + readonly encryptionKey?: kms.IKey; + + /** + * A ProductionVariantProps object. + */ + readonly productionVariant: ProductionVariantProps; + + /** + * An optional list of extra ProductionVariantProps objects. + * + * @default none + */ + readonly extraProductionVariants?: ProductionVariantProps[]; + } + ``` + +- `EndpointConfig` -- defines a SageMaker EndpointConfig (with helper methods for importing an + endpoint config) + + ```ts + export class EndpointConfig extends cdk.Resource implements IEndpointConfig { + /** + * Imports an EndpointConfig defined either outside the CDK or in a different CDK stack. + * @param scope the Construct scope. + * @param id the resource id. + * @param endpointConfigName the name of the endpoint configuration. + */ + public static fromEndpointConfigName(scope: Construct, id: string, endpointConfigName: string): IEndpointConfig { ... } + + /** + * The ARN of the endpoint configuration. + */ + public readonly endpointConfigArn: string; + /** + * The name of the endpoint configuration. + */ + public readonly endpointConfigName: string; + + constructor(scope: Construct, id: string, props: EndpointConfigProps) { ... } + + /** + * Add production variant to the endpoint configuration. + * + * @param props The properties of a production variant to add. + */ + public addProductionVariant(props: ProductionVariantProps): void { ... } + + /** + * Get production variants associated with endpoint configuration. + */ + public get productionVariants(): ProductionVariant[] { ... } + + /** + * Find production variant based on variant name + * @param name Variant name from production variant + */ + public findProductionVariant(name: string): ProductionVariant { ... 
} + } + ``` + +##### Production Variants + +To accommodate A/B testing of model behaviors, an endpoint config supports the specification of +multiple production variants. Each variant's weight determines the traffic distribution to itself +relative to the other configured variants. + +- `ProductionVariantProps` -- construction properties for a production variant + + ```ts + export interface ProductionVariantProps { + /** + * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances + * provide on-demand GPU computing for inference. + * + * @default none + */ + readonly acceleratorType?: AcceleratorType; + /** + * Number of instances to launch initially. + * + * @default 1 + */ + readonly initialInstanceCount?: number; + /** + * Determines initial traffic distribution among all of the models that you specify in the + * endpoint configuration. The traffic to a production variant is determined by the ratio of the + * variant weight to the sum of all variant weight values across all production variants. + * + * @default 1.0 + */ + readonly initialVariantWeight?: number; + /** + * Instance type of the production variant. + * + * @default ml.t2.medium instance type. + */ + readonly instanceType?: ec2.InstanceType; + /** + * The model to host. + */ + readonly model: IModel; + /** + * Name of the production variant. + */ + readonly variantName: string; + } + ``` + +- `ProductionVariant` -- represents a production variant that has been associated with an + `EndpointConfig` + + ```ts + export interface ProductionVariant { + /** + * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances + * provide on-demand GPU computing for inference. + * + * @default none + */ + readonly acceleratorType?: AcceleratorType; + /** + * Number of instances to launch initially. 
     */
    readonly initialInstanceCount: number;
    /**
     * Determines initial traffic distribution among all of the models that you specify in the
     * endpoint configuration. The traffic to a production variant is determined by the ratio of the
     * variant weight to the sum of all variant weight values across all production variants.
     */
    readonly initialVariantWeight: number;
    /**
     * Instance type of the production variant.
     */
    readonly instanceType: ec2.InstanceType;
    /**
     * The name of the model to host.
     */
    readonly modelName: string;
    /**
     * The name of the production variant.
     */
    readonly variantName: string;
  }
  ```

- `AcceleratorType` -- an enumeration of values representing the size of the Elastic Inference (EI)
  instance to use for the production variant. EI instances provide on-demand GPU computing for
  inference

  ```ts
  export enum AcceleratorType {
    /**
     * Medium accelerator type.
     */
    MEDIUM = 'ml.eia1.medium',
    /**
     * Large accelerator type.
     */
    LARGE = 'ml.eia1.large',
    /**
     * Extra large accelerator type.
     */
    XLARGE = 'ml.eia1.xlarge',
  }
  ```

#### Endpoint

- `IEndpoint` -- the interface for a SageMaker Endpoint resource

  ```ts
  export interface IEndpoint extends cdk.IResource {
    /**
     * The ARN of the endpoint.
     *
     * @attribute
     */
    readonly endpointArn: string;
    /**
     * The name of the endpoint.
     *
     * @attribute
     */
    readonly endpointName: string;

    /**
     * Permits an IAM principal to invoke this endpoint
     * @param grantee The principal to grant access to
     */
    grantInvoke(grantee: iam.IGrantable): iam.Grant;
  }
  ```

- `EndpointProps` -- construction properties for a SageMaker endpoint

  ```ts
  export interface EndpointProps {

    /**
     * Name of the endpoint.
     *
     * @default AWS CloudFormation generates a unique physical ID and uses that ID for the
     * endpoint's name.
     */
    readonly endpointName?: string;

    /**
     * The endpoint configuration to use for this endpoint.
+ * + * [disable-awslint:ref-via-interface] + */ + readonly endpointConfig: EndpointConfig; + } + ``` + +- `EndpointBase` -- abstract base definition class shared by defined and imported endpoints + + ```ts + abstract class EndpointBase extends cdk.Resource implements IEndpoint { + /** + * The ARN of the endpoint. + * + * @attribute + */ + public abstract readonly endpointArn: string; + + /** + * The name of the endpoint. + * + * @attribute + */ + public abstract readonly endpointName: string; + + /** + * Permits an IAM principal to invoke this endpoint + * @param grantee The principal to grant access to + */ + public grantInvoke(grantee: iam.IGrantable) { ... } + } + ``` + +- `Endpoint` -- defines a SageMaker endpoint (with helper methods for importing an endpoint) + + ```ts + export class Endpoint extends EndpointBase { + /** + * Imports an Endpoint defined either outside the CDK or in a different CDK stack. + * @param scope the Construct scope. + * @param id the resource id. + * @param endpointName the name of the endpoint. + */ + public static fromEndpointName(scope: Construct, id: string, endpointName: string): IEndpoint { ... } + + /** + * The ARN of the endpoint. + * + * @attribute + */ + public readonly endpointArn: string; + /** + * The name of the endpoint. + * + * @attribute + */ + public readonly endpointName: string; + + constructor(scope: Construct, id: string, props: EndpointProps) { ... } + + /** + * Get production variants associated with endpoint. + */ + public get productionVariants(): IEndpointProductionVariant[] { ... } + + /** + * Find production variant based on variant name + * @param name Variant name from production variant + */ + public findProductionVariant(name: string): IEndpointProductionVariant { ... } + } + ``` + +##### Endpoint Production Variants + +When monitoring or auto-scaling real-time inference endpoints, both CloudWatch and Application Auto +Scaling operate at the level of endpoint name + variant name. 
For this reason, once a variant has +been attached to an endpoint, this RFC allows customers to retrieve `IEndpointProductionVariant` +instances from their endpoint for the purposes of referencing CloudWatch metrics or an Application +Auto Scaling `BaseScalableAttribute`. + +- `IEndpointProductionVariant` -- represents a production variant that has been associated with an + endpoint + + ```ts + export interface IEndpointProductionVariant { + /** + * The name of the production variant. + */ + readonly variantName: string; + /** + * Return the given named metric for Endpoint + * + * @default sum over 5 minutes + */ + metric(namespace: string, metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for the number of invocations + * + * @default sum over 5 minutes + */ + metricInvocations(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for the number of invocations per instance + * + * @default sum over 5 minutes + */ + metricInvocationsPerInstance(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for model latency + * + * @default average over 5 minutes + */ + metricModelLatency(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for overhead latency + * + * @default average over 5 minutes + */ + metricOverheadLatency(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for the number of invocations by HTTP response code + * + * @default sum over 5 minutes + */ + metricInvocationResponseCode(responseCode: InvocationHttpResponseCode, props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for disk utilization + * + * @default average over 5 minutes + */ + metricDiskUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for CPU utilization + * + * @default average over 5 minutes + */ + metricCPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + /** + * Metric for memory utilization + * + * 
@default average over 5 minutes
     */
    metricMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric;
    /**
     * Metric for GPU utilization
     *
     * @default average over 5 minutes
     */
    metricGPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric;
    /**
     * Metric for GPU memory utilization
     *
     * @default average over 5 minutes
     */
    metricGPUMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric;
    /**
     * Enable autoscaling for SageMaker Endpoint production variant
     *
     * @param scalingProps EnableScalingProps
     */
    autoScaleInstanceCount(scalingProps: appscaling.EnableScalingProps): ScalableInstanceCount;
  }

  class EndpointProductionVariant implements IEndpointProductionVariant { ... }
  ```

### Is this a breaking change?

No.

### What alternative solutions did you consider?

1. Theoretically, the `ContainerImage` code (referenced [above](#container-image)) from the
   `@aws-cdk/ecs` and `@aws-cdk/sagemaker` modules could be unified, assuming it would be sufficient
   for both use-cases to `bind` using an `IGrantable` (adjusting ECS's `TaskDefinition`
   accordingly). However, it's unclear within which module such a unified API should reside, as
   support for private repositories makes it a bad fit for `@aws-cdk/ecr` and `@aws-cdk/ecr-assets`,
   and it would be unintuitive for `@aws-cdk/sagemaker` to declare a dependency on `@aws-cdk/ecs`.

   Package concerns aside, historically, there was a period during which SageMaker only supported
   ECR as an image source while ECS was capable of sourcing images from either ECR or a
   customer-owned private repository. Given that these two products' supported image
   sources may diverge again in the future, it may be best to keep their
   `ContainerImage` APIs separate within their respective modules.
1. 
In the [earliest PR][earliest-pr] attempting to add SageMaker L2 constructs to the CDK, the
   author did not create an `EndpointConfig` construct, instead hiding the resource's creation
   behind `Endpoint` (to which production variants could be added). Although a simplification, this
   approach prevents customers from reusing configuration across endpoints. For this reason, an
   explicit L2 construct for endpoint configuration was incorporated into this RFC.

[earliest-pr]: https://github.com/aws/aws-cdk/pull/2888

### What are the drawbacks of this solution?

This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019
feature set of SageMaker real-time inference endpoints. Since that point in time, SageMaker has
launched the following features which would require further additions to the L2 API contracts:

* [Multi-model endpoints][multi-model]
* [Model Monitor][model-monitor]
* [Asynchronous inference][async-inference]
* [Deployment guardrails][deployment-guardrails]
* [Serverless inference][serverless-inference]

Although some of these changes would be small and additive (e.g., `DataCaptureConfig` for Model
Monitor), features like asynchronous and serverless inference represent more significant shifts in
functionality. For example, SageMaker hosts real-time inference endpoints on EC2 instances, meaning
that CloudWatch alarms and Application Auto Scaling rules operate on instance-based metrics. In
contrast, serverless inference does not expose any instance-based metrics, nor does it yet support
auto-scaling. Since both features are specified via the CloudFormation resource
`AWS::SageMaker::EndpointConfig`, the current recommendation of this RFC would be to support the
specification of both use-cases through a single L2 `EndpointConfig` construct. 
However, this
presents a challenge when modeling helper APIs like `metricCPUUtilization` or
`autoScaleInstanceCount` on a related construct, as those methods would not universally apply to
*all* endpoint types.

Given that (1) RFC reviewers may have idiomatic recommendations for solving such modeling challenges
and (2) the currently proposed constructs are still viable for creating a SageMaker endpoint, the
first draft of this RFC is being published without further revisions accommodating the newer
SageMaker features listed above.

[original-pr]: https://github.com/aws/aws-cdk/pull/6107
[multi-model]: https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/
[model-monitor]: https://aws.amazon.com/about-aws/whats-new/2019/12/introducing-amazon-sagemaker-model-monitor/
[async-inference]: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/
[deployment-guardrails]: https://aws.amazon.com/about-aws/whats-new/2021/11/new-deployment-guardrails-amazon-sagemaker-inference-endpoints/
[serverless-inference]: https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-serverless-inference/

### What is the high-level project plan?

As the proposed design has been fully implemented in
[CDK PR #20113](https://github.com/aws/aws-cdk/pull/20113), the delivery timeline for this RFC's
implementation will be contingent upon the scope of changes requested by reviewers. During a baking
period, the L2 constructs for this module would be marked as experimental, leaving room for further
adjustments prior to marking the APIs as stable.

### Are there any open issues that need to be addressed later?

1. Please see the [drawbacks section above](#what-are-the-drawbacks-of-this-solution) for potential
   follow-on work (assuming it need not be incorporated into this RFC).
1. 
As observed with [Lambda][lambda-eni-issue] and [EKS][eks-eni-issue], the Elastic Network
   Interfaces (ENIs) associated with a SageMaker model's VPC are not always cleaned up in a timely
   manner after downstream compute resources are deleted. As a result, attempts to delete a
   SageMaker endpoint along with its networking resources (e.g., subnets, security groups) from a
   CloudFormation stack may cause the stack operation to fail as the ENIs are still in use. From a
   CDK integration test perspective, specifying `--no-clean` will allow the generation of a snapshot
   regardless of whether stack deletion succeeds or fails, but may hinder snapshot re-generation
   by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification
   from the endpoint integration test at this time.

[lambda-eni-issue]: https://github.com/aws/aws-cdk/issues/12827
[eks-eni-issue]: https://github.com/aws/aws-cdk/issues/9970

## Appendix

Feel free to add any number of appendices as you see fit. Appendices are
expected to allow readers to dive deeper into certain sections if they like. For
example, you can include an appendix which describes the detailed design of an
algorithm and reference it from the FAQ.
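As an illustrative sketch of how a consumer might exercise the production variant API proposed in this RFC (the `findProductionVariant`, `metricModelLatency`, and `autoScaleInstanceCount` names come from the interfaces above; the `endpoint` fixture, alarm threshold, and capacity values are assumptions for the example, not finalized API behavior):

```typescript
import * as cloudwatch from '@aws-cdk/aws-cloudwatch';
import * as sagemaker from '@aws-cdk/aws-sagemaker';

declare const endpoint: sagemaker.Endpoint; // assumed to be defined elsewhere in the stack

// Look up a variant that has been attached to the endpoint by its variant name.
const variant = endpoint.findProductionVariant('variantA');

// Alarm on sustained high model latency for this specific variant.
new cloudwatch.Alarm(this, 'LatencyAlarm', {
  metric: variant.metricModelLatency(),
  threshold: 100000,
  evaluationPeriods: 3,
});

// Enable Application Auto Scaling on the variant's instance count.
const instanceCount = variant.autoScaleInstanceCount({
  maxCapacity: 3,
});
```

Because both the metric and the scalable attribute are scoped to the endpoint name plus variant name, this sketch operates on a single variant without affecting any others attached to the same endpoint.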
From 75dbf2360c7e6bb0e046e89954703a44723da203 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Wed, 4 May 2022 11:52:13 -0700 Subject: [PATCH 02/30] Update author list to mirror issue/PR --- text/0431-sagemaker-l2-endpoint.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index d2a874ca0..1bb1451f0 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1,6 +1,6 @@ SageMaker Model Hosting L2 Constructs -* **Original Author(s):**: @pvanlund +* **Original Author(s):** @pvanlund, @mattmcclean, @l2yao, @jetterdj, @foxpro24, @rangoju * **Tracking Issue**: #431 * **API Bar Raiser**: *TBD* From 3a39cd4c23a9e1a37bcabf90ea9950704db1aeed Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 21 Jul 2022 16:47:18 -0700 Subject: [PATCH 03/30] Reword & restructure README based on feedback --- text/0431-sagemaker-l2-endpoint.md | 138 ++++++++++++++++------------- 1 file changed, 74 insertions(+), 64 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 1bb1451f0..82de5dd86 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -62,114 +62,124 @@ import * as sagemaker from '@aws-cdk/aws-sagemaker'; ## Model -By creating a model, you tell Amazon SageMaker where it can find the model components. This includes -the S3 path where the model artifacts are stored and the Docker registry path for the image that -contains the inference code. The `ContainerDefinition` interface encapsulates both the specification -of model inference code as a `ContainerImage` and an optional set of artifacts as `ModelData`. +In machine learning, a model is used to make predictions, or inferences. A deployable model in +SageMaker consists of inference code and model artifacts. 
Model artifacts are the results of model +training by using a machine learning algorithm. The inference code must be packaged in a Docker +container and stored in Amazon ECR. You can either package the model artifacts in the same container +as the inference code, or store them in Amazon S3. As model artifacts may change each time a new +model is trained (while the inference code may remain static), artifact separation in S3 may act as +a natural decoupling point for your application. -### Container Images +When instantiating the `Model` construct, you tell Amazon SageMaker where it can find these model +components. The `ContainerDefinition` interface encapsulates both the specification of model +inference code as a `ContainerImage` and an optional set of separate artifacts as `ModelData`. -Inference code can be stored in the Amazon EC2 Container Registry (Amazon ECR), which is specified -via `ContainerDefinition`'s `image` property which accepts a class that extends the `ContainerImage` -abstract base class. 
+### Single Container Model -#### `EcrImage` - -Reference an image available within ECR: +In the event that a single container is sufficient for your inference use-case, you can define a +single-container model: ```typescript -import * as ecr from '@aws-cdk/aws-ecr'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; +import * as path from 'path'; -const repository = ecr.Repository.fromRepositoryName(this, 'Repository', 'repo'); -const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag'); +const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { + directory: path.join('path', 'to', 'Dockerfile', 'directory') +}); +const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', + path.join('path', 'to', 'artifact', 'file.tar.gz')); + +const model = new sagemaker.Model(this, 'PrimaryContainerModel', { + container: { + image: image, + modelData: modelData, + } +}); ``` -#### `AssetImage` +### Inference Pipeline Model -Reference a local directory containing a Dockerfile: +An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to +five containers that process requests for inferences on data. You use an inference pipeline to +define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own +custom algorithms packaged in Docker containers. You can use an inference pipeline to combine +preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully +managed. 
To define an inference pipeline, you can provide additional containers for your model via +the `extraContainers` property: -```typescript +```typescript fixture=with-assets import * as sagemaker from '@aws-cdk/aws-sagemaker'; -import * as path from 'path'; -const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { - directory: path.join('path', 'to', 'Dockerfile', 'directory') +const model = new sagemaker.Model(this, 'InferencePipelineModel', { + container: { + image: image1, modelData: modelData1 + }, + extraContainers: [ + { image: image2, modelData: modelData2 }, + { image: image3, modelData: modelData3 } + ], }); ``` -### Model Artifacts +### Container Images -Models are often associated with model artifacts, which are specified via the `modelData` property -which accepts a class that extends the `ModelData` abstract base class. The default is to have no -model artifacts associated with a model. +Inference code can be stored in the Amazon EC2 Container Registry (Amazon ECR), which is specified +via `ContainerDefinition`'s `image` property which accepts a class that extends the `ContainerImage` +abstract base class. 
-#### `S3ModelData` +#### Asset Image -Reference an S3 bucket and object key as the artifacts for a model: +Reference a local directory containing a Dockerfile: ```typescript -import * as s3 from '@aws-cdk/aws-s3'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; +import * as path from 'path'; -const bucket = new s3.Bucket(this, 'MyBucket'); -const modelData = sagemaker.ModelData.fromBucket(bucket, 'path/to/artifact/file.tar.gz'); +const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { + directory: path.join('path', 'to', 'Dockerfile', 'directory') +}); ``` -#### `AssetModelData` +#### ECR Image -Reference local model data: +Reference an image available within ECR: ```typescript +import * as ecr from '@aws-cdk/aws-ecr'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; -import * as path from 'path'; -const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', - path.join('path', 'to', 'artifact', 'file.tar.gz')); +const repository = ecr.Repository.fromRepositoryName(this, 'Repository', 'repo'); +const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag'); ``` -### `Model` +### Model Artifacts -The `Model` construct associates container images with their optional model data. +If you choose to decouple your model artifacts from your inference code, the artifacts can be +specified via the `modelData` property which accepts a class that extends the `ModelData` abstract +base class. The default is to have no model artifacts associated with a model. 
-#### Single Container Model +#### Asset Model Data -In the event that a single container is sufficient for your inference use-case, you can define a -single-container model: +Reference local model data: -```typescript fixture=with-assets +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +import * as path from 'path'; -const model = new sagemaker.Model(this, 'PrimaryContainerModel', { - container: { - image: image, - modelData: modelData, - } -}); +const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', + path.join('path', 'to', 'artifact', 'file.tar.gz')); ``` -#### Inference Pipeline Model +#### S3 Model Data -An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to -five containers that process requests for inferences on data. You use an inference pipeline to -define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own -custom algorithms packaged in Docker containers. You can use an inference pipeline to combine -preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully -managed. 
To define an inference pipeline, you can provide additional containers for your model via -the `extraContainers` property: +Reference an S3 bucket and object key as the artifacts for a model: -```typescript fixture=with-assets +```typescript +import * as s3 from '@aws-cdk/aws-s3'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; -const model = new sagemaker.Model(this, 'InferencePipelineModel', { - container: { - image: image1, modelData: modelData1 - }, - extraContainers: [ - { image: image2, modelData: modelData2 }, - { image: image3, modelData: modelData3 } - ], -}); +const bucket = new s3.Bucket(this, 'MyBucket'); +const modelData = sagemaker.ModelData.fromBucket(bucket, 'path/to/artifact/file.tar.gz'); ``` ## Model Hosting From ed5998b57bd740016712c96bc3ff2f0af2637870 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 21 Jul 2022 16:50:15 -0700 Subject: [PATCH 04/30] Remove one CDKv1-specific module reference --- text/0431-sagemaker-l2-endpoint.md | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 82de5dd86..4db5bae83 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -281,11 +281,10 @@ RFC pull request): ### What are we launching today? -We are launching the first set of L2 constructs for an existing module (`@aws-cdk/aws-sagemaker`), -introducing the `Endpoint` construct alongside its dependencies `EndpointConfig` and `Model`. -Together, these constructs enable customers to deploy a machine learning model to an Amazon -SageMaker-hosted endpoint which can be used for real-time inference via SageMaker's `InvokeEndpoint` -API. +We are launching the first set of L2 constructs for the SageMaker module, introducing the `Endpoint` +construct alongside its dependencies `EndpointConfig` and `Model`. 
Together, these constructs enable +customers to deploy a machine learning model to an Amazon SageMaker-hosted endpoint which can be +used for real-time inference via SageMaker's `InvokeEndpoint` API. ### Why should I use this feature? From 73b6aae202e456a72887199ad795ce84a448af62 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 14:07:07 -0700 Subject: [PATCH 05/30] Fix specification of defaults --- text/0431-sagemaker-l2-endpoint.md | 58 +++++++++++++++--------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 4db5bae83..1d404ae47 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -369,14 +369,14 @@ the proposed interfaces needed for each L2 construct along with any supporting c /** * The IAM role that the Amazon SageMaker service assumes. * - * @default a new IAM role will be created. + * @default - a new IAM role will be created. */ readonly role?: iam.IRole; /** * Name of the SageMaker Model. * - * @default AWS CloudFormation generates a unique physical ID and uses that ID for the model's + * @default - AWS CloudFormation generates a unique physical ID and uses that ID for the model's * name. */ readonly modelName?: string; @@ -384,14 +384,14 @@ the proposed interfaces needed for each L2 construct along with any supporting c /** * The VPC to deploy the endpoint to. * - * @default none + * @default - none */ readonly vpc?: ec2.IVpc; /** * The VPC subnets to deploy the endpoints. * - * @default none + * @default - none */ readonly vpcSubnets?: ec2.SubnetSelection; @@ -399,7 +399,7 @@ the proposed interfaces needed for each L2 construct along with any supporting c * The security groups to associate to the Model. If no security groups are provided and 'vpc' is * configured, one security group will be created automatically. 
* - * @default A security group will be automatically created if 'vpc' is supplied + * @default - A security group will be automatically created if 'vpc' is supplied */ readonly securityGroups?: ec2.ISecurityGroup[]; @@ -413,7 +413,7 @@ the proposed interfaces needed for each L2 construct along with any supporting c /** * Specifies additional containers for an inference pipeline. * - * @default none + * @default - none */ readonly extraContainers?: ContainerDefinition[]; @@ -534,21 +534,21 @@ image is specified as a Docker registry path while the model artifacts must be s /** * A map of environment variables to pass into the container. * - * @default none + * @default - none */ readonly environment?: {[key: string]: string}; /** * Hostname of the container. * - * @default none + * @default - none */ readonly containerHostname?: string; /** * S3 path to the model artifacts. * - * @default none + * @default - none */ readonly modelData?: ModelData; } @@ -679,15 +679,15 @@ artifacts, either in an S3 bucket or a local file asset. /** * Name of the endpoint configuration. * - * @default AWS CloudFormation generates a unique physical ID and uses that ID for the endpoint - * configuration's name. + * @default - AWS CloudFormation generates a unique physical ID and uses that ID for the + * endpoint configuration's name. */ readonly endpointConfigName?: string; /** * Optional KMS encryption key associated with this stream. * - * @default none + * @default - none */ readonly encryptionKey?: kms.IKey; @@ -699,7 +699,7 @@ artifacts, either in an S3 bucket or a local file asset. /** * An optional list of extra ProductionVariantProps objects. * - * @default none + * @default - none */ readonly extraProductionVariants?: ProductionVariantProps[]; } @@ -763,7 +763,7 @@ relative to the other configured variants. * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances * provide on-demand GPU computing for inference. 
* - * @default none + * @default - none */ readonly acceleratorType?: AcceleratorType; /** @@ -783,7 +783,7 @@ relative to the other configured variants. /** * Instance type of the production variant. * - * @default ml.t2.medium instance type. + * @default - ml.t2.medium instance type. */ readonly instanceType?: ec2.InstanceType; /** @@ -806,7 +806,7 @@ relative to the other configured variants. * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances * provide on-demand GPU computing for inference. * - * @default none + * @default - none */ readonly acceleratorType?: AcceleratorType; /** @@ -890,8 +890,8 @@ relative to the other configured variants. /** * Name of the endpoint. * - * @default AWS CloudFormation generates a unique physical ID and uses that ID for the endpoint's - * name. + * @default - AWS CloudFormation generates a unique physical ID and uses that ID for the + * endpoint's name. */ readonly endpointName?: string; @@ -990,67 +990,67 @@ Auto Scaling `BaseScalableAttribute`. 
/** * Return the given named metric for Endpoint * - * @default sum over 5 minutes + * @default - sum over 5 minutes */ metric(namespace: string, metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for the number of invocations * - * @default sum over 5 minutes + * @default - sum over 5 minutes */ metricInvocations(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for the number of invocations per instance * - * @default sum over 5 minutes + * @default - sum over 5 minutes */ metricInvocationsPerInstance(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for model latency * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricModelLatency(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for overhead latency * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricOverheadLatency(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for the number of invocations by HTTP response code * - * @default sum over 5 minutes + * @default - sum over 5 minutes */ metricInvocationResponseCode(responseCode: InvocationHttpResponseCode, props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for disk utilization * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricDiskUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for CPU utilization * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricCPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for memory utilization * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for GPU utilization * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricGPUUtilization(props?: cloudwatch.MetricOptions): 
cloudwatch.Metric; /** * Metric for GPU memory utilization * - * @default average over 5 minutes + * @default - average over 5 minutes */ metricGPUMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** From f29a71601da56458df8b10935b77a1f402f4229a Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 14:10:38 -0700 Subject: [PATCH 06/30] Simplify autoscaling documentation --- text/0431-sagemaker-l2-endpoint.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 1d404ae47..e7bf7036f 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -232,8 +232,7 @@ const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); ### AutoScaling -The `autoScaleInstanceCount` method on the `IEndpointProductionVariant` interface can be used to -enable Application Auto Scaling for the production variant: +To enable autoscaling on the production variant, use the `autoScaleInstanceCount` method: ```typescript fixture=with-endpoint-config import * as sagemaker from '@aws-cdk/aws-sagemaker'; From a853734b1e8a958fdb5c7eecde74d0e5da7bb791 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 14:24:37 -0700 Subject: [PATCH 07/30] Remove mention of endpoint from model VPC docs --- text/0431-sagemaker-l2-endpoint.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index e7bf7036f..dc6c696a8 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -381,14 +381,14 @@ the proposed interfaces needed for each L2 construct along with any supporting c readonly modelName?: string; /** - * The VPC to deploy the endpoint to. + * The VPC to deploy model containers to. 
* * @default - none */ readonly vpc?: ec2.IVpc; /** - * The VPC subnets to deploy the endpoints. + * The VPC subnets to use when deploying model containers. * * @default - none */ From b4401e0f3e4a6c76a5c6d64c6596c3c519d1171d Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 14:33:08 -0700 Subject: [PATCH 08/30] Eliminate uppercase abbreviations --- text/0431-sagemaker-l2-endpoint.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index dc6c696a8..d710a7322 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1033,7 +1033,7 @@ Auto Scaling `BaseScalableAttribute`. * * @default - average over 5 minutes */ - metricCPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + metricCpuUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for memory utilization * @@ -1045,13 +1045,13 @@ Auto Scaling `BaseScalableAttribute`. 
* * @default - average over 5 minutes */ - metricGPUUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + metricGpuUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Metric for GPU memory utilization * * @default - average over 5 minutes */ - metricGPUMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; + metricGpuMemoryUtilization(props?: cloudwatch.MetricOptions): cloudwatch.Metric; /** * Enable autoscaling for SageMaker Endpoint production variant * From 5fa4ddcc2d3009f3129c157e26765533bfcce129 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 14:37:23 -0700 Subject: [PATCH 09/30] Remove second CDKv1-specific module reference --- text/0431-sagemaker-l2-endpoint.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index d710a7322..f4b6e64b2 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -556,8 +556,8 @@ image is specified as a Docker registry path while the model artifacts must be s ###### Container Image The following interface and abstract class provide mechanisms for configuring a container image. -These closely mirror [analogous entities from the `@aws-cdk/ecs` module][ecs-image] but, rather than -`bind`-ing upon an ECS task definition, instead operate upon a SageMaker model. +These closely mirror [analogous entities from the ECS module][ecs-image] but, rather than `bind`-ing +upon an ECS task definition, instead operate upon a SageMaker model. 
[ecs-image]: https://github.com/aws/aws-cdk/blob/572b52c45a9eb08b62a0f9cc6520c1722089bae6/packages/@aws-cdk/aws-ecs/lib/container-image.ts From ce6445f5ebdab543d992160434e02a219f05f952 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 16 Aug 2022 17:29:07 -0700 Subject: [PATCH 10/30] Simplify container/variant props --- text/0431-sagemaker-l2-endpoint.md | 67 +++++++++++++----------------- 1 file changed, 29 insertions(+), 38 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index f4b6e64b2..275ef67d5 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -90,10 +90,12 @@ const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', path.join('path', 'to', 'artifact', 'file.tar.gz')); const model = new sagemaker.Model(this, 'PrimaryContainerModel', { - container: { - image: image, - modelData: modelData, - } + containers: [ + { + image: image, + modelData: modelData, + } + ] }); ``` @@ -104,17 +106,14 @@ five containers that process requests for inferences on data. You use an inferen define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully -managed. To define an inference pipeline, you can provide additional containers for your model via -the `extraContainers` property: +managed. 
To define an inference pipeline, you can provide additional containers for your model: ```typescript fixture=with-assets import * as sagemaker from '@aws-cdk/aws-sagemaker'; const model = new sagemaker.Model(this, 'InferencePipelineModel', { - container: { - image: image1, modelData: modelData1 - }, - extraContainers: [ + containers: [ + { image: image1, modelData: modelData1 }, { image: image2, modelData: modelData2 }, { image: image3, modelData: modelData3 } ], @@ -202,16 +201,18 @@ traffic to Model A, and one-third to model B: import * as sagemaker from '@aws-cdk/aws-sagemaker'; const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', { - productionVariant: { - model: modelA, - variantName: 'modelA', - initialVariantWeight: 2.0, - }, - extraProductionVariants: [{ - model: modelB, - variantName: 'variantB', - initialVariantWeight: 1.0, - }] + productionVariants: [ + { + model: modelA, + variantName: 'modelA', + initialVariantWeight: 2.0, + }, + { + model: modelB, + variantName: 'variantB', + initialVariantWeight: 1.0, + }, + ] }); ``` @@ -403,18 +404,12 @@ the proposed interfaces needed for each L2 construct along with any supporting c readonly securityGroups?: ec2.ISecurityGroup[]; /** - * Specifies the primary container or the first container in an inference pipeline. Additional - * containers for an inference pipeline can be provided using the "extraContainers" property. - * - */ - readonly container: ContainerDefinition; - - /** - * Specifies additional containers for an inference pipeline. + * Specifies the container definitions for this model, consisting of either a single primary + * container or an inference pipeline of multiple containers. 
* * @default - none */ - readonly extraContainers?: ContainerDefinition[]; + readonly containers?: ContainerDefinition[]; /** * Whether to allow the SageMaker Model to send all network traffic @@ -511,7 +506,7 @@ the proposed interfaces needed for each L2 construct along with any supporting c public readonly grantPrincipal: iam.IPrincipal; private readonly subnets: ec2.SelectedSubnets | undefined; - constructor(scope: Construct, id: string, props: ModelProps) { ... } + constructor(scope: Construct, id: string, props: ModelProps = {}) { ... } } ``` @@ -691,16 +686,12 @@ artifacts, either in an S3 bucket or a local file asset. readonly encryptionKey?: kms.IKey; /** - * A ProductionVariantProps object. - */ - readonly productionVariant: ProductionVariantProps; - - /** - * An optional list of extra ProductionVariantProps objects. + * A list of production variants. You can always add more variants later by calling + * {@link EndpointConfig#addProductionVariant}. * * @default - none */ - readonly extraProductionVariants?: ProductionVariantProps[]; + readonly productionVariants?: ProductionVariantProps[]; } ``` @@ -726,7 +717,7 @@ artifacts, either in an S3 bucket or a local file asset. */ public readonly endpointConfigName: string; - constructor(scope: Construct, id: string, props: EndpointConfigProps) { ... } + constructor(scope: Construct, id: string, props: EndpointConfigProps = {}) { ... } /** * Add production variant to the endpoint configuration. 
From 73617616ef3d35cd863ac79103159477d10af66c Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Wed, 17 Aug 2022 16:32:05 -0700 Subject: [PATCH 11/30] Remove container limit in README as it may change --- text/0431-sagemaker-l2-endpoint.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 275ef67d5..9a4692e19 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -101,8 +101,8 @@ const model = new sagemaker.Model(this, 'PrimaryContainerModel', { ### Inference Pipeline Model -An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of two to -five containers that process requests for inferences on data. You use an inference pipeline to +An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of multiple +containers that process requests for inferences on data. You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers. You can use an inference pipeline to combine preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully From 40af32b6207fe445c45476ec4e56bf6f3363efd1 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 25 Aug 2022 16:40:02 -0700 Subject: [PATCH 12/30] Remove non-default Rosetta fixtures --- text/0431-sagemaker-l2-endpoint.md | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 9a4692e19..4ac04d10e 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -108,9 +108,16 @@ custom algorithms packaged in Docker containers. 
You can use an inference pipeli preprocessing, predictions, and post-processing data science tasks. Inference pipelines are fully managed. To define an inference pipeline, you can provide additional containers for your model: -```typescript fixture=with-assets +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +declare const image1: sagemaker.ContainerImage; +declare const modelData1: sagemaker.ModelData; +declare const image2: sagemaker.ContainerImage; +declare const modelData2: sagemaker.ModelData; +declare const image3: sagemaker.ContainerImage; +declare const modelData3: sagemaker.ModelData; + const model = new sagemaker.Model(this, 'InferencePipelineModel', { containers: [ { image: image1, modelData: modelData1 }, @@ -197,9 +204,12 @@ to each model. For example, suppose that you want to host two models, A and B, a traffic weight 2 for model A and 1 for model B. Amazon SageMaker distributes two-thirds of the traffic to Model A, and one-third to model B: -```typescript fixture=with-assets +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +declare const modelA: sagemaker.Model; +declare const modelB: sagemaker.Model; + const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', { productionVariants: [ { @@ -225,9 +235,11 @@ more information about the API, see the [InvokeEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_runtime_InvokeEndpoint.html) API. 
Defining an endpoint requires at minimum the associated endpoint configuration: -```typescript fixture=with-endpoint-config +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +declare const endpointConfig: sagemaker.EndpointConfig; + const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); ``` @@ -235,9 +247,11 @@ const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); To enable autoscaling on the production variant, use the `autoScaleInstanceCount` method: -```typescript fixture=with-endpoint-config +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +declare const endpointConfig: sagemaker.EndpointConfig; + const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); const productionVariant = endpoint.findProductionVariant('variantName'); const instanceCount = productionVariant.autoScaleInstanceCount({ @@ -256,9 +270,11 @@ this [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-sc The `IEndpointProductionVariant` interface also provides a set of APIs for referencing CloudWatch metrics associated with a production variant associated with an endpoint: -```typescript fixture=with-endpoint-config +```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; +declare const endpointConfig: sagemaker.EndpointConfig; + const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); const productionVariant = endpoint.findProductionVariant('variantName'); productionVariant.metricModelLatency().createAlarm(this, 'ModelLatencyAlarm', { From 20c34877bcab6cf0a93f405ea25db48fd743dbed Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 6 Sep 2022 13:14:02 -0700 Subject: [PATCH 13/30] Document EndpointConfig reuse across Endpoints --- text/0431-sagemaker-l2-endpoint.md | 31 ++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git 
a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 4ac04d10e..63821de61 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -195,14 +195,15 @@ HTTPS endpoint where your machine learning model is available to provide inferen ### Endpoint Configuration -In this configuration, you identify one or more models to deploy and the resources that you want -Amazon SageMaker to provision. You define one or more production variants, each of which identifies -a model. Each production variant also describes the resources that you want Amazon SageMaker to -provision. This includes the number and type of ML compute instances to deploy. If you are hosting -multiple models, you also assign a variant weight to specify how much traffic you want to allocate -to each model. For example, suppose that you want to host two models, A and B, and you assign -traffic weight 2 for model A and 1 for model B. Amazon SageMaker distributes two-thirds of the -traffic to Model A, and one-third to model B: +By using the `EndpointConfig` construct, you can define a set of endpoint configuration settings which can be +used to provision one or more endpoints. In this configuration, you identify one or more models to +deploy and the resources that you want Amazon SageMaker to provision. You define one or more +production variants, each of which identifies a model. Each production variant also describes the +resources that you want Amazon SageMaker to provision. This includes the number and type of ML +compute instances to deploy. If you are hosting multiple models, you also assign a variant weight to +specify how much traffic you want to allocate to each model. For example, suppose that you want to +host two models, A and B, and you assign traffic weight 2 for model A and 1 for model B.
Amazon +SageMaker distributes two-thirds of the traffic to Model A, and one-third to model B: ```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; @@ -228,7 +229,7 @@ const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', { ### Endpoint -If you create an endpoint from an `EndpointConfig`, Amazon SageMaker launches the ML compute +When you create an endpoint from an `EndpointConfig`, Amazon SageMaker launches the ML compute instances and deploys the model or models as specified in the configuration. To get inferences from the model, client applications send requests to the Amazon SageMaker Runtime HTTPS endpoint. For more information about the API, see the @@ -1092,7 +1093,17 @@ No. author did not create an `EndpointConfig` construct, instead hiding the resource's creation behind `Endpoint` (to which production variants could be added). Although a simplifier, this prevents customers from reusing configuration across endpoints. For this reason, an explicit - L2 construct for endpoint configuration was incorporated into this RFC. + L2 construct for endpoint configuration was incorporated into this RFC. This enables use-cases + like the following: + 1. Producer A exposes ten endpoints, each unique to a different consumer (let's label these B + thru K). + 1. Each of these endpoints could use one of, say, three endpoint configs (let's label these 1 + thru 3) based on the features needed by each consumer. + 1. Consumer B's endpoint is currently associated with endpoint config 1. + 1. At some later point, consumer B wants to leverage a new feature, so in collaboration with the + consumer, producer A updates B's endpoint to reference endpoint config 3. As a result, + without switching endpoints, consumer B was able to begin using the features enabled via the + pre-built, shared endpoint config 3. 
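As an aside on the variant weighting described earlier in this patch (weight 2.0 for model A and 1.0 for model B yielding a two-thirds/one-third split), the arithmetic can be sketched as a self-contained computation. The `trafficSplit` helper below is purely illustrative and not part of the proposed construct API; SageMaker performs this normalization service-side from each variant's `initialVariantWeight`:

```typescript
// Hypothetical helper (illustration only): SageMaker allocates traffic to each
// production variant in proportion to its weight relative to the sum of all
// variant weights.
interface VariantWeight {
  variantName: string;
  initialVariantWeight: number;
}

function trafficSplit(variants: VariantWeight[]): Map<string, number> {
  const total = variants.reduce((sum, v) => sum + v.initialVariantWeight, 0);
  return new Map(
    variants.map((v) => [v.variantName, v.initialVariantWeight / total] as [string, number]),
  );
}

const split = trafficSplit([
  { variantName: 'modelA', initialVariantWeight: 2.0 },
  { variantName: 'variantB', initialVariantWeight: 1.0 },
]);
// modelA receives 2 / (2 + 1) = 2/3 of requests; variantB receives 1/3.
```

Note that weights need not sum to 1; only their relative magnitudes matter.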
[earliest-pr]: https://github.com/aws/aws-cdk/pull/2888 From 61723f923120c4ce5a022729128602b8c9784c78 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 6 Sep 2022 13:26:55 -0700 Subject: [PATCH 14/30] Reword ContainerImage unification as an open issue --- text/0431-sagemaker-l2-endpoint.md | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 63821de61..4af015241 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1077,18 +1077,6 @@ No. ### What alternative solutions did you consider? -1. Theoretically, the `ContainerImage` code (referenced [above](#container-image)) from the - `@aws-cdk/ecs` and `@aws-cdk/sagemaker` modules could be unified assuming it would be sufficient - for both use-cases to `bind` using an `IGrantable` (adjusting ECS's `TaskDefinition` - accordingly). However, it's unclear within which module such a unified API should reside as - support for private repositories makes it a bad fit for `@aws-cdk/ecr` and `@aws-cdk/ecr-assets`, - and it would be unintuitive for `@aws-cdk/sagemaker` to declare a dependency on `@aws-cdk/ecs`. - - Package concerns aside, historically, there was a period during which SageMaker only supported - ECR as an image source while ECS was capable of sourcing images from either ECR or a - customer-owned private repository. Given the fact that these two products' supported images - sources may yet again diverge in the future, maybe it would be best to keep their - `ContainerImage` APIs separate within their respective modules. 1. In the [earliest PR][earliest-pr] attempting to add SageMaker L2 constructs to the CDK, the author did not create an `EndpointConfig` construct, instead hiding the resource's creation behind `Endpoint` (to which production variants could be added). 
Although a simplifier, this @@ -1164,6 +1152,12 @@ adjustments prior to marking the APIs as stable. regardless of whether stack deletion will succeed or fail but may hinder snapshot re-generation by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification from the endpoint integration test at this time. +1. This RFC proposes a new [`ContainerImage` API](#container-image) for the SageMaker module which + closely resembles the same-named API from the ECS module. The primary difference between the two + is that the ECS module's API `bind`s on an ECS `TaskDefinition` whereas this new SageMaker + module's API `bind`s on a SageMaker `Model`. There may be an opportunity to unify these APIs in + the future assuming that `bind`ing to a common type would be sufficient for both use-cases (e.g., + `IGrantable`). [lambda-eni-issue]: https://github.com/aws/aws-cdk/issues/12827 [eks-eni-issue]: https://github.com/aws/aws-cdk/issues/9970 From 72c86a0bad18ce798fd10538edc86acd91808294 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 13 Sep 2022 15:18:58 -0700 Subject: [PATCH 15/30] Drop scope/id specification in ContainerImage API --- text/0431-sagemaker-l2-endpoint.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 4af015241..27ad1f30c 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -80,12 +80,14 @@ In the event that a single container is sufficient for your inference use-case, single-container model: ```typescript +import * as ecr_assets from '@aws-cdk/aws-ecr-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { +const imageAsset = new ecr_assets.DockerImageAsset(this, 'Image', { directory: path.join('path', 'to', 'Dockerfile', 'directory') });
+const image = sagemaker.ContainerImage.fromAsset(imageAsset); const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', path.join('path', 'to', 'artifact', 'file.tar.gz')); @@ -138,12 +140,14 @@ abstract base class. Reference a local directory containing a Dockerfile: ```typescript +import * as ecr_assets from '@aws-cdk/aws-ecr-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const image = sagemaker.ContainerImage.fromAsset(this, 'Image', { +const imageAsset = new ecr_assets.DockerImageAsset(this, 'Image', { directory: path.join('path', 'to', 'Dockerfile', 'directory') }); +const image = sagemaker.ContainerImage.fromAsset(imageAsset); ``` #### ECR Image From 98e875ef1c4246d254831745ef03e196dc1bb1a9 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 13 Sep 2022 15:25:50 -0700 Subject: [PATCH 16/30] Drop scope/id specification in ModelData API --- text/0431-sagemaker-l2-endpoint.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 27ad1f30c..656d613e7 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -81,6 +81,7 @@ single-container model: ```typescript import * as ecr_assets from '@aws-cdk/aws-ecr-assets'; +import * as s3_assets from '@aws-cdk/aws-s3-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; @@ -88,8 +89,10 @@ const imageAsset = new ecr_assets.DockerImageAsset(this, 'Image', { directory: path.join('path', 'to', 'Dockerfile', 'directory') }); const image = sagemaker.ContainerImage.fromAsset(imageAsset); -const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', - path.join('path', 'to', 'artifact', 'file.tar.gz')); +const modelDataAsset = new s3_assets.Asset(this, 'ModelData', { + path: path.join('path', 'to', 'artifact', 'file.tar.gz') +}); +const 
modelData = sagemaker.ModelData.fromAsset(modelDataAsset); const model = new sagemaker.Model(this, 'PrimaryContainerModel', { containers: [ @@ -173,11 +176,14 @@ base class. The default is to have no model artifacts associated with a model. Reference local model data: ```typescript +import * as s3_assets from '@aws-cdk/aws-s3-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const modelData = sagemaker.ModelData.fromAsset(this, 'ModelData', - path.join('path', 'to', 'artifact', 'file.tar.gz')); +const modelDataAsset = new s3_assets.Asset(this, 'ModelData', { + path: path.join('path', 'to', 'artifact', 'file.tar.gz') +}); +const modelData = sagemaker.ModelData.fromAsset(modelDataAsset); ``` #### S3 Model Data From 3fbea7b7a744db2ff5e527a0df707145786ed3c4 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Tue, 13 Sep 2022 16:22:07 -0700 Subject: [PATCH 17/30] Distinguish instance-based variants --- text/0431-sagemaker-l2-endpoint.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 656d613e7..d941266d1 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -209,11 +209,11 @@ By using the `EndpointConfig` construct, you can define a set of endpoint config used to provision one or more endpoints. In this configuration, you identify one or more models to deploy and the resources that you want Amazon SageMaker to provision. You define one or more production variants, each of which identifies a model. Each production variant also describes the -resources that you want Amazon SageMaker to provision. This includes the number and type of ML -compute instances to deploy. If you are hosting multiple models, you also assign a variant weight to -specify how much traffic you want to allocate to each model. 
For example, suppose that you want to -host two models, A and B, and you assign traffic weight 2 for model A and 1 for model B. Amazon -SageMaker distributes two-thirds of the traffic to Model A, and one-third to model B: +resources that you want Amazon SageMaker to provision. If you are hosting multiple models, you also +assign a variant weight to specify how much traffic you want to allocate to each model. For example, +suppose that you want to host two models, A and B, and you assign traffic weight 2 for model A and 1 +for model B. Amazon SageMaker distributes two-thirds of the traffic to Model A, and one-third to +model B: ```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; @@ -222,7 +222,7 @@ declare const modelA: sagemaker.Model; declare const modelB: sagemaker.Model; const endpointConfig = new sagemaker.EndpointConfig(this, 'EndpointConfig', { - productionVariants: [ + instanceProductionVariants: [ { model: modelA, variantName: 'modelA', From 66a662ffe91b8d0852dd0013689722ebd2dbd176 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 15 Sep 2022 12:55:27 -0700 Subject: [PATCH 18/30] Document open issues based on API evolution --- text/0431-sagemaker-l2-endpoint.md | 210 +++++++++++++++++++++++------ 1 file changed, 172 insertions(+), 38 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index d941266d1..627aa0392 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1107,39 +1107,15 @@ No. ### What are the drawbacks of this solution? -This RFC and its [original associated implementation PR][original-pr] were based on a Q3 2019 -feature set of SageMaker real-time inference endpoints. 
Since that point in time, SageMaker has -launched the following features which would require further additions to the L2 API contracts: - -* [Multi-model endpoints][multi-model] -* [Model Monitor][model-monitor] -* [Asynchronous inference][async-inference] -* [Deployment guardrails][deployment-guardrails] -* [Serverless inference][serverless-inference] - -Although some of these changes would be small and additive (e.g., `DataCaptureConfig` for Model -Monitor), features like asynchronous and serverless inference represent more significant shifts in -functionality. For example, SageMaker hosts real-time inference endpoints on EC2 instances, meaning -that CloudWatch alarms and Application Auto Scaling rules operate on instance-based metrics. In -contrast, serverless inference does not expose any instance-based metrics nor does it yet support -auto-scaling. Since both features are specified via the CloudFormation resource -`AWS::SageMaker::EndpointConfig`, the current recommendation of this RFC would be to support the -specification of both use-cases through a single L2 `EndointConfig` construct. However, this -presents a challenge when modeling helper APIs like `metricCPUUtilization` or -`autoScaleInstanceCount` on a related construct as those methods would not universally apply to -*all* endpoint types. - -Given that (1) RFC reviewers may have idiomatic recommendations to solve such modeling challenges -and (2) the current proposed constructs are still viable for creating a SageMaker endpoint, the -first draft of this RFC is being published without further revisions accommodating the above newer -SageMaker features. 
- -[original-pr]: https://github.com/aws/aws-cdk/pull/6107 -[multi-model]: https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/ -[model-monitor]: https://aws.amazon.com/about-aws/whats-new/2019/12/introducing-amazon-sagemaker-model-monitor/ -[async-inference]: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/ -[deployment-guardrails]: https://aws.amazon.com/about-aws/whats-new/2021/11/new-deployment-guardrails-amazon-sagemaker-inference-endpoints/ -[serverless-inference]: https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-serverless-inference/ +Since production variants are configured via the `EndpointConfig` construct while the monitoring and +auto-scaling of a deployed production variant is only possible once the `EndpointConfig` has +been associated to an `Endpoint` (i.e., the dimension for most SageMaker model hosting metrics +consists of endpoint name and production variant name), this RFC proposes the implementation of the +function `Endpoint.findProductionVariant(string)`, the [return value for +which](#endpoint-production-variants) contains `metric*` and `autoScaleInstanceCount` helper methods +as demonstrated in the [README](#metrics). Although not necessarily a drawback, this separation of +configuration-time and deploy-time APIs appears to be a novel pattern for the CDK, and thus, has the +potential to be confusing to customers. ### What is the high-level project plan? @@ -1151,8 +1127,163 @@ adjustments prior to marking the APIs as stable. ### Are there any open issues that need to be addressed later? -1. Please see the [drawbacks section above](#what-are-the-drawbacks-of-this-solution) for potential - follow-on work (assuming it need not be incorporated into this RFC). 
+#### Feature Additions + +The following list describes at a high-level future additions that can be made to the L2 constructs +to enable SageMaker features not yet covered by this RFC but are already supported via +CloudFormation. For the purposes of this RFC, this list should be reviewed to ensure that the +proposed APIs are appropriately extensible in order to support these use-cases. + +1. `AWS::SageMaker::EndpointConfig` features: + 1. [Serverless Inference][serverless-inference]: By default, upon endpoint deployment, + SageMaker will provision EC2 instances (managed by SageMaker) for hosting purposes. To shield + customers from the complexity of forecasting fleet sizes, the `ServerlessConfig` attribute + was added to the `ProductionVariant` CloudFormation structure of an endpoint config resource. + This configuration removes the need for customers to specify instance-specific settings + (e.g., instance count, instance type), abstracting the runtime compute from customers, much + in the same way Lambda does for its customers. In preparation for the addition of this + feature into the CDK, all concrete production variant related classes and attributes have + been prefixed with the string `[Ii]nstance` to designate that they are only associated with + instance-based hosting. When later adding serverless support to the SageMaker module, + `[Ss]erverless`-prefixed analogs can be created with attributes appropriate for the use-case + with appropriate plumbing to the L1 constructs. + 1. [Asynchronous Inference][async-inference]: By default, a deployed endpoint is synchronous: + a customer issues an InvokeEndpoint operation to SageMaker with an attached input payload and + the resulting response contains the output payload from the endpoint. To instead support + asynchronous invocation, the `AsyncInferenceClientConfig` CloudFormation attribute was added + to the endpoint config resource. 
To interact with an asynchronous endpoint, a customer issues + an InvokeEndpointAsync operation to SageMaker with an attached input location in S3; + SageMaker asynchronously reads the input from S3, invokes the endpoint, and writes the output + to an S3 location specified within the `AsyncInferenceClientConfig` attribute. As [discussed + with the RFC bar raiser here][async-conversation], there are a few ways to tackle the + addition of this functionality. One option is to add attribute(s) to the L2 endpoint config + construct to support asynchronous inference along with synthesis-time error handling to + catch configuration conflicts (e.g., asynchronous endpoints are only capable of supporting + a single instance-based production variant today). Alternatively, an `AsyncEndpointConfig` + subclass of `EndpointConfig` could be introduced to provide a better compile-time contract + to customers (while still implementing the generic functionality within `EndpointConfig`). + Either way, the proposed contracts would only undergo backward-compatible changes. + 1. [Model Monitoring][model-monitor]: For the purposes of monitoring model performance, the + `DataCaptureConfig` CloudFormation attribute was added which allows customers to configure a + sampling rate of input and/or output endpoint requests that SageMaker should publish to an S3 + destination. This functionality is a side-effect of normal endpoint operation and has no + bearing on other construct APIs, meaning its addition should be confined to new attribute(s) + on the endpoint config construct. +1. `AWS::SageMaker::Endpoint` features: + 1. [Retention of Variant Properties][retain-variant-properties]: Once an endpoint has been + deployed, the desired instance count and desired weight can be dynamically adjusted _per + production variant_ without changing the backing endpoint config resource.
These changes can + either be made automatically via Application Auto Scaling or manually by the customer via + the SageMaker UpdateEndpointWeightsAndCapacities operation. After making such changes, by + default, when updating a SageMaker endpoint to use a new endpoint config resource (such as + when making a CloudFormation change to an endpoint config that results in resource replacement), + the desired instance count and desired weight are reset to match the new endpoint config + resource. To bypass this resetting of variant properties, the `RetainAllVariantProperties` + boolean flag was added to the endpoint resource, which, when set to true, will not reset these + variant properties. In addition to this field, `ExcludeRetainedVariantProperties` was also + added to the endpoint resource to allow for selective retention of variant properties (e.g., + keeping the desired instance count while resetting the desired weight). As the default + behavior is already in place (no retention), adding the functionality should consist of + incorporating new attribute(s) on the Endpoint L2 construct's props interface and plumbing + it through to the underlying L1 resource definition. + 1. [Deployment Guardrails][deployment-guardrails]: By default, when updating an endpoint, + SageMaker uses an all-at-once blue/green deployment strategy: a new fleet is provisioned + with the appropriate new configuration, and upon successful provisioning, the traffic is + flipped and the old fleet is terminated. To augment this functionality, the + `DeploymentConfig` attribute was added to the Endpoint resource which now allows (1) the + specification of a CloudWatch alarm for auto-rollback and (2) additional deployment policies + beyond all-at-once, including canary and linear deployment strategies (along with more + fine-grained timing settings).
Adding this functionality should consist of incorporating new + attribute(s) on the Endpoint L2 construct's props interface and plumbing it through to the + underlying L1 resource definition. This work should also include support for the + `RetainDeploymentConfig` boolean flag which controls whether to reuse the previous deployment + configuration or use the new one. Note that there are a number of [SageMaker features which + prevent the use of deployment configuration][deployment-guardrails-exclusions], so defending + against combinations of features may improve the customer experience with the Endpoint + construct. +1. `AWS::SageMaker::Model` features: + 1. [Multi-Model Endpoints][multi-model]: By default (and as [described in the technical solution + above](#model-data)), SageMaker expects the model data URL on each container to point to an + S3 object containing a gzipped tar file of artifacts, which will be automatically extracted + upon instance provisioning. To support colocation of multiple logical models into a single + container, the `Mode` attribute was added to the `ContainerDefinition` CloudFormation + structure to either explicitly configure `SingleModel` mode (the default) or `MultiModel` mode. + In multi-model mode, SageMaker now expects the customer-configured model data URL to point to + an S3 path under which multiple gzipped tar files exist. When invoking a multi-model + endpoint, the client invoking the endpoint must specify the target model representing the + exact S3 path suffix pointing to a specific gzipped tar file. To accommodate this feature, + the proposed `ModelData.fromAsset` API should be adjusted to support zip file assets capable + of containing one or more gzipped tar files within them.
Even though the code need not be + aware of `.tar.gz` files specifically, it might prove a better customer experience to at + least put up guard rails to prevent zip file assets from being used in single-model mode + whereas multi-model mode could be more permissive. + 1. [Direct Invocation of Multi-Container Endpoints][multi-container]: By default (and as + [described in the proposed README](#inference-pipeline-model)), when a customer specifies + multiple containers for a model, the containers are treated as an inference pipeline (also + referred to as a serial pipeline). This means that the containers are treated as an ordered + list, wherein the output of one container at runtime is passed as input to the next. Only the + output from the last container is surfaced to the client invoking the model. To support a + different invocation paradigm, the `InferenceExecutionConfig` structure was added to the + model CloudFormation resource which allows customers to either explicitly configure `Serial` + invocation mode (the default, as an inference pipeline) or the new `Direct` invocation mode. + When using direct mode, a client invoking an endpoint must specify a container to target with + their request; SageMaker then invokes only that single container. As SageMaker exposes a new + dimension for CloudWatch metrics specific to each directly-invokable container, other than + exposing a new inference execution mode attribute on the `Model` construct, this feature + would likely also warrant the addition of a `findContainer(containerHostName: string)` method + to [`IEndpointProductionVariant`](#endpoint-production-variants) which will return a new + interface on which additional `metric*` APIs are present for generating CloudWatch metrics + against the dimension consisting of endpoint, variant, and container combined. + 1.
[Private Docker Registries][private-docker]: The `ImageConfig` type was added to the existing
+ `ContainerDefinition` CloudFormation structure in order for customers to specify that a
+ VPC-connected Docker registry will act as the source of the container's image (as opposed to
+ ECR which acts as the default platform repository). This new type also contains an optional
+ `RepositoryAuthConfig` nested structure in order to specify the ARN of a Lambda function
+ capable of serving repository credentials to SageMaker. In order to deliver this
+ functionality in a backward-compatible way, inspiration can be taken from [ECS's
+ `ContainerImage.fromRegistry` API][container-image-from-registry] (note though, ECS sources
+ credentials from Secrets Manager rather than Lambda) in order to make the following
+ additions to the SageMaker module:
+ 1. Add attributes to `ContainerImageConfig` to support the specification of a non-platform
+ repository along with an optional Lambda function ARN.
+ 1. Implement a new, non-exported `RegistryImage` subclass of `ContainerImage` whose
+ constructor takes an optional Lambda `IFunction` instance for generating a
+ `ContainerImageConfig` instance with the appropriate Lambda function ARN for serving
+ credentials.
+ 1. On `ContainerImage`, add a new static `fromRegistry` method which takes a props object
+ consisting of an optional Lambda `IFunction` instance. This method acts as a simple
+ static factory method for the non-exported `RegistryImage` class.
+ 1. [Network Isolation][network-isolation]: The `EnableNetworkIsolation` CloudFormation boolean
+ flag (defaults to false) on a model resource prevents inbound and outbound network calls
+ to/from the model container. Incorporating such an attribute into the Model L2 construct
+ should not conflict with any proposed API.
+ 1.
[AWS Marketplace Models][marketplace-models]: The `ModelPackageName` string attribute was + added to the `ContainerDefinition` CloudFormation structure to specify the ARN of a reusable, + versioned model which can be listed on the AWS Marketplace. When creating a `Model` resource + from a model package, the customer need no longer specify a container image as the model + package contains all information about the underlying container(s) required for inference. To + incorporate this support into the SageMaker module, it would likely entail creating a new L2 + construct `ModelPackage` to represent the `AWS::SageMaker::ModelPackage` CloudFormation + resource and modifying the proposed `ContainerDefinition` interface to support an optional + `IModelPackage` as an attribute (while making `image: ContainerImage` an optional attribute). + +[serverless-inference]: https://aws.amazon.com/about-aws/whats-new/2021/12/amazon-sagemaker-serverless-inference/ +[async-inference]: https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-sagemaker-asynchronous-new-inference-option/ +[model-monitor]: https://aws.amazon.com/about-aws/whats-new/2019/12/introducing-amazon-sagemaker-model-monitor/ +[retain-variant-properties]: https://aws.amazon.com/blogs/machine-learning/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker/ +[deployment-guardrails]: https://aws.amazon.com/about-aws/whats-new/2021/11/new-deployment-guardrails-amazon-sagemaker-inference-endpoints/ +[multi-model]: https://aws.amazon.com/blogs/machine-learning/save-on-inference-costs-by-using-amazon-sagemaker-multi-model-endpoints/ +[multi-container]: https://aws.amazon.com/blogs/machine-learning/deploy-multiple-serving-containers-on-a-single-instance-using-amazon-sagemaker-multi-container-endpoints/ +[private-docker]: https://aws.amazon.com/about-aws/whats-new/2021/03/amazon-sagemaker-now-supports-private-docker-registry-authentication/ +[network-isolation]: 
https://aws.amazon.com/blogs/security/secure-deployment-of-amazon-sagemaker-resources/
+[marketplace-models]: https://aws.amazon.com/blogs/awsmarketplace/using-amazon-augmented-ai-with-aws-marketplace-machine-learning-models/
+
+[async-conversation]: https://github.com/aws/aws-cdk-rfcs/pull/433#discussion_r952949608
+[deployment-guardrails-exclusions]: https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-exclusions.html
+[container-image-from-registry]: https://github.com/aws/aws-cdk/blob/v1-main/packages/%40aws-cdk/aws-ecs/lib/container-image.ts#L14-L19
+
+#### Rough Edges
+ 1. As observed with [Lambda][lambda-eni-issue] and [EKS][eks-eni-issue], the Elastic Network
Interfaces (ENIs) associated with a SageMaker model's VPC are not always cleaned up in a
timely manner after downstream compute resources are deleted. As a result, attempts to delete a
@@ -1162,6 +1293,12 @@ adjustments prior to marking the APIs as stable.
regardless of whether stack deletion will succeed or fail but may hinder snapshot
re-generation by subsequent CDK contributors. For this reason, it may be helpful to exclude
VPC specification from the endpoint integration test at this time.
+
+[lambda-eni-issue]: https://github.com/aws/aws-cdk/issues/12827
+[eks-eni-issue]: https://github.com/aws/aws-cdk/issues/9970
+
+#### Cross-module API Convergence
+ 1. This RFC proposes a new [`ContainerImage` API](#container-image) for the SageMaker module
which closely resembles the same-named API from the ECS module. The primary difference between
the two is that the ECS module's API `bind`s on an ECS `TaskDefinition` whereas this new SageMaker
@@ -1169,9 +1306,6 @@ adjustments prior to marking the APIs as stable.
the future assuming that `bind`ing to a common type would be sufficient for both use-cases (e.g.,
`IGrantable`).
-[lambda-eni-issue]: https://github.com/aws/aws-cdk/issues/12827 -[eks-eni-issue]: https://github.com/aws/aws-cdk/issues/9970 - ## Appendix Feel free to add any number of appendices as you see fit. Appendices are From 7bdb8fb89913f08b6a042dc7ab1ce2093af58625 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 15 Sep 2022 14:03:21 -0700 Subject: [PATCH 19/30] Distinguish instance-based variants for Endpoints --- text/0431-sagemaker-l2-endpoint.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 627aa0392..64154208c 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -264,7 +264,7 @@ import * as sagemaker from '@aws-cdk/aws-sagemaker'; declare const endpointConfig: sagemaker.EndpointConfig; const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); -const productionVariant = endpoint.findProductionVariant('variantName'); +const productionVariant = endpoint.findInstanceProductionVariant('variantName'); const instanceCount = productionVariant.autoScaleInstanceCount({ maxCapacity: 3 }); @@ -287,7 +287,7 @@ import * as sagemaker from '@aws-cdk/aws-sagemaker'; declare const endpointConfig: sagemaker.EndpointConfig; const endpoint = new sagemaker.Endpoint(this, 'Endpoint', { endpointConfig }); -const productionVariant = endpoint.findProductionVariant('variantName'); +const productionVariant = endpoint.findInstanceProductionVariant('variantName'); productionVariant.metricModelLatency().createAlarm(this, 'ModelLatencyAlarm', { threshold: 100000, evaluationPeriods: 3, From 799d2779ecfc91cd102060ac0d52b5fbc4c9ca89 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 15 Sep 2022 14:08:41 -0700 Subject: [PATCH 20/30] Drop README mention of IEndpointProductionVariant --- text/0431-sagemaker-l2-endpoint.md | 4 ++-- 1 file 
changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 64154208c..eb305585e 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -278,8 +278,8 @@ this [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-sc ### Metrics -The `IEndpointProductionVariant` interface also provides a set of APIs for referencing CloudWatch -metrics associated with a production variant associated with an endpoint: +To monitor CloudWatch metrics for a production variant, use one or more of the metric convenience +methods: ```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; From 034cd4a1dd782323fc44fa1b8e233388195d0492 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 15 Sep 2022 15:16:01 -0700 Subject: [PATCH 21/30] Sync technical solution API with implementation --- text/0431-sagemaker-l2-endpoint.md | 206 ++++++++++++++++++----------- 1 file changed, 127 insertions(+), 79 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index eb305585e..0e411b7a5 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -611,11 +611,9 @@ upon an ECS task definition, instead operate upon a SageMaker model. /** * Reference an image that's constructed directly from sources on disk * - * @param scope The scope within which to create the image asset - * @param id The id to assign to the image asset - * @param props The properties of a Docker image asset + * @param asset A Docker image asset */ - public static fromAsset(scope: Construct, id: string, props: assets.DockerImageAssetProps): ContainerImage { ... } + public static fromAsset(asset: assets.DockerImageAsset): ContainerImage { ... } /** * Called when the image is used by a Model @@ -656,11 +654,9 @@ artifacts, either in an S3 bucket or a local file asset. 
/** * Constructs model data that will be uploaded to S3 as part of the CDK app deployment. - * @param scope The scope within which to create a new asset - * @param id The id to associate with the new asset - * @param path The local path to a model artifact file as a gzipped tar file + * @param asset An S3 asset as a gzipped tar file */ - public static fromAsset(scope: Construct, id: string, path: string): ModelData { ... } + public static fromAsset(asset: assets.Asset): ModelData { ... } /** * This method is invoked by the SageMaker Model construct when it needs to resolve the model @@ -713,16 +709,16 @@ artifacts, either in an S3 bucket or a local file asset. readonly encryptionKey?: kms.IKey; /** - * A list of production variants. You can always add more variants later by calling - * {@link EndpointConfig#addProductionVariant}. + * A list of instance production variants. You can always add more variants later by calling + * {@link EndpointConfig#addInstanceProductionVariant}. * * @default - none */ - readonly productionVariants?: ProductionVariantProps[]; + readonly instanceProductionVariants?: InstanceProductionVariantProps[]; } ``` -- `EndpointConfig` -- defines a SageMaker EndpointConfig (with helper methods for importing an +- `EndpointConfig` -- defines a SageMaker EndpointConfig (with helper methods for importing an endpoint config) ```ts @@ -747,22 +743,22 @@ artifacts, either in an S3 bucket or a local file asset. constructor(scope: Construct, id: string, props: EndpointConfigProps = {}) { ... } /** - * Add production variant to the endpoint configuration. + * Add instance production variant to the endpoint configuration. * * @param props The properties of a production variant to add. */ - public addProductionVariant(props: ProductionVariantProps): void { ... } + public addInstanceProductionVariant(props: InstanceProductionVariantProps): void { ... } /** - * Get production variants associated with endpoint configuration. 
+ * Get instance production variants associated with endpoint configuration. */ - public get productionVariants(): ProductionVariant[] { ... } + public get instanceProductionVariants(): InstanceProductionVariant[] { ... } /** - * Find production variant based on variant name + * Find instance production variant based on variant name * @param name Variant name from production variant */ - public findProductionVariant(name: string): ProductionVariant { ... } + public findInstanceProductionVariant(name: string): InstanceProductionVariant { ... } } ``` @@ -772,23 +768,11 @@ To accommodate A/B testing of model behaviors, an endpoint config supports the s multiple production variants. Each variant's weight determines the traffic distribution to itself relative to the other configured variants. -- `ProductionVariantProps` -- construction properties for a production variant +- `ProductionVariantProps` -- common construction properties for all production variant types (e.g., + instance, serverless) (note, not exported) ```ts - export interface ProductionVariantProps { - /** - * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances - * provide on-demand GPU computing for inference. - * - * @default - none - */ - readonly acceleratorType?: AcceleratorType; - /** - * Number of instances to launch initially. - * - * @default 1 - */ - readonly initialInstanceCount?: number; + interface ProductionVariantProps { /** * Determines initial traffic distribution among all of the models that you specify in the * endpoint configuration. The traffic to a production variant is determined by the ratio of the @@ -797,12 +781,6 @@ relative to the other configured variants. * @default 1.0 */ readonly initialVariantWeight?: number; - /** - * Instance type of the production variant. - * - * @default - ml.t2.medium instance type. - */ - readonly instanceType?: ec2.InstanceType; /** * The model to host. 
*/ @@ -814,61 +792,123 @@ relative to the other configured variants. } ``` -- `ProductionVariant` -- represents a production variant that has been associated with an - `EndpointConfig` +- `InstanceProductionVariantProps` -- construction properties for an instance production variant ```ts - export interface ProductionVariant { + export interface InstanceProductionVariantProps extends ProductionVariantProps { /** - * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances - * provide on-demand GPU computing for inference. - * - * @default - none - */ + * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances + * provide on-demand GPU computing for inference. + * + * @default - none + */ readonly acceleratorType?: AcceleratorType; /** - * Number of instances to launch initially. - */ - readonly initialInstanceCount: number; + * Number of instances to launch initially. + * + * @default 1 + */ + readonly initialInstanceCount?: number; + /** + * Instance type of the production variant. + * + * @default - ml.t2.medium instance type. + */ + readonly instanceType?: InstanceType; + } + ``` + +- `ProductionVariant` -- represents common attributes of all production variant types (e.g., + instance, serverless) once associated to an EndpointConfig (note, not exported) + + ```ts + interface ProductionVariant { /** * Determines initial traffic distribution among all of the models that you specify in the - * endpoint configuration. The traffic to a production variant is determined by the ratio of the - * variant weight to the sum of all variant weight values across all production variants. - */ + * endpoint configuration. The traffic to a production variant is determined by the ratio of the + * variant weight to the sum of all variant weight values across all production variants. + */ readonly initialVariantWeight: number; - /** - * Instance type of the production variant. 
- */ - readonly instanceType: ec2.InstanceType; /** * The name of the model to host. - */ + */ readonly modelName: string; /** * The name of the production variant. - */ + */ readonly variantName: string; } ``` -- `AcceleratorType` -- an enumeration of values representing the size of the Elastic Inference (EI) - instance to use for the production variant. EI instances provide on-demand GPU computing for +- `InstanceProductionVariant` -- represents an instance production variant that has been associated + with an `EndpointConfig` + + ```ts + export interface InstanceProductionVariant extends ProductionVariant { + /** + * The size of the Elastic Inference (EI) instance to use for the production variant. EI instances + * provide on-demand GPU computing for inference. + * + * @default - none + */ + readonly acceleratorType?: AcceleratorType; + /** + * Number of instances to launch initially. + */ + readonly initialInstanceCount: number; + /** + * Instance type of the production variant. + */ + readonly instanceType: InstanceType; + } + ``` + +- `AcceleratorType` -- enum-like class of supported Elastic Inference (EI) instance types for + SageMaker instance-based production variants; EI instances provide on-demand GPU computing for inference ```ts - export enum AcceleratorType { + export class AcceleratorType { + /** + * ml.eia1.large + */ + public static readonly EIA1_LARGE = AcceleratorType.of('ml.eia1.large'); + + /* Additional supported accelerator types */ + /** - * Medium accelerator type. + * Builds an AcceleratorType from a given string or token (such as a CfnParameter). + * @param acceleratorType An accelerator type as string + * @returns A strongly typed AcceleratorType */ - MEDIUM = 'ml.eia1.medium', + public static of(acceleratorType: string): AcceleratorType; + /** - * Large accelerator type. 
+ * Return the accelerator type as a string + * @returns The accelerator type as a string */ - LARGE = 'ml.eia1.large ', + public toString(): string; + } + ``` + +- `InstanceType` -- enum-like class of supported instance types for SageMaker instance-based + production variants + + ```ts + export class InstanceType { /** - * Extra large accelerator type. + * ml.c4.2xlarge */ - XLARGE = 'ml.eia1.xlarge', + public static readonly C4_2XLARGE = InstanceType.of('ml.c4.2xlarge'); + + /* Additional supported instance types */ + + /** + * Builds an InstanceType from a given string or token (such as a CfnParameter). + * @param instanceType An instance type as string + * @returns A strongly typed InstanceType + */ + public static of(instanceType: string): InstanceType; } ``` @@ -975,15 +1015,15 @@ relative to the other configured variants. constructor(scope: Construct, id: string, props: EndpointProps) { ... } /** - * Get production variants associated with endpoint. + * Get instance production variants associated with endpoint. */ - public get productionVariants(): IEndpointProductionVariant[] { ... } + public get instanceProductionVariants(): IEndpointInstanceProductionVariant[] { ... } /** - * Find production variant based on variant name + * Find instance production variant based on variant name * @param name Variant name from production variant */ - public findProductionVariant(name: string): IEndpointProductionVariant { ... } + public findInstanceProductionVariant(name: string): IEndpointInstanceProductionVariant { ... } } ``` @@ -991,15 +1031,15 @@ relative to the other configured variants. When monitoring or auto-scaling real-time inference endpoints, both CloudWatch and Application Auto Scaling operate at the level of endpoint name + variant name. 
For this reason, once a variant has -been attached to an endpoint, this RFC allows customers to retrieve `IEndpointProductionVariant` -instances from their endpoint for the purposes of referencing CloudWatch metrics or an Application -Auto Scaling `BaseScalableAttribute`. +been attached to an endpoint, this RFC allows customers to retrieve +`IEndpointInstanceProductionVariant` object instances from their endpoint for the purposes of +referencing CloudWatch metrics or an Application Auto Scaling `BaseScalableAttribute`. -- `IEndpointProductionVariant` -- represents a production variant that has been associated with an - endpoint +- `IEndpointProductionVariant` -- represents the features common to all production variant types + (e.g., instance, serverless) that have been associated with an endpoint (note, not exported) ```ts - export interface IEndpointProductionVariant { + interface IEndpointProductionVariant { /** * The name of the production variant. */ @@ -1010,6 +1050,14 @@ Auto Scaling `BaseScalableAttribute`. * @default - sum over 5 minutes */ metric(namespace: string, metricName: string, props?: cloudwatch.MetricOptions): cloudwatch.Metric; + } + ``` + +- `IEndpointInstanceProductionVariant` -- represents an instance production variant that has been + associated with an endpoint + + ```ts + export interface IEndpointInstanceProductionVariant extends IEndpointProductionVariant { /** * Metric for the number of invocations * @@ -1078,7 +1126,7 @@ Auto Scaling `BaseScalableAttribute`. autoScaleInstanceCount(scalingProps: appscaling.EnableScalingProps): ScalableInstanceCount; } - class EndpointProductionVariant implements IEndpointProductionVariant { ... } + class EndpointInstanceProductionVariant implements IEndpointInstanceProductionVariant { ... } ``` ### Is this a breaking change? 
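The enum-like class pattern proposed above for `InstanceType` and `AcceleratorType` can be sketched in plain TypeScript. This is a minimal, dependency-free sketch of the `of`/`toString` pattern only; the actual construct-library implementation would also need to handle CloudFormation tokens (e.g., `CfnParameter` references) and ship a fuller set of predefined values:

```typescript
// Minimal sketch of the proposed enum-like class pattern (illustrative only).
class InstanceType {
  /** ml.t2.medium, the proposed default for instance production variants */
  public static readonly T2_MEDIUM = InstanceType.of('ml.t2.medium');

  /** ml.c4.2xlarge */
  public static readonly C4_2XLARGE = InstanceType.of('ml.c4.2xlarge');

  /**
   * Builds an InstanceType from a given string.
   * @param instanceType An instance type as a string, e.g., 'ml.m5.large'
   */
  public static of(instanceType: string): InstanceType {
    return new InstanceType(instanceType);
  }

  private constructor(private readonly instanceTypeIdentifier: string) {}

  /** Return the instance type as a string */
  public toString(): string {
    return this.instanceTypeIdentifier;
  }
}

// Customers can use predefined values or opt into types the module has not
// yet enumerated, without waiting for a library release:
const predefined = InstanceType.C4_2XLARGE;
const custom = InstanceType.of('ml.m5.large');
console.log(`${predefined} ${custom}`); // prints "ml.c4.2xlarge ml.m5.large"
```

Unlike a plain TypeScript `enum`, this shape keeps the public API open to instance types the module does not yet know about, while still offering strongly typed constants.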
From 619d46214a446c6becddbb0bfb70af6264bc2a14 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Thu, 15 Sep 2022 15:27:50 -0700 Subject: [PATCH 22/30] Expand serverless evolution --- text/0431-sagemaker-l2-endpoint.md | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 0e411b7a5..b4c327015 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1194,7 +1194,15 @@ proposed APIs are appropriately extensible in order to support these use-cases. been prefixed with the string `[Ii]nstance` to designate that they are only associated with instance-based hosting. When later adding serverless support to the SageMaker module, `[Ss]erverless`-prefixed analogs can be created with attributes appropriate for the use-case - with appropriate plumbing to the L1 constructs. + with appropriate plumbing to the L1 constructs. Note, there are a [number of features which + do not yet work with serverless variants][serverless-exclusions], so it may be necessary to + incorporate a number of new synthesis-time checks or compile-time contracts to guard against + mixing incompatible features. For example, as [discussed with the bar + raiser][design-conversation], alongside the proposed `EndpointConfigProps` attribute + `instanceProductionVariants?: InstanceProductionVariantProps[]`, a new mutually exclusive + attribute `serverlessProductionVariant?: ServerlessProductionVariantProps` (as only a single + variant is supported with serverless inference) could be added with a synthesis-time check + confirming that the customer hasn't configured both instance-based and serverless production variants. 1. 
[Asynchronous Inference][async-inference]: By default, a deployed endpoint is synchronous: a
customer issues an InvokeEndpoint operation to SageMaker with an attached input payload and
the resulting response contains the output payload from the endpoint. To instead support
@@ -1203,7 +1211,7 @@ proposed APIs are appropriately extensible in order to support these use-cases.
an InvokeEndpointAsync operation to SageMaker with an attached input location in S3;
SageMaker asynchronously reads the input from S3, invokes the endpoint, and writes the
output to an S3 location specified within the `AsyncInferenceClientConfig` attribute. As [discussed
- with the RFC bar raiser here][async-conversation], there are a few ways to tackle the
+ with the RFC bar raiser here][design-conversation], there are a few ways to tackle the
addition of this functionality. One option is to add attribute(s) to the L2 endpoint config
construct to support asynchronous inference along with synthesis-time error handling to
catch configuration conflicts (e.g., asynchronous endpoints are only capable of supporting
[network-isolation]: https://aws.amazon.com/blogs/security/secure-deployment-of-amazon-sagemaker-resources/ [marketplace-models]: https://aws.amazon.com/blogs/awsmarketplace/using-amazon-augmented-ai-with-aws-marketplace-machine-learning-models/ -[async-conversation]: https://github.com/aws/aws-cdk-rfcs/pull/433#discussion_r952949608 +[serverless-exclusions]: https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html#serverless-endpoints-how-it-works-exclusions +[design-conversation]: https://github.com/aws/aws-cdk-rfcs/pull/433#discussion_r952949608 [deployment-guardrails-exclusions]: https://docs.aws.amazon.com/sagemaker/latest/dg/deployment-guardrails-exclusions.html [container-image-from-registry]: https://github.com/aws/aws-cdk/blob/v1-main/packages/%40aws-cdk/aws-ecs/lib/container-image.ts#L14-L19 From 80151f9ed7b985c144512182ce0e65371d9c9c26 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Mon, 3 Oct 2022 17:03:58 -0700 Subject: [PATCH 23/30] Create assets behind fromAsset APIs for image/data --- text/0431-sagemaker-l2-endpoint.md | 35 +++++++++--------------------- 1 file changed, 10 insertions(+), 25 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index b4c327015..333f17f8f 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -80,19 +80,11 @@ In the event that a single container is sufficient for your inference use-case, single-container model: ```typescript -import * as ecr_assets from '@aws-cdk/aws-ecr-assets'; -import * as s3_assets from '@aws-cdk/aws-s3-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const imageAsset = new ecr_assets.DockerImageAsset(this, 'Image', { - directory: path.join('path', 'to', 'Dockerfile', 'directory') -}); -const image = sagemaker.ContainerImage.fromAsset(imageAsset); -const modelDataAsset = new s3_assets.Asset(this, 
'ModelData', { - path: path.join('path', 'to', 'artifact', 'file.tar.gz') -}); -const modelData = sagemaker.ModelData.fromAsset(modelDataAsset); +const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory')); +const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz')); const model = new sagemaker.Model(this, 'PrimaryContainerModel', { containers: [ @@ -143,14 +135,10 @@ abstract base class. Reference a local directory containing a Dockerfile: ```typescript -import * as ecr_assets from '@aws-cdk/aws-ecr-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const imageAsset = new ecr_assets.DockerImageAsset(this, 'Image', { - directory: path.join('path', 'to', 'Dockerfile', 'directory') -}); -const image = sagemaker.ContainerImage.fromAsset(imageAsset); +const image = sagemaker.ContainerImage.fromAsset(path.join('path', 'to', 'Dockerfile', 'directory')); ``` #### ECR Image @@ -176,14 +164,10 @@ base class. The default is to have no model artifacts associated with a model. Reference local model data: ```typescript -import * as s3_assets from '@aws-cdk/aws-s3-assets'; import * as sagemaker from '@aws-cdk/aws-sagemaker'; import * as path from 'path'; -const modelDataAsset = new s3_assets.Asset(this, 'ModelData', { - path: path.join('path', 'to', 'artifact', 'file.tar.gz') -}); -const modelData = sagemaker.ModelData.fromAsset(modelDataAsset); +const modelData = sagemaker.ModelData.fromAsset(path.join('path', 'to', 'artifact', 'file.tar.gz')); ``` #### S3 Model Data @@ -610,10 +594,10 @@ upon an ECS task definition, instead operate upon a SageMaker model. 
/** * Reference an image that's constructed directly from sources on disk - * - * @param asset A Docker image asset + * @param directory The directory where the Dockerfile is stored + * @param options The options to further configure the selected image */ - public static fromAsset(asset: assets.DockerImageAsset): ContainerImage { ... } + public static fromAsset(directory: string, options: assets.DockerImageAssetOptions = {}): ContainerImage { ... } /** * Called when the image is used by a Model @@ -654,9 +638,10 @@ artifacts, either in an S3 bucket or a local file asset. /** * Constructs model data that will be uploaded to S3 as part of the CDK app deployment. - * @param asset An S3 asset as a gzipped tar file + * @param path The local path to a model artifact file as a gzipped tar file + * @param options The options to further configure the selected asset */ - public static fromAsset(asset: assets.Asset): ModelData { ... } + public static fromAsset(path: string, options: assets.AssetOptions = {}): ModelData { ... } /** * This method is invoked by the SageMaker Model construct when it needs to resolve the model From 8faf9397df46bd3ce565e9bd9b1fd43094b5fa40 Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Mon, 3 Oct 2022 17:29:11 -0700 Subject: [PATCH 24/30] Trim README content in favor of links to AWS docs --- text/0431-sagemaker-l2-endpoint.md | 28 +++++++++++----------------- 1 file changed, 11 insertions(+), 17 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 333f17f8f..4b71a9e67 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -62,17 +62,11 @@ import * as sagemaker from '@aws-cdk/aws-sagemaker'; ## Model -In machine learning, a model is used to make predictions, or inferences. A deployable model in -SageMaker consists of inference code and model artifacts. 
Model artifacts are the results of model
-training by using a machine learning algorithm. The inference code must be packaged in a Docker
-container and stored in Amazon ECR. You can either package the model artifacts in the same container
-as the inference code, or store them in Amazon S3. As model artifacts may change each time a new
-model is trained (while the inference code may remain static), artifact separation in S3 may act as
-a natural decoupling point for your application.
-
-When instantiating the `Model` construct, you tell Amazon SageMaker where it can find these model
-components. The `ContainerDefinition` interface encapsulates both the specification of model
-inference code as a `ContainerImage` and an optional set of separate artifacts as `ModelData`.
+To create a machine learning model with Amazon SageMaker, use the `Model` construct. This construct
+includes properties that can be configured to define model components, including the model inference
+code as a Docker image and an optional set of separate model data artifacts. See the [AWS
+documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-marketplace-develop.html)
+to learn more about SageMaker models.

### Single Container Model

@@ -99,11 +93,10 @@ const model = new sagemaker.Model(this, 'PrimaryContainerModel', {

### Inference Pipeline Model

An inference pipeline is an Amazon SageMaker model that is composed of a linear sequence of multiple
-containers that process requests for inferences on data.
See the [AWS +documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html) to learn +more about SageMaker inference pipelines. To define an inference pipeline, you can provide +additional containers for your model: ```typescript import * as sagemaker from '@aws-cdk/aws-sagemaker'; @@ -155,7 +148,8 @@ const image = sagemaker.ContainerImage.fromEcrRepository(repository, 'tag'); ### Model Artifacts -If you choose to decouple your model artifacts from your inference code, the artifacts can be +If you choose to decouple your model artifacts from your inference code (as is natural given +different rates of change between inference code and model artifacts), the artifacts can be specified via the `modelData` property which accepts a class that extends the `ModelData` abstract base class. The default is to have no model artifacts associated with a model. From 9723b5fee391616a8b016b7e4b724272a603255f Mon Sep 17 00:00:00 2001 From: Peter VanLund <792688+petermeansrock@users.noreply.github.com> Date: Mon, 3 Oct 2022 17:32:58 -0700 Subject: [PATCH 25/30] Link to SageMaker ENI CloudFormation issue --- text/0431-sagemaker-l2-endpoint.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md index 4b71a9e67..69ed7617f 100644 --- a/text/0431-sagemaker-l2-endpoint.md +++ b/text/0431-sagemaker-l2-endpoint.md @@ -1324,14 +1324,15 @@ proposed APIs are appropriately extensible in order to support these use-cases. Interfaces (ENIs) associated with a SageMaker model's VPC are not always cleaned up in a timely manner after downstream compute resources are deleted. As a result, attempts to delete a SageMaker endpoint along with its networking resources (e.g., subnets, security groups) from a - CloudFormation stack may cause the stack operation to fail as the ENIs are still in use. 
From a
-  CDK integration test perspective, specifying `--no-clean` will allow the generation of a snapshot
-  regardless of whether stack deletion will succeed or fail but may hinder snapshot re-generation
-  by subsequent CDK contributors. For this reason, it may be helpful to exclude VPC specification
-  from the endpoint integration test at this time.
+  CloudFormation stack [may cause the stack operation to fail as the ENIs are still in
+  use][sagemaker-eni-issue]. From a CDK integration test perspective, specifying `--no-clean` will
+  allow the generation of a snapshot regardless of whether stack deletion will succeed or fail but
+  may hinder snapshot re-generation by subsequent CDK contributors. For this reason, it may be
+  helpful to exclude VPC specification from the endpoint integration test at this time.

 [lambda-eni-issue]: https://github.com/aws/aws-cdk/issues/12827
 [eks-eni-issue]: https://github.com/aws/aws-cdk/issues/9970
+[sagemaker-eni-issue]: https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/1327

 #### Cross-module API Convergence

From b249090d16e8290ff3970b878b230840e4835218 Mon Sep 17 00:00:00 2001
From: Kaizen Conroy <36202692+kaizencc@users.noreply.github.com>
Date: Thu, 6 Oct 2022 11:49:33 -0400
Subject: [PATCH 26/30] signing off as api bar raiser

---
 text/0431-sagemaker-l2-endpoint.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md
index 69ed7617f..a1aa49ea3 100644
--- a/text/0431-sagemaker-l2-endpoint.md
+++ b/text/0431-sagemaker-l2-endpoint.md
@@ -279,7 +279,7 @@ signed-off by the API bar raiser (the `api-approved` label was applied to the
 RFC pull request):

 ```
-[ ] Signed-off by API Bar Raiser @xxxxx
+[x] Signed-off by API Bar Raiser @conroyka
 ```

 ## Public FAQ

From 303b940bde34c24b219bfe153d8b5b5ce3d3e863 Mon Sep 17 00:00:00 2001
From: Peter VanLund <792688+petermeansrock@users.noreply.github.com>
Date: Thu, 6 Oct 2022 14:39:31
-0700
Subject: [PATCH 27/30] Adjust EndpointProps to take IEndpointConfig

---
 text/0431-sagemaker-l2-endpoint.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md
index a1aa49ea3..d399c72d0 100644
--- a/text/0431-sagemaker-l2-endpoint.md
+++ b/text/0431-sagemaker-l2-endpoint.md
@@ -933,10 +933,8 @@ relative to the other configured variants.

   /**
    * The endpoint configuration to use for this endpoint.
-   *
-   * [disable-awslint:ref-via-interface]
    */
-  readonly endpointConfig: EndpointConfig;
+  readonly endpointConfig: IEndpointConfig;
 }
 ```

From 558484668a430414d0bfd9b4b9463e112651d95a Mon Sep 17 00:00:00 2001
From: Peter VanLund <792688+petermeansrock@users.noreply.github.com>
Date: Thu, 6 Oct 2022 14:41:00 -0700
Subject: [PATCH 28/30] Fix my own entry on original authors

---
 text/0431-sagemaker-l2-endpoint.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md
index d399c72d0..7302e2a51 100644
--- a/text/0431-sagemaker-l2-endpoint.md
+++ b/text/0431-sagemaker-l2-endpoint.md
@@ -1,6 +1,6 @@
 SageMaker Model Hosting L2 Constructs

-* **Original Author(s):** @pvanlund, @mattmcclean, @l2yao, @jetterdj, @foxpro24, @rangoju
+* **Original Author(s):** @petermeansrock, @mattmcclean, @l2yao, @jetterdj, @foxpro24, @rangoju
 * **Tracking Issue**: #431
 * **API Bar Raiser**: *TBD*

From 4342d7764958f136f0761a15596bf425bbb3e257 Mon Sep 17 00:00:00 2001
From: Peter VanLund <792688+petermeansrock@users.noreply.github.com>
Date: Thu, 6 Oct 2022 14:42:32 -0700
Subject: [PATCH 29/30] Add API bar raiser login to top to match sign-off

---
 text/0431-sagemaker-l2-endpoint.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md
index 7302e2a51..89f70ae29 100644
--- a/text/0431-sagemaker-l2-endpoint.md
+++
b/text/0431-sagemaker-l2-endpoint.md
@@ -2,7 +2,7 @@ SageMaker Model Hosting L2 Constructs

 * **Original Author(s):** @petermeansrock, @mattmcclean, @l2yao, @jetterdj, @foxpro24, @rangoju
 * **Tracking Issue**: #431
-* **API Bar Raiser**: *TBD*
+* **API Bar Raiser**: @conroyka

 This feature supports the creation of Amazon SageMaker real-time inference hosted endpoints using a
 new set of L2 constructs for the `Endpoint`, `EndpointConfig`, and `Model` CloudFormation resources.

From 8e4a48c15a84f7498dcac8212be391b17eb366d9 Mon Sep 17 00:00:00 2001
From: Kaizen Conroy <36202692+kaizencc@users.noreply.github.com>
Date: Thu, 6 Oct 2022 17:52:54 -0400
Subject: [PATCH 30/30] Update 0431-sagemaker-l2-endpoint.md

---
 text/0431-sagemaker-l2-endpoint.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/text/0431-sagemaker-l2-endpoint.md b/text/0431-sagemaker-l2-endpoint.md
index 89f70ae29..a429c0d99 100644
--- a/text/0431-sagemaker-l2-endpoint.md
+++ b/text/0431-sagemaker-l2-endpoint.md
@@ -2,7 +2,7 @@ SageMaker Model Hosting L2 Constructs
 * **Original Author(s):** @petermeansrock, @mattmcclean, @l2yao, @jetterdj, @foxpro24, @rangoju
 * **Tracking Issue**: #431
-* **API Bar Raiser**: @conroyka
+* **API Bar Raiser**: @kaizencc

 This feature supports the creation of Amazon SageMaker real-time inference hosted endpoints using a
 new set of L2 constructs for the `Endpoint`, `EndpointConfig`, and `Model` CloudFormation resources.

@@ -279,7 +279,7 @@ signed-off by the API bar raiser (the `api-approved` label was applied to the
 RFC pull request):

 ```
-[x] Signed-off by API Bar Raiser @conroyka
+[x] Signed-off by API Bar Raiser @kaizencc
 ```

 ## Public FAQ
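The `endpointConfig` hunks above sit in a section of the RFC where each production variant's weight
is described as relative to the other configured variants on the endpoint. The sketch below
illustrates that weighting rule only; the `VariantWeightProps` interface and `trafficShare` helper
are local stand-ins written for this example (loosely modeled on the RFC's proposed variant props),
not part of the real `@aws-cdk/aws-sagemaker` module:

```typescript
// Local stand-in for the variant props described in the RFC; written for
// illustration only and not imported from @aws-cdk/aws-sagemaker.
interface VariantWeightProps {
  variantName: string;
  // Relative weight of this variant; the RFC describes weights as relative
  // to the other variants configured on the same endpoint config.
  initialVariantWeight?: number;
}

// Fraction of endpoint traffic a variant receives: its weight divided by the
// sum of all variants' weights. This sketch assumes an unspecified weight
// defaults to 1.0.
function trafficShare(variant: VariantWeightProps, all: VariantWeightProps[]): number {
  const total = all.reduce((sum, v) => sum + (v.initialVariantWeight ?? 1.0), 0);
  return (variant.initialVariantWeight ?? 1.0) / total;
}

const variants: VariantWeightProps[] = [
  { variantName: 'modelA', initialVariantWeight: 2.0 },
  { variantName: 'modelB' }, // treated as weight 1.0 in this sketch
];

// modelA's weight (2.0) over the total (3.0): two-thirds of requests.
console.log(trafficShare(variants[0], variants));
```

Because the share is computed from the ratio of weights, doubling every variant's weight leaves the
traffic split unchanged; only the relative proportions matter.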