[autoscaler] Add support for EC2 launch templates. #17236

pdames · 2021-07-21T08:38:08Z

These changes add support for EC2 Launch Templates as part of a user's AWS Autoscaler node config. Any parameters specified in node_config override the same parameters in the launch template, in compliance with the behavior of EC2's create_instances API. These changes were previously proposed in: #9336.

Related issue number

Closes #9334
Resolves milestone [8] of #8420.

Checks

- I've run scripts/format.sh to lint the changes in this PR.
- I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
  - Unit tests
  - Release tests
  - This PR is not tested :(

pdames · 2021-07-21T08:40:36Z

Opening this PR with required changes for EC2 launch template support and an example autoscaler config for review. Will update with unit tests.

ijrsvt

Looks fairly good (and quite a reasonably sized PR :) )!

ijrsvt · 2021-07-21T22:20:55Z

python/ray/autoscaler/_private/aws/config.py

+def _configure_from_launch_template(config):
+    for node_type in config["available_node_types"].values():
+        config = _configure_node_type_from_launch_template(config, node_type)
+    return config
+
+
+def _configure_node_type_from_launch_template(config, node_type):
+    node_cfg = node_type["node_config"]
+    if "LaunchTemplate" not in node_cfg:


Could you add docstrings & type hints for these functions?

Added docstrings and type hints for both this PR and the related changes in #14080. The new unit test also exercises the E2E happy-path for these changes in config.py and node_provider.py.

ijrsvt · 2021-07-22T05:22:19Z

python/ray/autoscaler/_private/aws/config.py

+    template_version = kwargs.pop("Version", "$Default")
+    kwargs["Versions"] = [template_version] if template_version else []


Should this be setdefault? (as in kwargs.setdefault("Versions", ...)). I'm wondering if we should avoid adding a field Version (no s) in the LaunchTemplate dict.

Good question! My thinking here is that we should keep the launch template autoscaler config model and behavior consistent with the boto3 ec2.create_instances API (https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/ec2.html#EC2.ServiceResource.create_instances), since this LaunchTemplate is ultimately forwarded to that API inside node_provider.py.

The ec2.create_instances input launch template model is:

LaunchTemplate={ 'LaunchTemplateId': 'string', 'LaunchTemplateName': 'string', 'Version': 'string' }

Since this API takes only a single Version argument, and since we in turn only need to describe a single version for correct E2E behavior, I wanted to keep our internal dependency on the Versions argument of ec2.describe_launch_template_versions hidden from the end user writing the autoscaler config. This also gives clients one less opportunity to shoot themselves in the foot by specifying more than one launch template version in their config.

However, one usability enhancement I caught on this line yesterday is that kwargs.pop("Version", "$Default") should be changed to str(kwargs.pop("Version", "$Default")) so that users can simply specify an integer version number like Version: 2 in their autoscaler config YAML (as written, they would be forced to specify integer strings like Version: "2"). 🙂
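For readers following along, here is a minimal sketch of the conversion described above, assuming the LaunchTemplate dict uses the same keys as ec2.create_instances. The helper name to_describe_kwargs is hypothetical and not part of this PR; it only illustrates how a singular Version could be turned into the Versions list while accepting both integer and string values from YAML.

```python
from typing import Any, Dict


def to_describe_kwargs(launch_template: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map a create_instances-style LaunchTemplate dict
    to keyword arguments shaped for describe_launch_template_versions."""
    kwargs = dict(launch_template)  # shallow copy; the caller's dict is untouched
    # YAML parses `Version: 2` as an int, so coerce to str before use.
    template_version = str(kwargs.pop("Version", "$Default"))
    kwargs["Versions"] = [template_version] if template_version else []
    return kwargs


int_version = to_describe_kwargs({"LaunchTemplateName": "my-template", "Version": 2})
no_version = to_describe_kwargs({"LaunchTemplateId": "lt-0123"})
```

With this shape, Version: 2 and Version: "2" in the config produce the same request, and an omitted Version falls back to "$Default".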

ijrsvt · 2021-07-28T18:19:39Z

python/ray/autoscaler/_private/aws/config.py

+def _configure_from_launch_template(config: Dict[str, Any]) -> Dict[str, Any]:
+    """
+    Merges any launch template data referenced by the node config of all
+    available node type's into their parent node config. Any parameters
+    specified in node config override the same parameters in the launch
+    template, in compliance with the behavior of the ec2.create_instances
+    API. The config to bootstrap is modified in place.
+
+    Args:
+        config (Dict[str, Any]): config to bootstrap
+    Returns:
+        config (Dict[str, Any]): The input config with all launch template
+        data merged into the node config of all available node types. If no
+        launch template data is found, then the config is returned
+        unchanged.
+    Raises:
+        ValueError: If no launch template is found for any launch
+        template [name|id] and version, or more than one launch template is
+        found.
+    """
+
+    # iterate over sorted node types to support deterministic unit test stubs
+    for _, node_type in sorted(config["available_node_types"].items()):
+        config = _configure_node_type_from_launch_template(config, node_type)
+    return config


Can we have this not modify the config in place?

Done.

Note that this also required patching a couple of areas that depend on bootstrap_aws, since they assumed the config was modified in place. I've also introduced a copy.deepcopy(config) line at the top of bootstrap_aws, both to prevent in-place modification of the input config by any existing code path and to make this behavior more immediately clear to readers.

Cleaning up in-place modification of each individual section of the copied config (e.g. provider config, node type config, node config, etc.) seems like a good thing to target across subsequent PRs.
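A rough sketch of the copy-on-entry pattern described above, for illustration only. The bootstrap function and the region default below are hypothetical stand-ins, not the actual bootstrap_aws code; the only part taken from this thread is the copy.deepcopy(config) call at the top.

```python
import copy
from typing import Any, Dict


def bootstrap(config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for bootstrap_aws: work on a private copy."""
    config = copy.deepcopy(config)  # the caller's config is never mutated
    # Stand-in for a bootstrapping step that fills in a missing value.
    config["provider"].setdefault("region", "us-west-2")
    return config


user_config = {"provider": {"type": "aws"}}
bootstrapped = bootstrap(user_config)
```

Callers that previously relied on in-place changes need to use the returned config instead, which is why the dependent code paths had to be patched.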

ijrsvt · 2021-07-28T18:21:28Z

python/ray/autoscaler/_private/aws/config.py

+def _configure_node_type_from_launch_template(
+        config: Dict[str, Any], node_type: Dict[str, Any]) -> Dict[str, Any]:
+    """
+    Merges any launch template data referenced by the given node type's
+    node config into the parent node config. Any parameters specified in
+    node config override the same parameters in the launch template. The
+    config to bootstrap is modified in place.
+
+    Note that this merge is simply a bidirectional dictionary update, from
+    the node config to the launch template data, and from the launch
+    template data to the node config. Thus, the final result captures the
+    relative complement of launch template data with respect to node config,
+    and allows all subsequent config bootstrapping code paths to act as
+    if the complement was explicitly specified in the user's node config. A
+    deep merge of nested elements like tag specifications isn't required
+    here, since the AWSNodeProvider's ec2.create_instances call will do this
+    for us after it fetches the referenced launch template data.
+
+    Args:
+        config (Dict[str, Any]): config to bootstrap
+        node_type (Dict[str, Any]): node type config to bootstrap
+    Returns:
+        config (Dict[str, Any]): The input config with all launch template
+        data merged into the node config of the input node type. If no
+        launch template data is found, then the config is returned
+        unchanged.
+    Raises:
+        ValueError: If no launch template is found for the given launch
+        template [name|id] and version, or more than one launch template is
+        found.
+    """
+    node_cfg = node_type["node_config"]
+    if "LaunchTemplate" not in node_cfg:
+        return config


Can we make the modification of node_type not in-place and return the modified node_type instead?
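One possible shape for a non-in-place version is sketched below. The helper name merge_launch_template_data and the sample values are hypothetical; the sketch assumes the launch template data has already been fetched and only shows the top-level merge in which node config values win, as the docstring above describes.

```python
import copy
from typing import Any, Dict


def merge_launch_template_data(
        node_type: Dict[str, Any],
        template_data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of node_type whose node_config is
    the launch template data overlaid with the user's node_config."""
    merged_type = copy.deepcopy(node_type)
    merged_cfg = copy.deepcopy(template_data)
    # Top-level update only: node_config values win on conflicts.
    merged_cfg.update(merged_type["node_config"])
    merged_type["node_config"] = merged_cfg
    return merged_type


node_type = {
    "node_config": {
        "InstanceType": "m5.large",
        "LaunchTemplate": {"LaunchTemplateName": "my-template"},
    }
}
template_data = {"InstanceType": "t3.micro", "ImageId": "ami-123"}
merged = merge_launch_template_data(node_type, template_data)
```

The input node_type is left untouched, and keys present only in the launch template (ImageId here) show up in the returned node config as if the user had specified them.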

ijrsvt · 2021-07-28T18:23:46Z

python/ray/autoscaler/_private/aws/node_provider.py

+        """
+        Merges user-provided node config tag specifications into a base
+        list of node provider tag specifications. The base list of
+        node provider tag specs is modified in-place.
+
+        This allows users to add tags and override values of existing
+        tags with their own, and only applies to the resource type
+        "instance". All other resource types are appended to the list of
+        tag specs.
+
+        Args:
+            tag_specs (List[Dict[str, Any]]): base node provider tag specs
+            user_tag_specs (List[Dict[str, Any]]): user's node config tag specs
+        """
+
+        for user_tag_spec in user_tag_specs:


Thanks for making this a separate function!
Modification in place seems fine here :)
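For context, a minimal sketch of the merge behavior that docstring describes. This is not the code from the PR; the function name, the lookup of the "instance" spec, and the sample tags are assumptions made for illustration.

```python
from typing import Any, Dict, List


def merge_tag_specs(tag_specs: List[Dict[str, Any]],
                    user_tag_specs: List[Dict[str, Any]]) -> None:
    """Illustrative only: merge user tag specs into tag_specs in place."""
    instance_spec = next(
        (s for s in tag_specs if s["ResourceType"] == "instance"), None)
    for user_spec in user_tag_specs:
        if instance_spec is not None and user_spec["ResourceType"] == "instance":
            # User tags override base tags with the same key; new keys are added.
            tags = {t["Key"]: t["Value"] for t in instance_spec["Tags"]}
            tags.update({t["Key"]: t["Value"] for t in user_spec["Tags"]})
            instance_spec["Tags"] = [
                {"Key": k, "Value": v} for k, v in tags.items()]
        else:
            # Any other resource type is appended as its own tag spec.
            tag_specs.append(user_spec)


base_specs = [{
    "ResourceType": "instance",
    "Tags": [{"Key": "ray-cluster-name", "Value": "demo"},
             {"Key": "Name", "Value": "ray-node"}],
}]
user_specs = [
    {"ResourceType": "instance",
     "Tags": [{"Key": "Name", "Value": "custom"}, {"Key": "team", "Value": "ml"}]},
    {"ResourceType": "volume", "Tags": [{"Key": "team", "Value": "ml"}]},
]
merge_tag_specs(base_specs, user_specs)
```

Because the base list is owned by the node provider and built fresh for each call in this sketch, mutating it in place does not leak changes back into the user's config.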

ijrsvt · 2021-07-28T18:25:54Z

python/ray/tests/aws/utils/helpers.py

+    """
+    Applies default updates made by AWSNodeProvider to node_cfg during node
+    creation.
+


Can you mention that this is only used in tests?

I think this is the only outstanding thing to change here!

ijrsvt

LGTM!

ijrsvt · 2021-07-30T15:05:54Z

Windows Failure is not related (object_spilling) and this PR is strictly autoscaler!

[autoscaler] Add support for EC2 launch templates.

8913138

wuisawesome assigned wuisawesome, DmitriGekhtman, ijrsvt and xcharleslin Jul 21, 2021

ijrsvt reviewed Jul 22, 2021

[autoscaler] EC2 launch template unit tests, typing, docstrings, and bug fixes.

42e54e6

pdames requested a review from ijrsvt July 24, 2021 08:32

[autoscaler] Refactor tests/types/docstrings and deterministic tag ordering.

3620090

ijrsvt reviewed Jul 28, 2021

pdames force-pushed the launch-templates branch from 92a41da to fcf6e7f Compare July 29, 2021 01:30

[autoscaler] Stop modifying AWS config to bootstrap in-place.

9785cdd

pdames force-pushed the launch-templates branch from fcf6e7f to 9785cdd Compare July 29, 2021 01:39

ijrsvt approved these changes Jul 29, 2021

ijrsvt merged commit 131710f into ray-project:master Jul 30, 2021

stephanie-wang pushed a commit to stephanie-wang/ray that referenced this pull request Jul 31, 2021

[autoscaler] Add support for EC2 launch templates. (ray-project#17236)

56b234e
