[Release test] [Cluster launcher] Add release test for aws example-full.yaml #34487

Merged

Contributor

@architkulkarni architkulkarni commented Apr 17, 2023

Why are these changes needed?

Adds a release test for example-full.yaml on AWS.

Starts the cluster with ray up, runs a simple Ray driver script, and calls ray down.

Also fixes a bug in this YAML file where we were using a string instead of an int for a VolumeSize.
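
The up / run / down flow described above could be sketched roughly as follows. This is only an illustration and not the release test added in this PR: the helper names `launcher_commands` and `run_launcher_smoke_test`, the injectable `run` callable, and the driver command are assumptions made for the example. The `ray up`, `ray exec`, and `ray down` subcommands and the `-y` flag are the ones provided by the Ray CLI.

```python
import subprocess
from typing import Callable, List, Optional


def launcher_commands(cluster_yaml: str, driver_cmd: str) -> List[List[str]]:
    """Build the three CLI invocations: start, run a driver, tear down."""
    return [
        ["ray", "up", "-y", cluster_yaml],          # start (or update) the cluster
        ["ray", "exec", cluster_yaml, driver_cmd],  # run a command on the head node
        ["ray", "down", "-y", cluster_yaml],        # terminate the cluster
    ]


def run_launcher_smoke_test(
    cluster_yaml: str,
    driver_cmd: str,
    run: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Run up -> exec -> down and return the commands that were attempted.

    `run` is a hypothetical hook so the flow can be exercised without AWS;
    by default each command is executed with subprocess and must exit 0.
    """
    if run is None:
        run = lambda cmd: subprocess.run(cmd, check=True)

    up, exec_cmd, down = launcher_commands(cluster_yaml, driver_cmd)
    attempted: List[List[str]] = []
    try:
        for cmd in (up, exec_cmd):
            attempted.append(cmd)
            run(cmd)
    finally:
        # Always attempt teardown so a failed step does not leak instances.
        attempted.append(down)
        run(down)
    return attempted
```

Putting `ray down` in a `finally` block is a design choice for this sketch: a cluster that failed to start fully or whose driver crashed is still torn down.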

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@architkulkarni
Contributor Author

@architkulkarni architkulkarni marked this pull request as ready for review April 17, 2023 23:46
@@ -79,7 +79,7 @@ available_node_types:
 BlockDeviceMappings:
     - DeviceName: /dev/sda1
       Ebs:
-          VolumeSize: 140GB
+          VolumeSize: 140
Contributor Author

This used to fail during ray up with the following error:

Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 2448, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/cli_logger.py", line 856, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/scripts/scripts.py", line 1272, in up
    use_login_shells=use_login_shells,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/commands.py", line 289, in create_or_update_cluster
    no_monitor_on_head,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/commands.py", line 726, in get_or_create_head_node
    provider.create_node(head_node_config, head_node_tags, 1)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/aws/node_provider.py", line 320, in create_node
    created_nodes_dict = self._create_node(node_config, tags, count)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ray/autoscaler/_private/aws/node_provider.py", line 424, in _create_node
    created = self.ec2_fail_fast.create_instances(**conf)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/boto3/resources/factory.py", line 580, in do_action
    response = action(self, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/boto3/resources/action.py", line 88, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/botocore/client.py", line 924, in _make_api_call
    headers=additional_headers,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/botocore/client.py", line 991, in _convert_to_request_dict
    api_params, operation_model
  File "/home/ray/anaconda3/lib/python3.7/site-packages/botocore/validate.py", line 381, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter BlockDeviceMappings[0].Ebs.VolumeSize, value: 140GB, type: <class 'str'>, valid types: <class 'int'>
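
The error above comes from botocore rejecting a string where the EC2 API expects an integer (YAML loads `140GB` as a string and `140` as an int). A check along these lines could surface the problem before any AWS call is made. It is a minimal sketch, not part of Ray or of this PR: the function name `find_non_int_volume_sizes` is hypothetical, and it operates on an already-parsed `node_config` dict.

```python
from typing import Any, Dict, List


def find_non_int_volume_sizes(node_config: Dict[str, Any]) -> List[str]:
    """Return one message per BlockDeviceMappings entry whose
    Ebs.VolumeSize is set but is not an integer."""
    problems: List[str] = []
    for i, mapping in enumerate(node_config.get("BlockDeviceMappings", [])):
        size = mapping.get("Ebs", {}).get("VolumeSize")
        if size is None:
            continue  # VolumeSize not set for this mapping; nothing to check
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(size, bool) or not isinstance(size, int):
            problems.append(
                f"BlockDeviceMappings[{i}].Ebs.VolumeSize={size!r} "
                f"({type(size).__name__}); expected int"
            )
    return problems


# The value before the fix (a string) and after the fix (an int).
bad = {
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": "140GB"}}
    ]
}
good = {
    "BlockDeviceMappings": [
        {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": 140}}
    ]
}

print(find_non_int_volume_sizes(bad))
print(find_non_int_volume_sizes(good))
```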

@wuisawesome
Contributor

Can you post an example of a green e2e run when this is ready?

@architkulkarni
Contributor Author

> Can you post an example of a green e2e run when this is ready?

#34487 (comment)

@architkulkarni
Contributor Author

Lint passed. The other failing tests are unrelated; this PR only adds a release test.

@architkulkarni architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Apr 18, 2023
@architkulkarni architkulkarni merged commit 39d7de6 into ray-project:master Apr 18, 2023
@architkulkarni architkulkarni deleted the test-aws-example-full branch April 18, 2023 16:58
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
…ull.yaml` (ray-project#34487)

ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
…ull.yaml` (ray-project#34487)
