-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[autoscaler] Consolidate CloudWatch agent/dashboard/alarm support; Add unit tests for AWS autoscaler CloudWatch integration #22070
[autoscaler] Consolidate CloudWatch agent/dashboard/alarm support; Add unit tests for AWS autoscaler CloudWatch integration #22070
Conversation
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
As a high-level comment, there's a number of typing mismatches in the existing |
7d309d6
to
3dfe6bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still a lot of places here where we seem to be confusing types and, in particular, mixing CloudwatchConfigType enum types with their string values, which are introducing breaking changes.
I've added some suggestions here under the premise that it's probably easiest to just convert from enum-to-string as early as possible and then just pass strings around everywhere for backwards compatibility, but there's probably more - can we take another pass to try to clean this up, and make sure integration tests pass for all supported config types?
node_id: str, | ||
retry_failed: bool = True, | ||
) -> bool: | ||
) -> Dict[str, any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> Dict[str, any]: | |
) -> Dict[str, Any]: |
@@ -281,19 +303,19 @@ def _replace_config_variables( | |||
|
|||
def _replace_all_config_variables( | |||
self, | |||
collection: Union[dict, list], | |||
collection: Union[Dict[str, any], str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collection: Union[Dict[str, any], str], | |
collection: Union[Dict[str, Any], List[Dict[str, Any]]], |
node_id: str, | ||
cluster_name: str, | ||
region: str, | ||
) -> Tuple[(Union[dict, list], int)]: | ||
) -> Union[str, Dict[str, any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
) -> Union[str, Dict[str, any]]: | |
) -> Union[str, Dict[str, Any]]: |
The input collection must be either a dict or list. | ||
Returns a tuple consisting of the output collection and the number of | ||
modified strings in the collection (which is not necessarily equal to | ||
the number of variables replaced). | ||
""" | ||
|
||
for key in collection: | ||
if type(collection) is dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a final else
statement that catches unexpected collection types and raises a ValueError like:
raise ValueError(f"Can't replace config variables in unsupported collection type: {type(collection)}"
As it is, you'll wind up with a NameError at line 326 if the collection isn't a dict or list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Patrick, added ValueError check.
@@ -311,7 +333,7 @@ def _replace_all_config_variables( | |||
) | |||
return collection | |||
|
|||
def _load_config_file(self, config_type: str) -> Dict[str, Any]: | |||
def _load_config_file(self, config_type: CloudwatchConfigType) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _load_config_file(self, config_type: CloudwatchConfigType) -> Dict[str, Any]: | |
def _load_config_file(self, config_type: str) -> Union[Dict[str, Any], List[Dict[str, Any]]]: |
CloudwatchConfigType.AGENT: self._replace_cwa_config_variables, | ||
CloudwatchConfigType.DASHBOARD: self._replace_dashboard_config_variables, | ||
CloudwatchConfigType.ALARM: self._load_config_file, | ||
} | ||
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_HEAD_NODE = { | ||
CloudwatchConfigType.AGENT: self._restart_cloudwatch_agent, | ||
CloudwatchConfigType.DASHBOARD: self._put_cloudwatch_dashboard, | ||
CloudwatchConfigType.ALARM: self._put_cloudwatch_alarm, | ||
} | ||
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_WORKER_NODE = { | ||
CloudwatchConfigType.AGENT: self._restart_cloudwatch_agent, | ||
CloudwatchConfigType.ALARM: self._put_cloudwatch_alarm, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CloudwatchConfigType.AGENT: self._replace_cwa_config_variables, | |
CloudwatchConfigType.DASHBOARD: self._replace_dashboard_config_variables, | |
CloudwatchConfigType.ALARM: self._load_config_file, | |
} | |
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_HEAD_NODE = { | |
CloudwatchConfigType.AGENT: self._restart_cloudwatch_agent, | |
CloudwatchConfigType.DASHBOARD: self._put_cloudwatch_dashboard, | |
CloudwatchConfigType.ALARM: self._put_cloudwatch_alarm, | |
} | |
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_WORKER_NODE = { | |
CloudwatchConfigType.AGENT: self._restart_cloudwatch_agent, | |
CloudwatchConfigType.ALARM: self._put_cloudwatch_alarm, | |
CloudwatchConfigType.AGENT.value: self._replace_cwa_config_variables, | |
CloudwatchConfigType.DASHBOARD.value: self._replace_dashboard_config_variables, | |
CloudwatchConfigType.ALARM.value: self._load_config_file, | |
} | |
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_HEAD_NODE = { | |
CloudwatchConfigType.AGENT.value: self._restart_cloudwatch_agent, | |
CloudwatchConfigType.DASHBOARD.value: self._put_cloudwatch_dashboard, | |
CloudwatchConfigType.ALARM.value: self._put_cloudwatch_alarm, | |
} | |
self.CLOUDWATCH_CONFIG_TYPE_TO_UPDATE_FUNC_WORKER_NODE = { | |
CloudwatchConfigType.AGENT.value: self._restart_cloudwatch_agent, | |
CloudwatchConfigType.ALARM.value: self._put_cloudwatch_alarm, |
self._update_cloudwatch_config(is_head_node, "alarm") | ||
for config_type in CloudwatchConfigType: | ||
if CloudwatchHelper.cloudwatch_config_exists( | ||
self.provider_config, config_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.provider_config, config_type | |
self.provider_config, config_type.value |
@@ -118,7 +134,9 @@ def _put_cloudwatch_dashboard(self) -> Dict[str, Any]: | |||
dashboard_name_cluster = dashboard_config.get("name", self.cluster_name) | |||
dashboard_name = self.cluster_name + "-" + dashboard_name_cluster | |||
|
|||
widgets = self._replace_dashboard_config_variables() | |||
widgets = self._replace_dashboard_config_variables( | |||
CloudwatchConfigType.DASHBOARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CloudwatchConfigType.DASHBOARD | |
CloudwatchConfigType.DASHBOARD.value |
@@ -190,7 +208,9 @@ def _ssm_command_waiter( | |||
command_id = response["Command"]["CommandId"] | |||
|
|||
cloudwatch_config = self.provider_config["cloudwatch"] | |||
agent_retryer_config = cloudwatch_config.get("agent").get("retryer", {}) | |||
agent_retryer_config = cloudwatch_config.get(CloudwatchConfigType.AGENT).get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agent_retryer_config = cloudwatch_config.get(CloudwatchConfigType.AGENT).get( | |
agent_retryer_config = cloudwatch_config.get(CloudwatchConfigType.AGENT.value).get( |
if CloudwatchHelper.cloudwatch_config_exists( | ||
self.provider_config, config_type | ||
): | ||
self._update_cloudwatch_config(config_type, is_head_node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self._update_cloudwatch_config(config_type, is_head_node) | |
self._update_cloudwatch_config(config_type.value, is_head_node) |
3d93915
to
d91ce72
Compare
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
@Zyiqin-Miranda do we still need this pr? |
@wuisawesome, Yes, sorry for the late response. This PR mainly adds tests for CloudWatch code and add some fixes to typing hints. I think it's helpful to ensure correct behavior of CloudWatch part. |
@pdames @wuisawesome could you help review and merge this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, @pdames do you want to take a final look before we merge?
@pdames gentle bump |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
|
d91ce72
to
7448168
Compare
7a15fbe
to
8f39e03
Compare
…d unit tests for AWS autoscaler CloudWatch integration
8f39e03
to
9fd6585
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for all of the tests and the code cleanup, @Zyiqin-Miranda!
@wuisawesome @DmitriGekhtman This should be ready to merge! |
Test failures look unrelated. I'm going to push the button. |
…d unit tests for AWS autoscaler CloudWatch integration (ray-project#22070) This PR mainly adds two improvements: We have introduced three CloudWatch Config support in previous PRs: Agent, Dashboard and Alarm. In this PR, we generalize the logic of all three config types by using enum CloudwatchConfigType. Adds unit tests to ensure the correctness of Ray autoscaler CloudWatch integration behavior. Signed-off-by: Huaiwei Sun <[email protected]>
…d unit tests for AWS autoscaler CloudWatch integration (ray-project#22070) This PR mainly adds two improvements: We have introduced three CloudWatch Config support in previous PRs: Agent, Dashboard and Alarm. In this PR, we generalize the logic of all three config types by using enum CloudwatchConfigType. Adds unit tests to ensure the correctness of Ray autoscaler CloudWatch integration behavior. Signed-off-by: Stefan van der Kleij <[email protected]>
Why are these changes needed?
This PR mainly adds two improvements:
Related issue number
"Closes #9644 #8967"
Checks
scripts/format.sh
to lint the changes in this PR.