-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(17.32.0) DEVOPS-9370: use old capacity value when scaling down #41
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 after cleaning up the commits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cleanup commits, otherwise looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although less verbose, these code changes make it harder to understand what is actually happening.
Is all this refactoring necessary?
Also isn't the issue around trying to scale down, below the min capacity of the ASG? can't we just set the desired count to the max(new_count,group_min_count)
@mayn It's not refactoring really. It's changing the value it looks at when it scales down. That is it does not look at the ASG capacity at that moment (which could have been changed since deployment started due to cpu utilization). It looks at the old ASG capacity that we retrieved when deployment started. It won't dip below min capacity as the old desired capacity is either equal or above the min capacity when deployment starts. We just revert to what the ASG desired capacity was during scaling down. This new logic gurantee it won't dip below min capacity. But the old logic could as we have seen. |
@@ -52,6 +52,7 @@ def __init__(self, | |||
self.health_wait = health_wait | |||
self.only_new_wait = only_new_wait | |||
self.existing_instance_ids = [] | |||
self.old_desired_capacity = 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this default of 2
coming from vs None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default can be anything because that field will be assigned to at the beginning of deployment. However a test case is going to fail if set it to None.
if desired_state == 'increase': | ||
new_count = self.double_autoscale_instance_count(cur_count) | ||
self.old_desired_capacity = int(self.get_group_info(group_name)[0].desired_capacity) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should moved back outside of the try block.
if you want to cache the value do an if None:
set the value, the reason being is that if this method is called todecrease
without being called to increase
first it will currently set itself to 2
, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. As mentioned above, 2
was set there just to satisfy a test case. It could be set to 1000 as long as a number greater than 2. It won't matter because this line right here will set it to the original desired capacity of the ASG. It is 2 for dnbi, perhaps 3 for owl? or 5 for whatever. It is the ASG capacity that we later want to revert to.
logging.info("Current desired count was changed from {0} to {1}".format(self.new_desired_capacity, self.old_desired_capacity)) | ||
self.new_desired_capacity = self.old_desired_capacity | ||
else: | ||
raise Exception("Please make sure the desired_state is set to either increase or decrease") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put back the value that was passed in the error message
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should also be a ValueError
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this is to satisfy a test case. We can change the test but it is really not related to the issue we are attacking here.
@@ -52,6 +52,7 @@ def __init__(self, | |||
self.health_wait = health_wait | |||
self.only_new_wait = only_new_wait | |||
self.existing_instance_ids = [] | |||
self.old_desired_capacity = 2 | |||
self.new_desired_capacity = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a dead gobal variable as you've refactored to only be used within method scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not dead. new_desired_capacity
is used by code (line 202). We need this field to hold the value the capacity is temporarily increased to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's only used in the one function, calculate_autoscale_desired_instance_count, it doesn't need to be declared in the init. we can just declare it in the function w/o self.
@@ -52,6 +52,7 @@ def __init__(self, | |||
self.health_wait = health_wait | |||
self.only_new_wait = only_new_wait | |||
self.existing_instance_ids = [] | |||
self.old_desired_capacity = 2 | |||
self.new_desired_capacity = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since it's only used in the one function, calculate_autoscale_desired_instance_count, it doesn't need to be declared in the init. we can just declare it in the function w/o self.
@taoistmath We need to store that state when it's initially calculated and use that value later, rather than calculate on the fly when it's needed. That's the whole point. Those 2 fields (old and new) are the states we need to store when deployment begins and we need to reference these 2 states later. We do not want to calculate them when the same function is invoked again later, because the ASG capacity could have changed by then. |
@nz285 sorry for not understanding, when I look at the code base, the references to new_desired_capacity are all being removed except for in that one function. Can you please point me to where that state is being stored? You reference line 202 in your comment to mayn, but you're removing that line in your PR, so I'm not sure where I should be looking. |
@taoistmath That variable is directly used only on line 202, yes. But if you follow up the call chain of the function it is in all the way to |
@@ -123,24 +124,19 @@ def get_lb(self): | |||
def calculate_autoscale_desired_instance_count(self, group_name, desired_state): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be done as such and avoid the class variable:
def calculate_autoscale_desired_instance_count(self, group_name, desired_state):
''' Search via specific autoscale group name to return modified desired instance count '''
try:
new_desired_capacity = self.old_desired_capacity * 2
if desired_state == 'increase':
self.old_desired_capacity = int(self.get_group_info(group_name)[0].desired_capacity)
logging.info("Current desired count was changed from {0} to {1}".format(self.old_desired_capacity, new_desired_capacity))
return new_desired_capacity
elif desired_state == 'decrease':
logging.info("Current desired count was changed from {0} to {1}".format(new_desired_capacity, self.old_desired_capacity))
return self.old_desired_capacity
else:
raise Exception("Please make sure the desired_state is set to either increase or decrease")
return None #not sure this is required
except Exception as e:
logging.error(e)
exit(self.exit_error_code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for joining the party late, but CloudWatch Alarms are supposed to be disabled before and re-enabled after deployment. This issue should never occur since the desired capacity should not change.
License2Deploy/License2Deploy/rolling_deploy.py
Lines 365 to 372 in adecf1d
self.disable_project_cloudwatch_alarms() | |
self.new_desired_capacity = self.calculate_autoscale_desired_instance_count(group_name, 'increase') | |
self.set_autoscale_instance_desired_count(self.new_desired_capacity, group_name) | |
self.launch_new_instances(group_name) | |
self.set_autoscale_instance_desired_count(self.calculate_autoscale_desired_instance_count(group_name, 'decrease'), group_name) | |
self.confirm_lb_has_only_new_instances() | |
self.tag_ami(self.ami_id, self.env) | |
self.enable_project_cloudwatch_alarms() |
I'm guessing there is some other bug during disabling of cloudwatch alarms.
Ticket here -
https://dunandb.jira.com/browse/DEVOPS-9370
I left a comment in the ticket detailing the reason for this change.
Tested with deploying
dnbi-cache_qa
from my local and it works.Also tested with
cos-service_qa
-