Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug during cluster health check #117

Merged
merged 2 commits into from
Jan 13, 2022

Conversation

kkworden
Copy link
Contributor

@kkworden kkworden commented Jan 6, 2022

Uses existing code to check if the node should be excluded during the desired node count calculation

@crhuber crhuber merged commit 70306ef into hellofresh:master Jan 13, 2022
@js-timbirkett
Copy link
Contributor

@crhuber - any chance the build/release for this can be kicked. Seems the Github action failed and logs are no longer visible.

@js-timbirkett
Copy link
Contributor

I built a container from master and tried it out. This feature is just plain broken. When trying to rolling update with labels set to exclude a couple of nodes you get:

15:46:56  2022-02-09 15:46:56,187 INFO     All k8s nodes are healthy
15:46:56  2022-02-09 15:46:56,188 INFO     Cluster validation passed. Proceeding with node draining and termination...
15:46:56  2022-02-09 15:46:56,188 INFO     Proceeding with node draining and termination...
15:46:56  2022-02-09 15:46:56,189 INFO     Getting k8s nodes...
15:46:56  2022-02-09 15:46:56,378 INFO     Current k8s node count is 45
15:46:56  2022-02-09 15:46:56,378 INFO     Searching for k8s node name by instance id...
15:46:56  2022-02-09 15:46:56,378 INFO     InstanceId i-0b0abed453f717c04 is node ip-100-64-2-147.eu-west-1.compute.internal in kubernetes land
15:46:56  2022-02-09 15:46:56,379 INFO     Cordoning k8s node ip-100-64-2-147.eu-west-1.compute.internal...
15:46:56  2022-02-09 15:46:56,403 INFO     Node cordoned
15:46:56  2022-02-09 15:46:56,403 INFO     Searching for k8s node name by instance id...
15:46:56  2022-02-09 15:46:56,404 INFO     Could not find a k8s node name for that instance id. Exiting
15:46:56  2022-02-09 15:46:56,404 ERROR    Encountered an error when adding taint/cordoning node 
15:46:56  2022-02-09 15:46:56,404 ERROR    Could not find a k8s node name for that instance id. Exiting

As it appears to be trying to find the excluded node.... 🙈

@dgem
Copy link

dgem commented Jun 24, 2022

I'd agree with the above. The problem is that @ https://github.com/hellofresh/eks-rolling-update/blob/master/eksrollup/cli.py#L167 the list of nodes has been filtered... so when in runmode 4 (my experience) @ https://github.com/hellofresh/eks-rolling-update/blob/master/eksrollup/cli.py#L200 it tried to find the instance from the filtered node list :(

get_k8s_nodes() could be changed to return both the included and excluded nodes, and perhaps add a guard in before to check if the node is one that should be considered (isn't in the excluded nodes list)

Any thoughts ?

agustinbava pushed a commit to agustinbava/eks-rolling-update that referenced this pull request Jul 16, 2024
…_label_keys

Fix bug during cluster health check
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants