-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding an option for --disable-nodegroup-eviction
when deleting a cluster
#4659
Adding an option for --disable-nodegroup-eviction
when deleting a cluster
#4659
Conversation
pkg/actions/cluster/delete.go
Outdated
@@ -159,7 +163,7 @@ func checkForUndeletedStacks(stackManager manager.StackManager) error { | |||
return nil | |||
} | |||
|
|||
func drainAllNodegroups(cfg *api.ClusterConfig, ctl *eks.ClusterProvider, stackManager manager.StackManager, clientSet kubernetes.Interface, allStacks []manager.NodeGroupStack) error { | |||
func drainAllNodeGroups(cfg *api.ClusterConfig, ctl *eks.ClusterProvider, clientSet kubernetes.Interface, allStacks []manager.NodeGroupStack, disableEviction *bool, nodeGroupDrainer NodeGroupDrainer, vpcCniDeleter VpcCniDeleter) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no need to make disableEviction
a pointer type.
func drainAllNodeGroups(cfg *api.ClusterConfig, ctl *eks.ClusterProvider, clientSet kubernetes.Interface, allStacks []manager.NodeGroupStack, disableEviction *bool, nodeGroupDrainer NodeGroupDrainer, vpcCniDeleter VpcCniDeleter) error { | |
func drainAllNodeGroups(cfg *api.ClusterConfig, ctl *eks.ClusterProvider, clientSet kubernetes.Interface, allStacks []manager.NodeGroupStack, disableEviction bool, nodeGroupDrainer NodeGroupDrainer, vpcCniDeleter VpcCniDeleter) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
pkg/ctl/delete/cluster_test.go
Outdated
}) | ||
}) | ||
_, err := cmd.execute() | ||
Expect(err).To(Not(HaveOccurred())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit:
Expect(err).To(Not(HaveOccurred())) | |
Expect(err).NotTo(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
4ed383e
to
b340701
Compare
Thanks for opening the PR @AmitBenAmi, descriptions makes sense 👌 will take a look. +10000 for the lovely tests 😄 |
pkg/actions/cluster/delete.go
Outdated
type NodeGroupDrainer interface { | ||
Drain(nodeGroups []eks.KubeNodeGroup, plan bool, maxGracePeriod, nodeDrainWaitPeriod time.Duration, undo, disableEviction bool) error | ||
} | ||
type VpcCniDeleter func(clusterName string, ctl *eks.ClusterProvider, clientSet kubernetes.Interface) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to export this.
type VpcCniDeleter func(clusterName string, ctl *eks.ClusterProvider, clientSet kubernetes.Interface) | |
type vpcCNIDeleter func(clusterName string, ctl *eks.ClusterProvider, clientSet kubernetes.Interface) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
pkg/actions/cluster/delete_test.go
Outdated
@@ -0,0 +1,127 @@ | |||
package cluster |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mhmm does this test file actually get run when you run ginkgo -r pkg/action/cluster/
? I would of thought it needs to be in cluster_test
suite to register. But I get that it needs to have access to the private function drainAllNodeGroups
.
I'd prefer if we made drainAllNodeGroups
public, and moved this file into cluster_test
package.
what do you think 😄 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was my initial problem, which caused me to move it to a different package.
I didn't want to make the drainAllNodeGroups
public so I moved it to be in the same package.
Do you think it is a good thing to make it publicly?
Also, when I run ginkgo -r ./pkg/action/cluster
it does run these tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it is a good thing to make it publicly?
I prefer it to testing private functions by including the test package in the code package. But some folks might disagree 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by including the test package in the code package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He means, that he prefers making the function public, and then testing it, rather than having the test code in the same package as the code it self, aka, the test code is in cluster
package rather than cluster_test
package which makes it a separate package. The benefit of having cluster_test
is that you test the public API of the package. If a private function can't be tested through that, it means that it would need a bit of refactoring maybe or it's not working encapsulating, etc.
This isn't a dogma of course. :) It's a suggestion. Try and test private functions through utilising the public function. If it's hard to test, maybe there are certain conditions which are difficult to achieve in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well summarised @Skarlso . Does that make sense @AmitBenAmi ? Its not a blocker on merging as its a personal preference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aclevername @Skarlso I changed it to fit this desire.
Basically, I also agree, and I wasn't happy with setting the test file inside the same package.
I synthetically exported the private function inside the export_test.go
file, so I could test it outside of the cluster
package, and now the tests file is in its own cluster_test
package
0247f14
to
63b5a8d
Compare
pkg/actions/cluster/owned.go
Outdated
nodeGroupManager := c.newNodeGroupManager(c.cfg, c.ctl, clientSet) | ||
if err := drainAllNodeGroups(c.cfg, c.ctl, clientSet, allStacks, disableNodegroupEviction, nodeGroupManager, attemptVpcCniDeletion); err != nil { | ||
if force { | ||
logger.Warning("error occurred during nodegroups draining") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logger.Warning("error occurred during nodegroups draining") | |
logger.Warning("error %q occurred during nodegroups draining, force=true so proceeding with deletion", err.Error()) |
probably find better wording 😆 but the improvement makes sense 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/actions/cluster/owned_test.go
Outdated
}) | ||
|
||
err := c.Delete(time.Microsecond, false, false, false) | ||
Expect(err).To(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we improve this by changing to
Expect(err).To(HaveOccurred()) | |
Expect(err).To(MatchError("whatever error I expected")) |
or if we don't know the full error: Expect(err).To(MatchError(ContainSubstring("whatever sub-error I expected")))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same in some of the other test 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to match the mocked exception message
3d5356b
to
fdac5d6
Compare
pkg/actions/cluster/delete.go
Outdated
nodeGroupManager := nodegroup.New(cfg, ctl, clientSet) | ||
if err := nodeGroupManager.Drain(cmdutils.ToKubeNodeGroups(cfg), false, ctl.Provider.WaitTimeout(), 0, false, false); err != nil { | ||
|
||
if err := nodeGroupDrainer.Drain(cmdutils.ToKubeNodeGroups(cfg), false, ctl.Provider.WaitTimeout(), 0, false, disableEviction); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, but Drain has now quite a lot of parameters. Would be nice to have them in a struct rather, where we can clearly read which 0
and which false
, false
belongs to which parameter. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/actions/cluster/owned.go
Outdated
if force { | ||
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) | ||
} else { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if force { | |
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) | |
} else { | |
return err | |
} | |
if !force { | |
return err | |
} | |
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a preference. :) I rather skip else
s to ease reading the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/actions/cluster/unowned.go
Outdated
if force { | ||
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) | ||
} else { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if force { | |
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) | |
} else { | |
return err | |
} | |
if !force { | |
return err | |
} | |
logger.Warning("an error occurred during nodegroups draining, force=true so proceeding with deletion: %q", err.Error()) |
It's a bummer that this has to be repeated. :/ ¯\_(ツ)_/¯
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, this is fantastic work. :) I have like two nits, and the fact that it has to be duplicated for owned and unowned. I wished that wouldn't be the case, but nothing you can do about that right now. :)
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
Signed-off-by: Amit Ben Ami <[email protected]>
df47c12
to
5e9d7a9
Compare
Hey @aclevername , |
…er-with-force-flag
@AmitBenAmi we'll merge this as soon as the build is green :) thanks for all the hard work 🎉 |
…ce-flag' of github.com:AmitBenAmi/eksctl into disabling-nodes-eviction-when-deleting-cluster-with-force-flag
Head branch was pushed to by a user without write access
@Himangini Thanks, I pushed another commit to fix the lint errors |
…er-with-force-flag
Finally. :D :D Well done! Nice job. Thank you for your contribution @AmitBenAmi. |
@Skarlso Thanks for the help and feedback, it was a pleasure 😄 |
Description
When deleting a cluster that has pods with PDBs, I can not delete the whole cluster, since the nodes won't evict all of the workloads, causing the node groups draining step to fail.
Today, there is a flag when draining node groups of:
--disable-eviction
, which ignores PDB errors while draining the node groups.I've added a flag:
--disable-nodegroup-eviction
, that will exist when deleting a cluster, and achieves the same behavior of thedrain
command with the--disable-eviction
flag (it literally calls the drain function with the same value).fixes #4416
Checklist
README.md
, or theuserdocs
directory)area/nodegroup
) and kind (e.g.kind/improvement
)BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯