Rework running Fence Agent command #106
Skipping CI for Draft Pull Request.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: clobrano
The full list of commands accepted by this bot can be found here. The pull request process is described here
58169ba
to
b35b878
Compare
/test
/test 4.15-openshift-e2e
Working on a different approach for testing, so you might want to wait before reviewing this PR
8e82aa7
to
d7e62f0
Compare
Move UpdateConditions function and related assets to the utils package to let the Executor update FAR status when Fence Agent execution completes. Signed-off-by: Carlo Lobrano <[email protected]>
- improved test independence
- replaced custom functions with Gomega alternatives
- other small improvements
Signed-off-by: Carlo Lobrano <[email protected]>
d7e62f0
to
e988abf
Compare
/test 4.14-openshift-e2e
Might want to add a [WIP] suffix on the PR until ready for review
e988abf
to
9a91fcb
Compare
- run the command directly in the container, without API requests
- run the command asynchronously to free the Reconciler loop
- let the goroutine running the command also update the FAR status according to the result of the command. The status update will then trigger a new reconcile loop for the rest of the actions.
- add new conditions to handle Fence Agent failures
The goroutine running the fence agent is mapped to the FAR CR UID.
see: https://issues.redhat.com/browse/ECOPROJECT-1755
Signed-off-by: Carlo Lobrano <[email protected]>
Signed-off-by: Carlo Lobrano <[email protected]>
Drop verification of the existence of the "Success" message in the controller's logs. This check depends strongly on the implementation, which means the test might fail in the future just because the log changes (even just in the AWS fence agent). Moreover, the E2E checks already skip this control when the target node is the one where FAR resides, which means the other checks are sufficient for the test to pass. Signed-off-by: Carlo Lobrano <[email protected]>
9a91fcb
to
4ebea0e
Compare
/test 4.14-openshift-e2e
Signed-off-by: Carlo Lobrano <[email protected]>
- add verifyRemediationTaintExists
- add verifyRemediationConditions
/retest
Signed-off-by: Carlo Lobrano <[email protected]>
Signed-off-by: Carlo Lobrano <[email protected]>
/retest
pkg/cli/cliexecuter.go
config: config,
clientSet: clientSet,
// NewFakeExecuter builds an Executer with a configurable runnerFunc for testing
func NewFakeExecuter(client client.Client, fn runnerFunc) (*Executer, error) {
I think it's a good rule of thumb to separate test code from production code. Based on that, I'd expect this method to be in a different file (and once moved, it can probably be made private).
good point, I'll move it to the test code
Signed-off-by: Carlo Lobrano <[email protected]>
For coherence, also fix the boolean return value when the status update is interrupted, even though in that case we also return an error, which makes the ExponentialBackoffWithContext function exit anyway. Signed-off-by: Carlo Lobrano <[email protected]>
The runWithRetry function uses a constant-time back-off, not a linear one. Signed-off-by: Carlo Lobrano <[email protected]>
/retest
Signed-off-by: Carlo Lobrano <[email protected]>
/lgtm
/hold
/retest
overall lgtm... few comments inline. Not sure about context handling at one place 🤔
@@ -0,0 +1,19 @@
package cli
can this be fake_test.go?
I think the _test.go suffix makes this file a test, so it is not usable as a source for another test.
At least, changing only the name breaks the unit test.
In case I'm missing something, I'll fix this in a following PR
retryErr = wait.ExponentialBackoffWithContext(ctx,
backoff,
func(ctx context.Context) (bool, error) {
ctxWithTimeout, cancel := context.WithTimeout(ctx, timeout)
do we need this? 🤔 IIUC, the context we get here is the same one we pass to ExponentialBackoffWithContext. Why would we need to cancel that one when we leave this function? Even more, isn't it an issue when we do that?
but maybe it's just too late for me to understand it completely
> Why would we need to cancel that one when we leave this function

Not when we leave the function, but when we are in the middle of the function (either in the retry or during a command call), and we want to stop it (e.g. NHC time out)
> Not when we leave the function

but that's what we do with the defer one line below, no? 🤔
I mixed contexts here (no pun intended 😄)
The code is in a different place now, but my intention here is not to cancel the context, but to give the exec.CommandContext a timeout to run the fence agent command
b665a12
to
fee00a4
Compare
/retest
/lgtm
I'm unholding this PR since the remaining comments are only nits and it has an E2E fix which is relevant for other PRs.
Run the fence agent command asynchronously in a dedicated goroutine, in the controller's own container.
The goroutine is also responsible for updating the FAR status with the command outcome. For this reason, two new Status Conditions have been added to account for fence agent failures and timeouts.
The fence agent command has three new, optional, Spec values:
TODO
exec.Command
onlyupdateConditions
updateConditions
in executor