Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[STRMCMP-616] Use error retrying system for savepoint errors during deletion #87

Merged
merged 5 commits into from
Aug 23, 2019

Conversation

mwylde
Copy link
Contributor

@mwylde mwylde commented Aug 22, 2019

Currently, if a flinkapplication is being deleted with DeleteMode: Savepoint and the savepoint operation is failing, we will retry endlessly with no delay on the savepoint operation. This is because in that case, we don't return an error back to the error management system of the state machine.

This PR fixes that by properly returning an error for that situation. This causes us to do retry with exponential backoff instead of on every statemachine iteration.

It also should fix the "exceeded the maximum log length" integration test failures, which are caused by endless logging of the retries.

@@ -676,6 +677,8 @@ func (s *FlinkStateMachine) handleApplicationDeleting(ctx context.Context, app *
fmt.Sprintf("Failed to take savepoint %v", status.Operation.FailureCause))
// clear the trigger id so that we can try again
app.Spec.SavepointInfo.TriggerID = ""
return true, client.GetRetryableError(errors.New("failed to take savepoint"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a unit test ?

@mwylde
Copy link
Contributor Author

mwylde commented Aug 22, 2019

Updated with unit test and a config change to increase the maximum retry delay when running integration tests. This reduces the number of log lines in the integration test by ~1k.

@mwylde mwylde merged commit 7b52ad3 into master Aug 23, 2019
@mwylde mwylde deleted the micah_delete_errors branch August 23, 2019 17:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants