changefeedccl: do not retry forever without making progress #62556

Open
miretskiy opened this issue Mar 24, 2021 · 3 comments
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@miretskiy
Contributor

miretskiy commented Mar 24, 2021

A changefeed retries forever when its sink is unavailable. This behavior is undesirable
for several reasons:

  • Failures to write to the sink are expensive; they cause the dist flow to be torn down and recreated every 10 seconds.
  • Retrying forever is confusing to end users: the job is "running", yet nothing is happening (though recent changes to surface retries in the running status might help).
  • Some sinks (s3) already have plenty of retries built into the client, so retrying (forever) on top of that might not add much.

I think the fix might be fairly straightforward: retry only a limited number of times when we don't advance the low watermark (i.e. when we're not making forward progress).

Jira issue: CRDB-2768

@miretskiy miretskiy added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture labels Mar 24, 2021
@ajwerner ajwerner changed the title from "Do not retry forever without making progress" to "changefeedccl: do not retry forever without making progress" Mar 25, 2021
@amruss
Contributor

amruss commented Mar 29, 2021

Open questions: How do we alert the user when there is an action they need to take? What should our "retry limit" be? Should we increase the backoff?

@amruss
Contributor

amruss commented Mar 29, 2021

Linking: #58077

@miretskiy
Contributor Author

I think doing something about this issue is important, since we now hold onto the PTS record.
Retrying forever would be bad.

@miretskiy miretskiy self-assigned this Jan 9, 2023

3 participants