Fix Create Backup Consumption Logic #542

Merged
jjaakola-aiven merged 1 commit into main from fleshgrinder/backup-watermarks
Feb 22, 2023

Conversation

Fleshgrinder
Contributor

@Fleshgrinder Fleshgrinder commented Feb 14, 2023

The consumption logic currently counts how many records it received in a single batch returned from poll, and when that batch is empty it concludes that the backup has finished successfully. However, there are many reasons why a batch returned by poll can be empty, especially when timeouts are applied to it. A consequence of this is that a backup created at $t_1$ may contain more records than a backup created at $t_2$ (without any external changes to the topic content, e.g. compaction).

To fix this we have to use offset watermarks. With them we can determine whether we are done. The patch also exposes the poll timeout, so that users can increase it if they encounter issues, and it uses a longer default poll timeout (increased from 1 second to 1 minute) so that users do not see errors right away.
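To illustrate the difference, here is a minimal sketch, not the code from this patch. It uses a hypothetical in-memory `FlakyConsumer` in place of a real Kafka consumer, and all names (`Record`, `FlakyConsumer`, `backup_until_empty`, `backup_until_watermark`, `max_empty_polls`) are assumptions made up for the example. The stand-in returns an empty batch on every other poll to imitate a poll that timed out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    offset: int
    value: bytes


class FlakyConsumer:
    """Hypothetical single-partition consumer; every odd poll comes back empty."""

    def __init__(self, records: List[Record], batch_size: int = 2) -> None:
        self._records = records
        self._batch_size = batch_size
        self._position = 0
        self._polls = 0

    def end_offset(self) -> int:
        # Offset of the last record present right now, or -1 for an empty topic.
        return self._records[-1].offset if self._records else -1

    def poll(self) -> List[Record]:
        self._polls += 1
        if self._polls % 2 == 1:
            return []  # simulated timeout: no data, but the topic is not exhausted
        batch = self._records[self._position : self._position + self._batch_size]
        self._position += len(batch)
        return batch


def backup_until_empty(consumer: FlakyConsumer) -> List[Record]:
    """Old approach: treat the first empty batch as the end of the topic."""
    backed_up: List[Record] = []
    while True:
        batch = consumer.poll()
        if not batch:
            return backed_up
        backed_up.extend(batch)


def backup_until_watermark(consumer: FlakyConsumer, max_empty_polls: int = 10) -> List[Record]:
    """Watermark approach: stop only once the end offset has been reached."""
    end_offset = consumer.end_offset()  # fixed when the backup starts
    backed_up: List[Record] = []
    if end_offset < 0:
        return backed_up  # nothing to back up
    empty_polls = 0
    while True:
        batch = consumer.poll()
        if not batch:
            # An empty batch says nothing about completeness, so keep polling,
            # but give up eventually instead of looping forever.
            empty_polls += 1
            if empty_polls > max_empty_polls:
                raise TimeoutError("no records received before reaching the end offset")
            continue
        empty_polls = 0
        backed_up.extend(batch)
        if batch[-1].offset >= end_offset:
            return backed_up


records = [Record(offset=i, value=b"schema") for i in range(5)]
print(len(backup_until_empty(FlakyConsumer(records))))      # incomplete backup
print(len(backup_until_watermark(FlakyConsumer(records))))  # complete backup
```

With the same topic content, the first function stops at the first simulated timeout, while the second keeps polling until it has seen the record at the end offset.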


Requires (and includes) #540 to be merged first.

@Fleshgrinder Fleshgrinder self-assigned this Feb 14, 2023
@cloudflare-workers-and-pages

cloudflare-workers-and-pages bot commented Feb 15, 2023

Deploying with Cloudflare Pages

Latest commit: 08e89d7
Status: ✅  Deploy successful!
Preview URL: https://e31802c9.karapace.pages.dev
Branch Preview URL: https://fleshgrinder-backup-watermar.karapace.pages.dev


@Fleshgrinder Fleshgrinder marked this pull request as ready for review February 15, 2023 14:31
@Fleshgrinder Fleshgrinder requested review from a team as code owners February 15, 2023 14:31
@Fleshgrinder Fleshgrinder force-pushed the fleshgrinder/backup-watermarks branch 3 times, most recently from 10e01fe to 6425450 Compare February 15, 2023 15:21
@Fleshgrinder Fleshgrinder force-pushed the fleshgrinder/backup-watermarks branch 5 times, most recently from f886f87 to c6b1e94 Compare February 16, 2023 17:37
Comment on lines +402 to +404
last_offset = record.offset
if last_offset >= end_offset:
break
Contributor

I'd compare the record offset to the end offset inside the loop and break immediately when it is reached. As written, this could back up more than what was decided as the end offset.

Contributor Author

That is intentional: backing up more than the minimum is totally fine. A while loop would mean performing the same check twice in a row, whereas the do-while loop performs only as many checks as necessary. It also lets us back up as much as possible, and at least everything that was in the topic when we started.
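The two positions in this thread can be contrasted with a small sketch. It is not the code under review: offsets are modelled as plain integers, batches as lists, and the function names `stop_after_batch` and `stop_at_record` are hypothetical. Python has no do-while statement, so the per-batch variant emulates one by checking the end offset only after a batch has been consumed.

```python
from typing import Iterable, List


def stop_after_batch(batches: Iterable[List[int]], end_offset: int) -> List[int]:
    """Check the end offset once per batch (do-while style).

    The whole batch is kept, so the result may extend past end_offset.
    """
    consumed: List[int] = []
    for batch in batches:
        consumed.extend(batch)
        if batch and batch[-1] >= end_offset:
            break
    return consumed


def stop_at_record(batches: Iterable[List[int]], end_offset: int) -> List[int]:
    """Check the end offset for every record and stop exactly at it."""
    consumed: List[int] = []
    for batch in batches:
        for offset in batch:
            consumed.append(offset)
            if offset >= end_offset:
                return consumed
    return consumed


batches = [[0, 1], [2, 3, 4], [5]]
print(stop_after_batch(batches, end_offset=3))  # [0, 1, 2, 3, 4]
print(stop_at_record(batches, end_offset=3))    # [0, 1, 2, 3]
```

Both variants contain everything up to the end offset. The per-batch variant additionally keeps the records that arrived in the same batch, which is the "at least what was in the topic when we started" behavior described above.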

@jjaakola-aiven jjaakola-aiven merged commit 86f7fa1 into main Feb 22, 2023
@jjaakola-aiven jjaakola-aiven deleted the fleshgrinder/backup-watermarks branch February 22, 2023 07:36