-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Create Backup Consumption Logic #542
Conversation
65d7655
to
e04b85f
Compare
Deploying with Cloudflare Pages
|
10e01fe
to
6425450
Compare
6425450
to
733d853
Compare
f886f87
to
c6b1e94
Compare
last_offset = record.offset | ||
if last_offset >= end_offset: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd compare the record offset to end offset in the loop and break immediately when reached. Now this could backup more than was decided as the end offset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is intentional, backing up more than the minimum is totally fine. Using a while
loop would mean that we perform the same check twice in a row, however, using the do
-while
loop means that we perform only as many checks as necessary, while allowing to back up as much as possible, but at least what was in the topic when we started.
c6b1e94
to
9b68d37
Compare
9b68d37
to
2d89696
Compare
The consumption logic is currently counting how many records it received in a single batch returned from poll, and when it is empty it concludes that the backup is successfully finished. However, there are meany reasons why a batch returned by poll is empty, especially with timeouts applied to it. A consequence of this is that a backup created at $t_1$ may contain more records than a backup created at $t_2$ (without any external changes to the topic content, e.g. compaction). To fix this we have to use offset watermarks. With them we can determine if we are done, or not. The patch now exposes the poll timeout, so that users can increase it in case they encounter issues, and it uses a longer default poll timeout to ensure that users are not going to see errors right away (increased from 1 second to 1 minute).
2d89696
to
08e89d7
Compare
The consumption logic is currently counting how many records it received in a single batch returned from poll, and when it is empty it concludes that the backup is successfully finished. However, there are meany reasons why a batch returned by poll is empty, especially with timeouts applied to it. A consequence of this is that a backup created at$t_1$ may contain more records than a backup created at $t_2$ (without any external changes to the topic content, e.g. compaction).
To fix this we have to use offset watermarks. With them we can determine if we are done, or not. The patch now exposes the poll timeout, so that users can increase it in case they encounter issues, and it uses a longer default poll timeout to ensure that users are not going to see errors right away (increased from 1 second to 1 minute).
Requires (and includes) #540 to be merged first.