source-mysql: Detect binlog offset wraparound #2151
Merged
Description:
In several places, MySQL represents binlog offsets within a file as a 32-bit unsigned integer. Most notable for our purposes are the binlog event header `log_pos` field and the offset argument to the `COM_BINLOG_DUMP` command.

The upshot is that we can't necessarily trust the offset to be correct once a file grows past 4GB, and even if we tracked the "full offset" ourselves, we wouldn't be able to resume from it after a connector restart.
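To make the wraparound concrete, here is a small illustration in Python of what a 32-bit unsigned field ends up holding once the true offset passes 4GB. The `as_uint32` helper is hypothetical and is not part of the connector.

```python
# Illustration only: MySQL stores in-file binlog offsets (e.g. the event
# header's log_pos field) as 32-bit unsigned integers, so a true offset
# at or beyond 4 GiB is effectively reduced modulo 2**32.
UINT32_MOD = 1 << 32
GIB = 1 << 30


def as_uint32(full_offset: int) -> int:
    """Return the value a 32-bit unsigned field would hold for full_offset."""
    return full_offset % UINT32_MOD


# An offset just past 4 GiB wraps around to a small number...
print(as_uint32(4 * GIB + 100))  # 100
# ...which looks exactly like a genuine offset near the start of the file.
print(as_uint32(100))  # 100
```

Both calls print the same value, which is why a stored offset from a file larger than 4GB is ambiguous.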
This normally isn't an issue because binlog files are never supposed to grow that large. The system setting `max_binlog_size`, which governs the point at which the file is rotated, has a maximum possible value of just 1GB. The problem is that it's a soft limit, and it's possible to force MySQL to stuff arbitrarily large amounts of data into a single file (a transaction, for example, is never split across binlog files). So we need to handle that situation as gracefully as possible.

This commit implements that handling. It detects binlog offset overflow whenever an event's `log_pos` header value is smaller than the prior cursor position. This is reliable because there's also a 1GB cap on the size of any single event, and unlike the binlog size setting, this one is actually a hard maximum. A single event therefore can't advance the true offset by 4GB or more, so the 32-bit value can wrap at most once per event and any wrap shows up as a decrease. Once overflow occurs, an "offset overflow" state flag is set, which prevents us from emitting any further checkpoints until after the next binlog rotation.

However, there is one other place where we use binlog offsets, and that's as part of the `/_meta/source/cursor` field. This field is used as the fallback collection key for keyless tables, so it's kind of important that it be basically correct, though it's sufficient for it to be properly ordered and unique. We handle this by maintaining a u64 "estimated offset" which, once offset overflow has occurred within the current file, is advanced based on event sizes instead of `log_pos` values.

It's not exactly feasible to reproduce this edge case on demand within the confines of a CI build, so there is no new test case accompanying these changes. We'll have to content ourselves with CI tests showing that this doesn't break anything when overflow doesn't occur; the real test will come when this happens again in production, which we'll be able to tell because a warning message will be logged when it does.
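The detection rule, checkpoint suppression, and estimated-offset bookkeeping described above can be sketched together as follows. This is a minimal Python sketch under stated assumptions, not the connector's implementation: `BinlogCursor`, `on_event`, `on_rotate`, and `can_checkpoint` are hypothetical names, and it assumes each event header supplies the event's total size plus a 32-bit `log_pos` giving the position just past the event (as in binlog format v4). It also glosses over details a real connector must handle, such as server-generated events that don't correspond to a real file position.

```python
import logging
from dataclasses import dataclass

GIB = 1 << 30


@dataclass
class BinlogCursor:
    """Hypothetical per-file cursor state (names are illustrative only)."""

    file: str
    log_pos: int = 4  # last 32-bit log_pos seen; events start after the 4-byte magic
    estimated_offset: int = 4  # unbounded offset estimate (a u64 in the PR)
    offset_overflow: bool = False

    def on_rotate(self, new_file: str, position: int = 4) -> None:
        # A rotation starts a fresh file, so the overflow state is cleared.
        self.file = new_file
        self.log_pos = position
        self.estimated_offset = position
        self.offset_overflow = False

    def on_event(self, header_log_pos: int, event_size: int) -> None:
        # Events are capped at 1 GiB, so the 32-bit log_pos can wrap at most
        # once per event, and a wrap always appears as a decrease.
        if not self.offset_overflow and header_log_pos < self.log_pos:
            self.offset_overflow = True
            logging.warning("binlog offset overflow detected in %s", self.file)
        if self.offset_overflow:
            # log_pos is no longer trustworthy; advance by event size instead.
            self.estimated_offset += event_size
        else:
            self.estimated_offset = header_log_pos
        self.log_pos = header_log_pos

    def can_checkpoint(self) -> bool:
        # Checkpoints are suppressed until the next rotation after overflow.
        return not self.offset_overflow


if __name__ == "__main__":
    cur = BinlogCursor("binlog.000042", log_pos=4 * GIB - 1000,
                       estimated_offset=4 * GIB - 1000)
    # A 1500-byte event ends past 4 GiB, so its 32-bit log_pos wraps to 500.
    cur.on_event(header_log_pos=500, event_size=1500)
    print(cur.can_checkpoint(), cur.estimated_offset)  # False 4294967796
    cur.on_rotate("binlog.000043")
    print(cur.can_checkpoint(), cur.estimated_offset)  # True 4
```

Advancing by event size, rather than reinterpreting the wrapped `log_pos`, keeps the estimate strictly increasing within a file, which is the ordering property the cursor field needs.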
Workflow steps:
Nothing needs to be done; this just fixes a rare edge case which could cause captures to get stuck with an incorrect resume offset in their state checkpoint.