Whenever VTOrc (or anything else, such as a manual operation or vtctld) fixes a replica to repair a replication-related failure, it ends up resetting all the replica logs.
This in and of itself isn't a huge deal, since the I/O thread can just read the logs again. But if users run the vttablet with the replication reporter (--enable_replication_reporter), it becomes an issue. The replication reporter uses Seconds_Behind_Source as its source of information for calculating the replication lag. According to the MySQL docs for this field:
In essence, this field measures the time difference in seconds between the replication SQL (applier) thread and the replication I/O (receiver) thread. If the network connection between source and replica is fast, the replication receiver thread is very close to the source, so this field is a good approximation of how late the replication applier thread is compared to the source. If the network is slow, this is not a good approximation; the replication applier thread may quite often be caught up with the slow-reading replication receiver thread, so Seconds_Behind_Source often shows a value of 0, even if the replication receiver thread is late compared to the source. In other words, this column is useful only for fast networks.
Resetting the relay logs essentially resets the I/O thread too, which can lead to the replication lag being reported incorrectly.
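To make the misreporting concrete, here is a minimal sketch in Go. It is a toy model based only on the description quoted above (lag as the gap between the applier and receiver threads), not MySQL's or Vitess's actual implementation. The `replicaState` type and the functions `reportedLag`, `trueLag`, and `resetRelayLogs` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// replicaState is a hypothetical, simplified model of a replica.
// All timestamps are in seconds on the source's clock.
type replicaState struct {
	sourceNow      int64 // current time on the source
	lastReceivedTS int64 // newest event fetched by the receiver (I/O) thread
	lastAppliedTS  int64 // newest event executed by the applier (SQL) thread
}

// reportedLag approximates Seconds_Behind_Source as described in the quoted
// docs: the gap between the applier thread and the receiver thread.
func reportedLag(r replicaState) int64 {
	return r.lastReceivedTS - r.lastAppliedTS
}

// trueLag is how far the applier thread really is behind the source.
func trueLag(r replicaState) int64 {
	return r.sourceNow - r.lastAppliedTS
}

// resetRelayLogs models (as an assumption) what a relay log reset does:
// fetched-but-unapplied events are discarded, so the receiver restarts
// from the applier's position.
func resetRelayLogs(r replicaState) replicaState {
	r.lastReceivedTS = r.lastAppliedTS
	return r
}

func main() {
	// The receiver is caught up with the source; the applier is 300s behind.
	r := replicaState{sourceNow: 1000, lastReceivedTS: 1000, lastAppliedTS: 700}
	fmt.Println(reportedLag(r), trueLag(r)) // prints "300 300"

	// After the reset the reported value collapses while the real lag remains.
	r = resetRelayLogs(r)
	fmt.Println(reportedLag(r), trueLag(r)) // prints "0 300"
}
```

In this model the reported value drops to 0 right after the reset even though the applier is still 300 seconds behind the source, which is the kind of misreport the replication reporter would pass along.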
Reproduction Steps
Run a cluster.
Stop replication on a tablet and observe that, after VTOrc repairs it, the relay logs are gone.
Binary Version
main
Operating System and Environment details
all
Log Fragments
No response
GuptaManan100 changed the title from "Bug Report: Each VTOrc repair resets the replica logs" to "Bug Report: Each replication repair resets the replica logs" on Jun 8, 2023
This can be more serious than just a reporting issue. If VTOrc happens to perform a replication repair on all replicas just before a primary failure, we can end up losing data upon ERS.
We need to backport any fix we come up with all the way back to when we introduced the RESET during replication repair. That was in #10943.
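The data-loss scenario in this comment can be illustrated with a small Go sketch. It is a toy model, not how ERS is implemented: transactions are simply numbered, and it assumes a promotion can recover at most what the replicas still hold in their relay logs. The `replica` type and the functions `repair` and `recoverable` are hypothetical names.

```go
package main

import "fmt"

// replica is a hypothetical model in which transactions are numbered 1..N.
type replica struct {
	name     string
	applied  int // highest transaction executed by the applier thread
	received int // highest transaction stored in the relay logs
}

// repair models a replication repair that resets the relay logs:
// anything received but not yet applied is discarded.
func repair(r replica) replica {
	r.received = r.applied
	return r
}

// recoverable returns the highest transaction any replica still holds,
// i.e. the most a promotion could recover once the primary is gone.
func recoverable(replicas []replica) int {
	best := 0
	for _, r := range replicas {
		if r.received > best {
			best = r.received
		}
	}
	return best
}

func main() {
	const committedOnPrimary = 100
	replicas := []replica{
		{name: "replica-1", applied: 90, received: 100},
		{name: "replica-2", applied: 95, received: 99},
	}

	// Primary fails with relay logs intact: nothing is lost.
	fmt.Println("lost without repair:", committedOnPrimary-recoverable(replicas))

	// Every replica is repaired (relay logs reset) just before the failure.
	for i := range replicas {
		replicas[i] = repair(replicas[i])
	}
	fmt.Println("lost after repair:", committedOnPrimary-recoverable(replicas))
}
```

With these made-up numbers the program prints `lost without repair: 0` and then `lost after repair: 5`: the transactions that existed only in the discarded relay logs cannot be recovered from any replica.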