Bug Report: VReplication lag is not updated when vplayer is throttled #16575
Labels: Component: Observability, Component: VReplication, Type: Bug
Overview of the Issue
There is a for loop in the vplayer, which applies streamed binlog events from the vstreamer, where we process events and, as we do so, update the vreplication lag:
vitess/go/vt/vttablet/tabletmanager/vreplication/vplayer.go
Lines 485 to 574 in bf0c5f8
If the vplayer is throttled for some time, however, we are stuck at the top of that for loop and never reach the bottom of it, where we update the lag value based on the events we just processed:
vitess/go/vt/vttablet/tabletmanager/vreplication/vplayer.go
Lines 485 to 493 in bf0c5f8
Because we don't process any events for as long as we're fully throttled, which can be indefinitely, we don't update the vreplication lag either. Say the lag was 0 seconds the last time we processed an event, and we're then fully throttled, unable to process any more events, for the next 15 minutes. Neither the system nor the operator is aware of the growing vreplication lag, and once events are processed again the value suddenly jumps from 0 seconds to 900 seconds.
This is clearly wrong. It can mean only becoming aware of the issue once it has become a bigger problem (if made aware immediately, you may want to explicitly lessen the throttling altogether, for vreplication, or more specifically for the vplayer), or it can cause unnecessary concern as the lag unexpectedly fluctuates wildly (perhaps you really do want vreplication to be deferred/throttled).
We currently have code in place which estimates the vreplication lag when we're not receiving any events from the vstreamer (perhaps we're unable to communicate, or perhaps the sender/vstreamer is throttled):
vitess/go/vt/vttablet/tabletmanager/vreplication/vplayer.go
Lines 499 to 505 in bf0c5f8
We also need to do that when we're throttled. It may be as simple as this:
Reproduction Steps
End result on main:
End result with the proposed patch:
Binary Version
Operating System and Environment details
Log Fragments
No response