This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

stop/pause until reached the end of a transaction #1095

Closed
csuzhangxc opened this issue Sep 24, 2020 · 7 comments · Fixed by #1928
Labels
help wanted This issue wanted some help from contributor type/feature-request This issue is a feature request

Comments

@csuzhangxc
Member

Feature Request

Is your feature request related to a problem? Please describe:

DM splits each transaction from the upstream MySQL into rows and re-aggregates those rows into new transactions as batches.

When stop-task or pause-task is executed, only part of the rows in an upstream MySQL transaction may have been committed to the downstream TiDB. In other words, the original transaction is left broken in TiDB.
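To make the problem concrete, here is a minimal sketch (not DM code) that checks whether a given number of committed rows lands on an upstream transaction boundary. The function name `endsOnTxnBoundary` and the idea of describing transactions by their row counts are assumptions made purely for illustration.

```go
package main

// endsOnTxnBoundary reports whether committing the first `committed` rows
// downstream stops exactly at the end of an upstream transaction.
// txnSizes lists the number of rows in each upstream transaction, in order.
// Hypothetical helper for illustration only; it is not part of DM.
func endsOnTxnBoundary(txnSizes []int, committed int) bool {
	if committed == 0 {
		return true // nothing replicated yet, so nothing is broken
	}
	boundary := 0
	for _, n := range txnSizes {
		boundary += n
		if boundary == committed {
			return true
		}
		if boundary > committed {
			return false // stopped in the middle of this transaction
		}
	}
	return false // more rows committed than the listed transactions contain
}
```

With upstream transactions of 3 and 2 rows and a downstream batch size of 2, stopping after the first batch (2 rows) or the second batch (4 rows) leaves a partial transaction downstream. Only 3 or 5 committed rows correspond to a consistent state.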

Describe the feature you'd like:

Stop or pause the task only after it has reached the end of a transaction, both for stop-task/pause-task and when the DM-worker process shuts down normally.

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

@csuzhangxc csuzhangxc added type/feature-request This issue is a feature request help wanted This issue wanted some help from contributor labels Sep 24, 2020
@lance6716
Collaborator

lance6716 commented Oct 15, 2020

Another way is to revert to the last consistent point (perhaps recorded in the checkpoint), if TiDB can flash back a table.

pingcap/tidb#20302

@lichunzhu
Contributor

Maybe we can refer to tidb-binlog's logic:

  1. Don't send any SQL statements later than the current transaction.
  2. Close the syncer only after all of those SQL statements have been replicated to the downstream.
  3. Add a boolean column synced to the downstream checkpoint to indicate whether DM really stopped at a transaction boundary.

https://github.com/pingcap/tidb-binlog/blob/v4.0.13/drainer/syncer.go#L484
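The three steps above could be sketched roughly as follows. This is a toy model under stated assumptions, not tidb-binlog's or DM's actual code: the `checkpoint` and `txn` types, the `gracefulClose` function, and the `apply` callback are all hypothetical names, and the checkpoint lives in memory instead of a downstream table.

```go
package main

import "errors"

// errEmptyTxn is a hypothetical sentinel error for this sketch.
var errEmptyTxn = errors.New("transaction has no SQL statements")

// checkpoint is an in-memory stand-in for the downstream checkpoint row.
type checkpoint struct {
	Pos    uint32 // end position of the last fully replicated transaction
	Synced bool   // true only if replication stopped exactly at Pos
}

// txn is a hypothetical upstream transaction with its binlog end position.
type txn struct {
	EndPos uint32
	SQLs   []string
}

// gracefulClose replicates the current transaction completely and sends
// nothing after it (steps 1 and 2), then records synced=true (step 3).
// On any failure, Synced stays false so a restart can detect a partial state.
func gracefulClose(current txn, apply func(sql string) error, cp *checkpoint) error {
	if len(current.SQLs) == 0 {
		return errEmptyTxn
	}
	cp.Synced = false // the downstream may be mid-transaction from here on
	for _, sql := range current.SQLs {
		if err := apply(sql); err != nil {
			return err
		}
	}
	cp.Pos = current.EndPos
	cp.Synced = true
	return nil
}
```

On restart, a checkpoint with `Synced == false` would signal that the downstream may contain only part of a transaction and needs extra handling before normal replication resumes.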

@okJiang
Member

okJiang commented Jul 28, 2021

How about saving the end of the transaction (the last XID event) directly every time saveTablePoint is called?

@okJiang
Member

okJiang commented Jul 28, 2021

> How about saving the end of the transaction (the last XID event) directly every time saveTablePoint is called?

Since this cannot guarantee that the upstream and downstream data are consistent, we chose instead to delay the syncer on stop/pause until the end of the transaction, and then call flushCheckPoint.

@okJiang
Member

okJiang commented Jul 28, 2021

Initial idea: delay at the point where the syncer exits and the job channels are closed.

s.closeJobChans()

  • Wait for the arrival of XIDEvent before closing the job channel.

Steps:

  1. Before closing the job channel, set Syncer.waitXIDType = waiting. Then wait here until Syncer.waitXIDType = waitComplete.
  2. After Syncer.waitXIDType = waiting is set, the Syncer keeps running normally until addJob encounters an XIDEvent.
  3. After encountering an XIDEvent:
    a. set Syncer.waitXIDType = waitComplete
    b. stop addJob
    c. continue closing jobChan in step 1
  4. The jobs already in the job queue continue to be executed here (by adding executeSQLs()).

    dm/syncer/syncer.go

    Lines 1322 to 1326 in 11fb5a8

    case sqlJob, ok := <-jobChan:
        metrics.QueueSizeGauge.WithLabelValues(s.cfg.Name, queueBucket, s.cfg.SourceID).Set(float64(len(jobChan)))
        if !ok {
            return
        }

waitXIDType:

type waitXIDType int

const (
    noWait waitXIDType = iota
    waiting
    waitComplete
)
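As a rough illustration of how these three states could drive the steps above, here is a single-goroutine sketch. It is a simplified model and not DM's Syncer: the `eventKind` type, the `syncer` struct, `requestStop`, and `replicate` are hypothetical names, the job channel is replaced by a slice, and the synchronization a real implementation would need between goroutines is left out.

```go
package main

// eventKind is a hypothetical stand-in for binlog event types.
type eventKind int

const (
	rowEvent eventKind = iota // a row change inside a transaction
	xidEvent                  // marks the commit of a transaction
)

type waitXIDType int

const (
	noWait waitXIDType = iota
	waiting
	waitComplete
)

// syncer is a toy model; jobs stands in for the job channel.
type syncer struct {
	waitXID waitXIDType
	jobs    []eventKind
}

// requestStop models step 1: a stop/pause was requested.
func (s *syncer) requestStop() {
	if s.waitXID == noWait {
		s.waitXID = waiting
	}
}

// addJob queues an event and reports whether more events may follow.
func (s *syncer) addJob(e eventKind) bool {
	if s.waitXID == waitComplete {
		return false // step 3b: nothing is added after the boundary
	}
	s.jobs = append(s.jobs, e)
	if s.waitXID == waiting && e == xidEvent {
		s.waitXID = waitComplete // step 3a: transaction boundary reached
		return false
	}
	return true // step 2: keep running normally
}

// replicate feeds events to a fresh syncer, requests a stop just before the
// event at index stopAt, and returns how many events were queued.
func replicate(events []eventKind, stopAt int) int {
	s := &syncer{}
	for i, e := range events {
		if i == stopAt {
			s.requestStop()
		}
		if !s.addJob(e) {
			break
		}
	}
	return len(s.jobs)
}
```

For the event stream row, row, XID, row, XID, requesting a stop at index 1 still queues the first three events, so the first transaction completes and nothing from the second is queued.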

PTAL @lance6716 @lichunzhu

@lance6716
Collaborator

There are already so many syncer states: normal replication, sharding re-sync, SQL injected by handle-error, checking whether to turn off safe mode, and now your XID waiting.

Could you express these states and their transitions in a clearer way? Maybe as a state machine.

@okJiang
Member

okJiang commented Jul 28, 2021

> There are already so many syncer states: normal replication, sharding re-sync, SQL injected by handle-error, checking whether to turn off safe mode, and now your XID waiting.
>
> Could you express these states and their transitions in a clearer way? Maybe as a state machine.

I'm afraid I can't fully understand all of the above states in a short time.
