Make stale read and history read compatible with DDL #22427
Comments
To maintain the queue, there are 2 problems we need to solve:
For the second problem, I think the tidb-server could load the …

For the first problem, I think it can be solved in the following way:
In this way, during …

Though recording the …
One question: I think stale read is quite like a snapshot read; the latter will form a new … So as regards the stale read syntax, it seems we can form a new …
PTAL @djshow832 @Yisaer
Should be closed by #24285.
Background
This is a subtask of #21094.
The executor and coprocessor always read the newest schema, even in a staleness transaction or a history read. If a schema change happens after the specified timestamp, one of the following cases occurs:
DDL without data reorganization
In this case, the schema change needs no data reorganization: only the metadata changes, not the table data. Most DDL statements fall into this case.
Since the table data format stays the same, the data can still be parsed with the newer schema.
E.g.
In this case, the result contains the latest table structure even if the data is older. This is acceptable in most cases, because the user applies staleness transactions to reduce cross-region latency and relieve read hotspots, rather than to read historical data.
There may be some DDL that affects the read result, but no such case occurs to me for now.
DDL with data reorganization
In this case, the schema change needs to reorganize the table data, which means some or all of the table data will be reformatted.
So far there are only 3 kinds of such DDL:
E.g.
In these cases, some of these problems will occur:
Solutions
For DDL without data reorganization, as the result is acceptable, we just need to state in the documentation that the schema read is always the latest.
For the DDL with data reorganization, there are some possible solutions:
E.g.
Implementations
First, we need to collect the DDL info. Second, we need to check the tables being read against that DDL info in staleness transactions and history reads.
Collect the DDL which needs data reorganization
The DDL info of each such DDL needs to be cached in a list. The DDL info includes the schema version, the DDL type, and the affected table IDs.
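A minimal sketch of such a cache, assuming hypothetical names — none of these types exist in TiDB, and the eviction policy is only an illustration of the bounded list described above:

```go
package main

import "fmt"

// ReorgDDLInfo records one DDL that required data reorganization.
// Hypothetical type; the field set mirrors the list described above.
type ReorgDDLInfo struct {
	SchemaVersion int64   // schema version after the DDL took effect
	Type          string  // e.g. "modify column"
	TableIDs      []int64 // IDs of the affected tables
}

// ReorgDDLList is a bounded, append-only cache of such DDLs,
// similar in spirit to schemaValidator.deltaSchemaInfos.
type ReorgDDLList struct {
	capacity int
	infos    []ReorgDDLInfo
}

// Add appends a new DDL info, evicting the oldest entry when full.
func (l *ReorgDDLList) Add(info ReorgDDLInfo) {
	if len(l.infos) >= l.capacity {
		l.infos = l.infos[1:]
	}
	l.infos = append(l.infos, info)
}

func main() {
	list := &ReorgDDLList{capacity: 1024}
	list.Add(ReorgDDLInfo{SchemaVersion: 101, Type: "modify column", TableIDs: []int64{42}})
	fmt.Println(len(list.infos))
}
```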
`schemaValidator.deltaSchemaInfos` is a similar list that contains recent schema changes. It is mainly used to validate that the schemas of the tables affected by a transaction have not changed during the transaction; see `schemaValidator.isRelatedTablesChanged`. However, its capacity is 1024 by default, and it contains all schema changes, not only those with data reorganization. So there is a possibility that the transaction is too old and the DDL info list has run out, just like what `schemaValidator.isRelatedTablesChanged` reports.

Get the schema version for the start ts of the staleness transaction
We need to compare the time order of transaction start ts (or the snapshot time of a history read) with schema changes.
For normal transactions, it is done by comparing the schema version at transaction start (`TransactionContext.SchemaVersion`) with the schema version of each schema change, just like `schemaValidator.isRelatedTablesChanged`. However, for staleness transactions, the schema version recorded in the transaction context should be the one corresponding to the transaction's start ts, rather than the one at the moment the transaction actually starts.
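Whatever the source of the mapping from timestamps to schema versions, resolving the snapshot's schema version could be a simple search over a list of (schema version, ts) pairs for recent changes. A hypothetical sketch, with illustrative names that are not TiDB APIs:

```go
package main

import (
	"fmt"
	"sort"
)

// versionedChange pairs a schema version with the ts at which that
// schema change took effect. Hypothetical type for illustration.
type versionedChange struct {
	SchemaVersion int64
	TS            uint64
}

// schemaVersionAt returns the latest schema version whose change ts is
// <= startTS, and false when startTS predates the whole cache (the
// transaction is too old, analogous to the DDL info list running out).
// changes must be sorted by TS ascending.
func schemaVersionAt(changes []versionedChange, startTS uint64) (int64, bool) {
	i := sort.Search(len(changes), func(i int) bool {
		return changes[i].TS > startTS
	})
	if i == 0 {
		return 0, false
	}
	return changes[i-1].SchemaVersion, true
}

func main() {
	cache := []versionedChange{
		{SchemaVersion: 100, TS: 1000},
		{SchemaVersion: 101, TS: 2000},
		{SchemaVersion: 102, TS: 3000},
	}
	v, ok := schemaVersionAt(cache, 2500)
	fmt.Println(v, ok) // schema version 101 was current at ts 2500
}
```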
One way is to get the DDL job history from the metadata, like what `GetDDLJobs` does. In this way, we can get the start time for each schema version, but it needs to read the metadata on TiKV, which is slow. What's more, the start time is not accurate.

Check the DDL info list when reading tables
Each time the staleness transaction or history read reads a table, check the DDL info list. If there exists any DDL that affected the table, report an error.
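The check described above could look like the following sketch. All names are hypothetical; it only illustrates rejecting a read when a data-reorganizing DDL affected the table after the snapshot's schema version:

```go
package main

import "fmt"

// reorgDDL records one data-reorganizing DDL and the tables it touched.
// Hypothetical type for illustration.
type reorgDDL struct {
	SchemaVersion int64
	TableIDs      []int64
}

// checkTableReadable returns an error when some data-reorganizing DDL
// on tableID took effect at a schema version newer than snapshotVer,
// i.e. the stale read might see data in an unreadable format.
func checkTableReadable(list []reorgDDL, tableID, snapshotVer int64) error {
	for _, d := range list {
		if d.SchemaVersion <= snapshotVer {
			continue // the snapshot already sees this DDL's result
		}
		for _, id := range d.TableIDs {
			if id == tableID {
				return fmt.Errorf("table %d was reorganized at schema version %d, after snapshot version %d",
					tableID, d.SchemaVersion, snapshotVer)
			}
		}
	}
	return nil
}

func main() {
	list := []reorgDDL{{SchemaVersion: 105, TableIDs: []int64{42}}}
	fmt.Println(checkTableReadable(list, 42, 100) != nil) // affected table: error
	fmt.Println(checkTableReadable(list, 42, 105) == nil) // snapshot is new enough
}
```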
Just like validating the transaction scopes in local transactions, we can also validate the schemas of tables in all operators that read tables directly. For example, validate the tables in `RequestBuilder.Build` to cover `TableReader`, `IndexMergeReader`, and `IndexReader`.