ddl: batch check the constrains when we add a unique-index. #7132
…up adding indices
ddl/index.go
Outdated
@@ -453,6 +453,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicate the index key is already exists, we should not add it.
indicates that .....
ddl/index.go
Outdated
defaultVals []types.Datum // It's used to reduce the number of new slice.
idxRecords []*indexRecord // It's used to reduce the number of new slice.
rowMap map[int64]types.Datum // It's the index column values map. It is used to reduce the number of making map.
defaultVals []types.Datum // It's used to reduce the number of new slice.
The comments for the following attributes are almost the same. We could use one comment for all of them, such as:
"The following attributes are used to reduce memory allocation."
ddl/index.go
Outdated
defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.reAllocIdxKeyBufs(w.batchCnt)
Why do we need to reallocate it?
ddl/index.go
Outdated
}
}
// Constrains is already checked.
w.sessCtx.GetSessionVars().StmtCtx.BatchCheck = true
stmtCtx.BatchCheck = true
@shenli PTAL
ddl/index.go
Outdated
defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.initBatchCheckBufs(w.batchCnt)
Why init here? If the index is not unique, it is a waste to init it.
// 1. unique-key is duplicate and the handle is equal, skip it.
// 2. unique-key is duplicate and the handle is not equal, return duplicate error.
// 3. non-unique-key is duplicate, skip it.
Why can we skip it?
Which one do you mean?
Backfilling only needs to add index entries that do not exist yet. If an index entry already exists, why would we need to add it again?
Say there is a unique index (a) and there are two rows, (null) and (null); then all the rows need to be added.
Actually, both rows will be added. A null value in a unique key makes the key non-distinct, so we append the handle to the key, and the two (null) rows end up with different keys.
I will add a unit test case to eliminate your doubt.
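The point about null values can be illustrated with a small sketch. This is not the TiDB key encoding: `genIndexKey` here is a hypothetical helper with a toy string encoding, written only to show why two `(null)` rows in a unique index get different keys while two equal non-null rows collide.

```go
package main

import (
	"fmt"
	"strconv"
)

// genIndexKey is a hypothetical, simplified key builder (not the TiDB encoding).
// A nil entry in vals stands for SQL NULL.
// It returns the key and whether the key must be distinct across rows.
func genIndexKey(unique bool, vals []*string, handle int64) (string, bool) {
	distinct := unique
	key := ""
	for _, v := range vals {
		if v == nil {
			// A NULL in a unique index makes the key non-distinct.
			distinct = false
			key += "|NULL"
		} else {
			key += "|" + strconv.Quote(*v)
		}
	}
	if !distinct {
		// Non-distinct keys carry the row handle, so each row has its own key.
		key += "|h=" + strconv.FormatInt(handle, 10)
	}
	return key, distinct
}

func main() {
	k1, _ := genIndexKey(true, []*string{nil}, 1)
	k2, _ := genIndexKey(true, []*string{nil}, 2)
	fmt.Println(k1 != k2) // the two (null) rows do not collide

	a := "x"
	k3, _ := genIndexKey(true, []*string{&a}, 1)
	k4, _ := genIndexKey(true, []*string{&a}, 2)
	fmt.Println(k3 == k4) // equal non-null values collide in a unique index
}
```

Under this assumption, the batch check never reports a duplicate for the `(null)` rows, because their keys already differ by handle.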
@lamxTyler PTAL
ddl/index.go
Outdated
// The index is already exists, we skip it, no needs to backfill it.
// The following update, delete, insert on these rows, TiDB can handle it correctly.
if idxRecord.skip {
continue
Could skipping cause addedCount to be wrong? See this PR: https://github.com/pingcap/tidb/pull/6980/files
No, a skipped row does not affect addedCount, which is expected, but scanCount should still increase.
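A minimal sketch of the counting behavior described in this reply, assuming a simplified `indexRecord` and a hypothetical `backfillBatch` helper (neither is the code in `ddl/index.go`): every scanned row increments the scan counter, while only rows that are actually written increment the added counter.

```go
package main

import "fmt"

// indexRecord is a simplified stand-in for the struct under review.
type indexRecord struct {
	handle int64
	skip   bool // the index entry already exists
}

// backfillBatch is a hypothetical helper. It calls write for every record
// that is not skipped and reports how many rows were scanned and added.
func backfillBatch(records []indexRecord, write func(handle int64)) (scanCount, addedCount int) {
	for _, r := range records {
		scanCount++ // every row in the batch was scanned
		if r.skip {
			continue // already indexed, nothing to write
		}
		write(r.handle)
		addedCount++
	}
	return scanCount, addedCount
}

func main() {
	batch := []indexRecord{{handle: 1}, {handle: 2, skip: true}, {handle: 3}}
	scanned, added := backfillBatch(batch, func(int64) {})
	fmt.Println(scanned, added)
}
```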
… into speedup_creating_unique_key
ddl/index.go
Outdated
@@ -452,6 +452,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicates that the index key is already exists, we should not add it.
It's better to move the comment to the end of the next line.
Please resolve the conflicts.
@jackysp PTAL
Please fix CI and resolve the conflicts again.
@jackysp @crazycs520 PTAL
@lamxTyler PTAL
LGTM
LGTM
/run-all-tests
/rebuild
/run-unit-test
LGTM
What have you changed? (mandatory)
Before this PR, we checked for a duplicate key on every row, which made adding a unique index much slower than adding a non-unique index. In this PR, before creating a unique index, we batch check the keys and skip the keys that already exist.
I built a cluster with Ansible on my virtual machine and used importer to produce a table with 1 million rows. Then I added a unique index to it, once with this PR and once with the master branch. The results:
The master branch took 7m25s and this PR took 1m41s, so this PR reduces the time by about 77.3%.
This PR needs to be cherry-picked to 2.0.
What is the type of the changes? (mandatory)
How has this PR been tested? (mandatory)
Existing tests.
Does this PR affect documentation (docs/docs-cn) update? (mandatory)
no
Does this PR affect tidb-ansible update? (mandatory)
no
Does this PR need to be added to the release notes? (mandatory)
YES:
Refer to a related PR or issue link (optional)
Benchmark result if necessary (optional)
Add a few positive/negative examples (optional)