Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: batch check the constrains when we add a unique-index. #7132

Merged
merged 19 commits into from
Aug 6, 2018
Merged

ddl: batch check the constrains when we add a unique-index. #7132

merged 19 commits into from
Aug 6, 2018

Conversation

winkyao
Copy link
Contributor

@winkyao winkyao commented Jul 23, 2018

What have you changed? (mandatory)

Before this PR, we check if the key is duplicate every row, it will be much slower than adding a non-unique index. In this PR, before we create a unique index, we batch check the keys, and skip the keys that already exists.

I built a cluster using ansible on my virtual machine. And use importer to produce a table with 1 million rows. Then I try to add a unique index on it, with this PR and the master branch. I got the result:

2018/07/23 21:47:25.738 adapter.go:363: [warning] [SLOW_QUERY] cost_time:7m25.495286579s succ:true con:17 user:[email protected] txn_start_ts:401697197322665986 database:test sql:alter table t add unique index b(b)

2018/07/23 22:01:42.412 adapter.go:363: [warning] [SLOW_QUERY] cost_time:1m41.406208324s succ:true con:1 user:[email protected] txn_start_ts:401697512092336130 database:test sql:alter table t add unique index b(b)

master branch costs 7m25s, and this pr costs 1m41s to finish it. This PR has about 77.3% improvement.

This PR needs to cherry-pick to 2.0.

What is the type of the changes? (mandatory)

  • Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Exist tests.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

no

Does this PR affect tidb-ansible update? (mandatory)

no

Does this PR need to be added to the release notes? (mandatory)

YES:

release note:
Spead up adding unique index by batch checking the constraints.

-->

Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)

@winkyao winkyao added component/DDL-need-LGT3 release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Jul 23, 2018
ddl/index.go Outdated
@@ -453,6 +453,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicate the index key is already exists, we should not add it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indicates that .....

ddl/index.go Outdated
defaultVals []types.Datum // It's used to reduce the number of new slice.
idxRecords []*indexRecord // It's used to reduce the number of new slice.
rowMap map[int64]types.Datum // It's the index column values map. It is used to reduce the number of making map.
defaultVals []types.Datum // It's used to reduce the number of new slice.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments for the following attribute are almost the same. We could use one comment for them. Such as:
"The following attributes are used to reduce memory allocation."

ddl/index.go Outdated
defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.reAllocIdxKeyBufs(w.batchCnt)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to reallocate it?

ddl/index.go Outdated
}
}
// Constrains is already checked.
w.sessCtx.GetSessionVars().StmtCtx.BatchCheck = true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stmtCtx.BatchCheck = true

@winkyao
Copy link
Contributor Author

winkyao commented Jul 24, 2018

@shenli PTAL

@winkyao
Copy link
Contributor Author

winkyao commented Jul 24, 2018

@crazycs520 @zimulala @ciscoxll PTAL

ddl/index.go Outdated
defaultVals: make([]types.Datum, len(t.Cols())),
rowMap: make(map[int64]types.Datum, len(colFieldMap)),
}
w.initBatchCheckBufs(w.batchCnt)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why init here? If the index is not unique, it is a waste to init it.


// 1. unique-key is duplicate and the handle is equal, skip it.
// 2. unique-key is duplicate and the handle is not equal, return duplicate error.
// 3. non-unique-key is duplicate, skip it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can we skip it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean which one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backfill indices only need to add the not exist index, if the index already exists, why we need to add it again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say if there is a unique index (a), and if there are two rows (null), (null), then all the rows need to be added.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it will be added, because the null value in unique-key is regarded as non-distinct key, so we will append the handle to key, so the twos (null) (null) will have the different key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a unit test case to eliminate your doubt.

@winkyao
Copy link
Contributor Author

winkyao commented Jul 24, 2018

@lamxTyler PTAL

ddl/index.go Outdated
// The index is already exists, we skip it, no needs to backfill it.
// The following update, delete, insert on these rows, TiDB can handle it correctly.
if idxRecord.skip {
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is skip maybe cause the addedCount wrong? see this PR : https://github.com/pingcap/tidb/pull/6980/files

Copy link
Contributor Author

@winkyao winkyao Jul 31, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this skipped row will not affect addedCount, it is expected, but scanCount should increace.

ddl/index.go Outdated
@@ -452,6 +452,8 @@ type indexRecord struct {
handle int64
key []byte // It's used to lock a record. Record it to reduce the encoding time.
vals []types.Datum // It's the index values.
// skip indicates that the index key is already exists, we should not add it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to move the comment to the end of the next line.

@jackysp
Copy link
Member

jackysp commented Jul 31, 2018

Please resolve the conflicts.

@winkyao
Copy link
Contributor Author

winkyao commented Aug 1, 2018

@jackysp PTAL

@jackysp
Copy link
Member

jackysp commented Aug 1, 2018

Please fix CI and resolve the conflicts again.

@winkyao
Copy link
Contributor Author

winkyao commented Aug 1, 2018

@jackysp @crazycs520 PTAL

@winkyao
Copy link
Contributor Author

winkyao commented Aug 1, 2018

@lamxTyler PTAL

Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jackysp jackysp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@winkyao
Copy link
Contributor Author

winkyao commented Aug 2, 2018

/run-all-tests

@winkyao
Copy link
Contributor Author

winkyao commented Aug 2, 2018

/rebuild

@winkyao
Copy link
Contributor Author

winkyao commented Aug 2, 2018

/run-unit-test

Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zimulala zimulala added the status/LGT3 The PR has already had 3 LGTM. label Aug 6, 2018
@winkyao winkyao merged commit 326baac into pingcap:master Aug 6, 2018
@winkyao winkyao deleted the speedup_creating_unique_key branch August 6, 2018 12:39
@you06 you06 added the sig/sql-infra SIG: SQL Infra label Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/sql-infra SIG: SQL Infra status/LGT3 The PR has already had 3 LGTM.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants