ddl: batch check the constrains when we add a unique-index. #7132

winkyao · 2018-07-23T14:27:28Z

What have you changed? (mandatory)

Before this PR, we checked every row individually for a duplicate key, which made adding a unique index much slower than adding a non-unique index. In this PR, before we create a unique index, we batch check the keys and skip the keys that already exist.
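
As a rough illustration of the idea described above, the following Go sketch replaces one lookup per row with a single batched lookup per batch. It is not the code in `ddl/index.go`: the `kvStore` interface, `mapStore`, `record`, and `batchCheckUnique` are hypothetical names, the string keys are a made-up encoding, and an in-memory map stands in for the storage layer.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a simplified stand-in for an index record being backfilled.
type record struct {
	handle int64  // row handle
	key    string // encoded unique-index key (hypothetical encoding)
	skip   bool   // set when the index entry already exists for this row
}

// kvStore is a hypothetical storage interface with a single batched read.
type kvStore interface {
	BatchGet(keys []string) map[string]int64 // key -> handle stored under it
}

// mapStore is an in-memory kvStore used only for this sketch.
type mapStore map[string]int64

func (m mapStore) BatchGet(keys []string) map[string]int64 {
	found := make(map[string]int64, len(keys))
	for _, k := range keys {
		if h, ok := m[k]; ok {
			found[k] = h
		}
	}
	return found
}

var errDupKey = errors.New("duplicate key")

// batchCheckUnique issues one batched read for the whole batch instead of
// one read per row, then marks or rejects each record.
func batchCheckUnique(store kvStore, recs []*record) error {
	keys := make([]string, 0, len(recs))
	for _, r := range recs {
		keys = append(keys, r.key)
	}
	existing := store.BatchGet(keys)
	for _, r := range recs {
		h, ok := existing[r.key]
		switch {
		case !ok:
			// Not present yet: remember it so that a later row in the same
			// batch carrying the same key is detected as a duplicate.
			existing[r.key] = r.handle
		case h == r.handle:
			// Same key, same handle: the entry already exists, skip it.
			r.skip = true
		default:
			// Same key, different handle: a genuine uniqueness violation.
			return fmt.Errorf("%w: key %q, handles %d and %d", errDupKey, r.key, h, r.handle)
		}
	}
	return nil
}

func main() {
	store := mapStore{"b=1": 1}
	recs := []*record{{handle: 1, key: "b=1"}, {handle: 2, key: "b=2"}}
	fmt.Println(batchCheckUnique(store, recs), recs[0].skip, recs[1].skip)
}
```

In this sketch the saving comes from the number of round trips: one `BatchGet` per batch rather than one read per row.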

I built a cluster with ansible on my virtual machine and used importer to generate a table with 1 million rows. I then added a unique index to it, once on the master branch and once with this PR, and got the following results:

2018/07/23 21:47:25.738 adapter.go:363: [warning] [SLOW_QUERY] cost_time:7m25.495286579s succ:true con:17 user:[email protected] txn_start_ts:401697197322665986 database:test sql:alter table t add unique index b(b)

2018/07/23 22:01:42.412 adapter.go:363: [warning] [SLOW_QUERY] cost_time:1m41.406208324s succ:true con:1 user:[email protected] txn_start_ts:401697512092336130 database:test sql:alter table t add unique index b(b)

The master branch takes 7m25s and this PR takes 1m41s to finish the same statement, a reduction of about 77.3% in elapsed time.
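
For readers who want to reproduce the percentage, here is a small Go sketch that derives it from the two `cost_time` values in the log lines above. The helper names `reduction` and `mustParse` are made up for this example. With the full-precision durations the result is a little over 77%, and with the rounded 7m25s and 1m41s it rounds to the quoted 77.3%.

```go
package main

import (
	"fmt"
	"time"
)

// reduction returns the fraction of elapsed time saved going from before to after.
func reduction(before, after time.Duration) float64 {
	return float64(before-after) / float64(before)
}

// mustParse parses a duration literal copied from the slow-query log.
func mustParse(s string) time.Duration {
	d, err := time.ParseDuration(s)
	if err != nil {
		panic(err)
	}
	return d
}

func main() {
	master := mustParse("7m25.495286579s") // cost_time on the master branch
	pr := mustParse("1m41.406208324s")     // cost_time with this PR
	fmt.Printf("reduction: %.1f%%, speedup: %.1fx\n",
		reduction(master, pr)*100, float64(master)/float64(pr))
}
```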

This PR needs to be cherry-picked to 2.0.

What is the type of the changes? (mandatory)

Improvement (non-breaking change which is an improvement to an existing feature)

How has this PR been tested? (mandatory)

Existing tests.

Does this PR affect documentation (docs/docs-cn) update? (mandatory)

no

Does this PR affect tidb-ansible update? (mandatory)

no

Does this PR need to be added to the release notes? (mandatory)

YES:

release note:
Speed up adding a unique index by batch checking the constraints.


Refer to a related PR or issue link (optional)

Benchmark result if necessary (optional)

Add a few positive/negative examples (optional)


shenli · 2018-07-23T16:22:19Z

ddl/index.go

@@ -453,6 +453,8 @@ type indexRecord struct {
 	handle int64
 	key    []byte        // It's used to lock a record. Record it to reduce the encoding time.
 	vals   []types.Datum // It's the index values.
+	// skip indicate the index key is already exists, we should not add it.


indicates that .....

shenli · 2018-07-23T16:24:03Z

ddl/index.go

-	defaultVals []types.Datum         // It's used to reduce the number of new slice.
-	idxRecords  []*indexRecord        // It's used to reduce the number of new slice.
-	rowMap      map[int64]types.Datum // It's the index column values map. It is used to reduce the number of making map.
+	defaultVals        []types.Datum         // It's used to reduce the number of new slice.


The comments for the following attributes are almost the same. We could use one comment for all of them, such as:
"The following attributes are used to reduce memory allocation."

shenli · 2018-07-23T16:24:49Z

ddl/index.go

 		defaultVals: make([]types.Datum, len(t.Cols())),
 		rowMap:      make(map[int64]types.Datum, len(colFieldMap)),
 	}
+	w.reAllocIdxKeyBufs(w.batchCnt)


Why do we need to reallocate it?

shenli · 2018-07-23T16:29:25Z

ddl/index.go

+		}
+	}
+	// Constrains is already checked.
+	w.sessCtx.GetSessionVars().StmtCtx.BatchCheck = true


stmtCtx.BatchCheck = true

winkyao · 2018-07-24T01:50:37Z

@shenli PTAL

winkyao · 2018-07-24T01:50:55Z

@crazycs520 @zimulala @ciscoxll PTAL

alivxxx · 2018-07-24T05:44:02Z

ddl/index.go

 		defaultVals: make([]types.Datum, len(t.Cols())),
 		rowMap:      make(map[int64]types.Datum, len(colFieldMap)),
 	}
+	w.initBatchCheckBufs(w.batchCnt)


Why init here? If the index is not unique, it is a waste to init it.

alivxxx · 2018-07-24T06:06:38Z

ddl/index.go

+
+	// 1. unique-key is duplicate and the handle is equal, skip it.
+	// 2. unique-key is duplicate and the handle is not equal, return duplicate error.
+	// 3. non-unique-key is duplicate, skip it.


Why can we skip it?

winkyao · 2018-07-24

Which one do you mean?

winkyao · 2018-07-24

Backfilling only needs to add index entries that do not exist yet. If the index entry already exists, why would we need to add it again?

alivxxx · 2018-07-24

Say there is a unique index on (a) and there are two rows, (null) and (null). Then both rows need to be added.

winkyao · 2018-07-24

Actually, they will both be added. A null value in a unique key is treated as a non-distinct key, so we append the handle to the key, and the two (null) rows end up with different keys.

winkyao · 2018-07-24

I will add a unit test case to address your concern.
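
The point about null values in the thread above can be pictured with a minimal Go sketch. It assumes a toy string encoding and a hypothetical `genIndexKey` helper; it is not TiDB's actual key encoding, only an illustration of why two (null) rows do not collide under a unique index.

```go
package main

import "fmt"

// genIndexKey is a hypothetical, simplified key builder. vals holds the
// indexed column values; a nil entry represents SQL NULL. It returns the key
// and whether the key is "distinct", meaning two rows with the same values
// would collide on it.
func genIndexKey(unique bool, vals []*string, handle int64) (key string, distinct bool) {
	distinct = unique
	key = "idx"
	for _, v := range vals {
		if v == nil {
			// A NULL column makes the entry non-distinct even for a unique index.
			distinct = false
			key += "/NULL"
			continue
		}
		key += "/" + *v
	}
	if !distinct {
		// Non-distinct keys carry the row handle, so every row gets its own key.
		key += fmt.Sprintf("/h%d", handle)
	}
	return key, distinct
}

func main() {
	k1, _ := genIndexKey(true, []*string{nil}, 1)
	k2, _ := genIndexKey(true, []*string{nil}, 2)
	fmt.Println(k1, k2, k1 != k2)
}
```

Because the handle is part of the key in the null case, a batch check on these keys would not find an existing entry for either row, so both rows are backfilled.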

winkyao · 2018-07-24T06:45:22Z

@lamxTyler PTAL

crazycs520 · 2018-07-24T07:16:21Z

ddl/index.go

+			// The index is already exists, we skip it, no needs to backfill it.
+			// The following update, delete, insert on these rows, TiDB can handle it correctly.
+			if idxRecord.skip {
+				continue


Could skipping cause addedCount to be wrong? See this PR: https://github.com/pingcap/tidb/pull/6980/files

winkyao · 2018-07-31

No, a skipped row does not affect addedCount, which is expected, but scanCount should still increase.
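
The counting rule from this exchange can be sketched as follows. This is a simplified illustration, not the loop in `ddl/index.go`: `backfillBatch` and the `write` callback are hypothetical, and only the `skip` flag and the two counter names come from the discussion.

```go
package main

import "fmt"

// indexRecord keeps only the fields needed for this sketch.
type indexRecord struct {
	handle int64
	skip   bool // the index entry already exists, so nothing is written
}

// backfillBatch is a hypothetical helper; write stands in for creating one
// index entry. Every scanned row increases scanCount, while only rows that
// are actually written increase addedCount.
func backfillBatch(recs []indexRecord, write func(handle int64) error) (scanCount, addedCount int, err error) {
	for _, r := range recs {
		scanCount++
		if r.skip {
			continue
		}
		if err = write(r.handle); err != nil {
			return scanCount, addedCount, err
		}
		addedCount++
	}
	return scanCount, addedCount, nil
}

func main() {
	recs := []indexRecord{{handle: 1}, {handle: 2, skip: true}, {handle: 3}}
	scan, added, err := backfillBatch(recs, func(int64) error { return nil })
	fmt.Println(scan, added, err)
}
```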


jackysp · 2018-07-31T05:12:16Z

ddl/index.go

@@ -452,6 +452,8 @@ type indexRecord struct {
 	handle int64
 	key    []byte        // It's used to lock a record. Record it to reduce the encoding time.
 	vals   []types.Datum // It's the index values.
+	// skip indicates that the index key is already exists, we should not add it.


It's better to move the comment to the end of the next line.

jackysp · 2018-07-31T05:12:37Z

Please resolve the conflicts.


winkyao · 2018-08-01T03:13:34Z

@jackysp PTAL

jackysp · 2018-08-01T03:29:54Z

Please fix CI and resolve the conflicts again.


winkyao · 2018-08-01T09:11:26Z

@jackysp @crazycs520 PTAL

winkyao · 2018-08-01T09:12:49Z

@lamxTyler PTAL

alivxxx

LGTM

jackysp

LGTM

winkyao · 2018-08-02T11:08:10Z

/run-all-tests

winkyao · 2018-08-02T11:08:32Z

/rebuild

winkyao · 2018-08-02T13:17:24Z

/run-unit-test

zimulala

LGTM


ddl: batch check the constrains when we add a unique-index. To spead …

a1ab0ca

…up adding indices

winkyao added component/DDL-need-LGT3 release-note labels Jul 23, 2018

winkyao added 2 commits July 23, 2018 22:40

Merge remote-tracking branch 'upstream/master' into speedup_creating_…

2ba5b47

…unique_key

fix ci

85ee474

shenli reviewed Jul 23, 2018


winkyao added 2 commits July 24, 2018 09:45

address comments

add465a

only backfill not found key into batchVals map

c683997

alivxxx reviewed Jul 24, 2018


winkyao added 2 commits July 24, 2018 14:44

address comments

9ae825e

Merge branch 'master' into speedup_creating_unique_key

8d44350

crazycs520 reviewed Jul 24, 2018


winkyao added 3 commits July 24, 2018 15:29

add test

4d737df

Merge branch 'speedup_creating_unique_key' of github.com:winkyao/tidb…

2c5898d

… into speedup_creating_unique_key

Merge remote-tracking branch 'upstream/master' into speedup_creating_…

a03ae8c

…unique_key

jackysp reviewed Jul 31, 2018


winkyao added 2 commits August 1, 2018 11:10

Merge remote-tracking branch 'upstream/master' into speedup_creating_…

bcc2ac3

…unique_key

address comments

37a659d

winkyao added 2 commits August 1, 2018 17:07

Merge remote-tracking branch 'upstream/master' into speedup_creating_…

a8783d9

…unique_key

increace scanCount if the row is skipped

a46f955

alivxxx reviewed Aug 2, 2018


jackysp approved these changes Aug 2, 2018


winkyao added 2 commits August 2, 2018 17:11

Merge branch 'master' into speedup_creating_unique_key

bed758a

Merge branch 'master' into speedup_creating_unique_key

e1e199b

shenli added the status/all tests passed label Aug 3, 2018

Merge branch 'master' into speedup_creating_unique_key

755f1fa

zimulala approved these changes Aug 6, 2018


Merge branch 'master' into speedup_creating_unique_key

ce55c6e

zimulala added the status/LGT3 label Aug 6, 2018

Merge branch 'master' into speedup_creating_unique_key

1190ca5

winkyao merged commit 326baac into pingcap:master Aug 6, 2018

winkyao deleted the speedup_creating_unique_key branch August 6, 2018 12:39

winkyao mentioned this pull request Aug 31, 2018

ddl: batch check the constrains when we add a unique-index. (#7132) #7562

Merged

winkyao added a commit that referenced this pull request Sep 6, 2018

ddl: batch check the constrains when we add a unique-index. (#7132) (#…

d0a8faa

…7562)

you06 added the sig/sql-infra label Mar 4, 2020
