executor: support fast analyze. #9973

lzmhhh123 · 2019-04-01T05:58:24Z

What problem does this PR solve?

Support fast analyze.

What is changed and how it works?

We randomly pick keys in each region to collect samples instead of scanning all regions.

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

Code changes

Has exported function/method change
Has exported variable/fields change
Has persistent data change

Side effects

Possible performance regression
Increased code complexity

Related changes

Need to update the documentation
Need to be included in the release note

codecov · 2019-04-02T04:55:18Z

Codecov Report

❗ No coverage uploaded for pull request base (master@abeddab).
The diff coverage is 60.4863%.

@@             Coverage Diff             @@
##             master      #9973   +/-   ##
===========================================
  Coverage          ?   77.9428%           
===========================================
  Files             ?        405           
  Lines             ?      82540           
  Branches          ?          0           
===========================================
  Hits              ?      64334           
  Misses            ?      13445           
  Partials          ?       4761

lzmhhh123 · 2019-04-04T03:09:26Z

This PR is too large to review, so I'll split it into the following PRs:

store/tikv: support debug PB in client. #10038
sessionctx: support fast analyze session control variable. #10039
planner, executor: support fast analyze in planner and executor's builder. #10040
executor: support fast sample for fast analyze #10214
executor: support building stats for fast analyze. #10258

lzmhhh123 · 2019-04-18T05:17:45Z

Tested with TPC-H at scale factor 50 on table lineitem, 300M rows in total.

Normal analyze takes 7min 54sec.
Fast analyze takes 6sec.

lzmhhh123 · 2019-04-18T05:20:26Z

PTAL. @qw4990 @winoros

winoros · 2019-04-18T05:24:05Z

Wonderful result!
300G rows is a little confusing. I think 60 million rows is easier to understand.
BTW what's the quality of the histogram result compared with the normal analyze's?

lzmhhh123 · 2019-04-18T06:48:16Z

@winoros Currently, the row count treats the same row under different snapshots as different rows, so the total count for a column is not correct. This will be fixed in the next PR: build stats info for fast analyze.

qw4990 · 2019-04-18T12:03:16Z

executor/analyze.go

+
+		keys := make([]kv.Key, 0, task.SampSize)
+		for i := 0; i < int(task.SampSize); i++ {
+			randKey := rander.Int63n(maxRowID-minRowID) + minRowID


Since rand.Intn(0) will result in panic, do we need to check maxRowID-minRowID > 0 here?

qw4990 · 2019-04-18T12:33:49Z

executor/analyze.go

+		if collectors[0].Samples[samplePos] == nil {
+			collectors[0].Samples[samplePos] = &statistics.SampleItem{}
+		}
+		collectors[0].Samples[samplePos].Ordinal = int(samplePos)


Ordinal represents an item's relative physical position among the samples.
For example, suppose there are two regions with region1 < region2.
You sample item1 from region1, and item2 and item3 from region2, so that item1.key < item2.key < item3.key.
Then item1.Ordinal < item2.Ordinal < item3.Ordinal must hold.
But in this implementation, item2.Ordinal < item3.Ordinal < item1.Ordinal can happen.
@eurekaka PTAL and please help confirm whether this problem exists.
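
One possible way to satisfy the ordering constraint described above is to assign ordinals only after all samples are collected, by sorting on key. This is a sketch under assumptions: `sampleItem` is a hypothetical stand-in for `statistics.SampleItem`, and the `Key` field is added purely for illustration. It is not necessarily how the PR resolved the issue.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// sampleItem is a hypothetical stand-in for statistics.SampleItem.
// The Key field is an assumption made for this illustration.
type sampleItem struct {
	Key     []byte
	Ordinal int
}

// assignOrdinals sorts the samples by key and then numbers them, so that
// a smaller key always gets a smaller Ordinal regardless of the order in
// which the regions returned their samples.
func assignOrdinals(items []*sampleItem) {
	sort.Slice(items, func(i, j int) bool {
		return bytes.Compare(items[i].Key, items[j].Key) < 0
	})
	for i, it := range items {
		it.Ordinal = i
	}
}

func main() {
	item1 := &sampleItem{Key: []byte("r1_a")} // from region1
	item2 := &sampleItem{Key: []byte("r2_a")} // from region2
	item3 := &sampleItem{Key: []byte("r2_b")} // from region2

	// Arrival order differs from key order: region2 answered first.
	items := []*sampleItem{item2, item3, item1}
	assignOrdinals(items)

	fmt.Println(item1.Ordinal, item2.Ordinal, item3.Ordinal) // 0 1 2
}
```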

zz-jason · 2019-04-18T07:05:34Z

executor/analyze.go

+	for buildCnt := 0; buildCnt < 5; buildCnt++ {
+		needRebuild, err := e.buildSampTask()
+		if err != nil {
+			return nil, nil, errors.Trace(err)


no need to call errors.Trace() anymore.

zz-jason · 2019-04-18T07:22:35Z

executor/analyze.go

+		go e.getSampRegionsRowCount(bo, &needRebuildForRoutine[i], &errs[i], &sampTasksForRoutine[i])
+	}
+
+	store, _ := e.ctx.GetStore().(tikv.Storage)


should we return error if the store is not tikv.Storage?

This check has already been done in the planner.
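
For reference, a sketch of the checked form of the type assertion discussed above. It returns an error rather than discarding the `ok` result. All types here (`storage`, `tikvStorage`, `mockStore`, `tikvStore`) and the error value are hypothetical stand-ins, not the real TiDB interfaces.

```go
package main

import (
	"errors"
	"fmt"
)

// storage and tikvStorage are hypothetical stand-ins for the generic
// store interface and the TiKV-specific one.
type storage interface{ Name() string }

type tikvStorage interface {
	storage
	RegionCount() int
}

type mockStore struct{}

func (mockStore) Name() string { return "mock" }

type tikvStore struct{}

func (tikvStore) Name() string     { return "tikv" }
func (tikvStore) RegionCount() int { return 3 }

var errNotTiKV = errors.New("fast analyze requires a TiKV store")

// asTiKV uses the two-value form of the type assertion so that a
// non-TiKV store produces an error instead of a nil interface value
// that would fail later when a method is called on it.
func asTiKV(s storage) (tikvStorage, error) {
	ts, ok := s.(tikvStorage)
	if !ok {
		return nil, errNotTiKV
	}
	return ts, nil
}

func main() {
	if _, err := asTiKV(mockStore{}); err != nil {
		fmt.Println("mock store rejected:", err)
	}
	ts, err := asTiKV(tikvStore{})
	fmt.Println(ts.Name(), err)
}
```

If an earlier layer already guarantees the store type, as the reply says, the check is redundant at this point.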

zz-jason · 2019-04-19T05:10:58Z

executor/analyze.go

 	scanTasks       []*tikv.KeyLocation
 }

+func (e *AnalyzeFastExec) getSampRegionsRowCount(bo *tikv.Backoffer, needRebuild *bool, err *error, sampTasks *[]*AnalyzeFastTask) {


can we use another function signature? for example:

func (e *AnalyzeFastExec) getSampRegionsRowCount(bo *tikv.Backoffer) (needRebuild bool, sampTasks []*AnalyzeFastTask, err error) {
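
A sketch of how the value-returning signature could still be used from goroutines: each worker calls the function normally and stores its results in its own slot, and the caller merges them after waiting. The names `task`, `result`, `countRegion` and `runAll` are hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// task is a hypothetical stand-in for AnalyzeFastTask.
type task struct{ regionID int }

// result bundles the three return values of one worker.
type result struct {
	needRebuild bool
	tasks       []task
	err         error
}

// countRegion (hypothetical) returns its results by value instead of
// writing through pointer arguments.
func countRegion(regionID int) (needRebuild bool, tasks []task, err error) {
	if regionID < 0 {
		return false, nil, fmt.Errorf("invalid region %d", regionID)
	}
	// Pretend that odd region IDs changed and need a rebuild.
	return regionID%2 == 1, []task{{regionID: regionID}}, nil
}

// runAll starts one goroutine per region. Each goroutine writes only to
// its own element of results, so no locking is needed.
func runAll(regionIDs []int) (bool, []task, error) {
	results := make([]result, len(regionIDs))
	var wg sync.WaitGroup
	for i, id := range regionIDs {
		wg.Add(1)
		go func(i, id int) {
			defer wg.Done()
			nr, ts, err := countRegion(id)
			results[i] = result{needRebuild: nr, tasks: ts, err: err}
		}(i, id)
	}
	wg.Wait()

	var needRebuild bool
	var all []task
	for _, r := range results {
		if r.err != nil {
			return false, nil, r.err
		}
		needRebuild = needRebuild || r.needRebuild
		all = append(all, r.tasks...)
	}
	return needRebuild, all, nil
}

func main() {
	nr, ts, err := runAll([]int{0, 2, 4})
	fmt.Println(nr, len(ts), err) // false 3 <nil>
}
```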

lzmhhh123 · 2019-04-28T07:27:00Z

All the sub-PRs have been merged.

lzmhhh123 added 5 commits March 26, 2019 13:53

ci

38eb623

ci

f6ee210

ci

f0105d0

Merge branch 'dev/plug_in_debug_pb' into dev/fast_analyze

f05cf4a

ci

586cf13

lzmhhh123 added status/WIP sig/execution SIG execution type/new-feature labels Apr 1, 2019

shenli reviewed Apr 1, 2019

executor/analyze.go

lzmhhh123 added 3 commits April 1, 2019 19:00

improve

4cd7147

debug

5c2e687

improve

289268b

add some TODOs

957d0a7

lzmhhh123 marked this pull request as ready for review April 2, 2019 05:00

debug

86ea376

alivxxx reviewed Apr 2, 2019

lzmhhh123 added 3 commits April 2, 2019 17:21

address comments

6763c80

address comments

f412d13

add idx collector

e91daaa

lzmhhh123 removed the status/WIP label Apr 3, 2019

qw4990 reviewed Apr 3, 2019

executor/analyze.go (outdated)

qw4990 reviewed Apr 3, 2019

executor/analyze.go

qw4990 reviewed Apr 3, 2019

executor/analyze.go (outdated)

ci

0e9640c

This was referenced Apr 4, 2019

store/tikv: support debug PB in client. #10038

Merged

sessionctx: support fast analyze session control variable. #10039

Merged

planner, executor: support fast analyze in planner and executor's builder. #10040

Merged

ci

925832f

lzmhhh123 commented Apr 17, 2019

executor/analyze.go (outdated)

lzmhhh123 added 6 commits April 17, 2019 20:14

fix data race

5472c8a

fix

544f215

fix

33aa77e

debug

d4f7684

debug

9cb5128

debug

6b078b5

erjiaqing and others added 4 commits April 18, 2019 14:14

upd

ba40c91

some rename

c07b72c

fix

42d32af

improve

250d6d6

alivxxx reviewed Apr 18, 2019

executor/analyze.go (outdated)

zz-jason reviewed Apr 18, 2019

executor/analyze.go

lzmhhh123 added 2 commits April 18, 2019 15:07

fix

2d59027

fix test

0a76db7

qw4990 reviewed Apr 18, 2019

zz-jason reviewed Apr 19, 2019

lzmhhh123 added 3 commits April 21, 2019 11:26

Merge remote-tracking branch 'gs/cms_topn_core' into dev/fast_analyze

466428c

build cmsketch

53f9745

debug the calculation of ndv

d29d982

This was referenced Apr 22, 2019

executor: support fast sample for fast analyze #10214

Merged

executor: support building stats for fast analyze. #10258

Merged

lzmhhh123 closed this Apr 28, 2019

lzmhhh123 deleted the dev/fast_analyze branch July 24, 2019 05:54
