Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

executor: support building stats for fast analyze. #10258

Merged
merged 16 commits into from
Apr 28, 2019

Conversation

lzmhhh123
Copy link
Contributor

What problem does this PR solve?

After the fast sample, we need to build stats base on the samples.

What is changed and how it works?

Calculate stats by samples.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

  • None

Related changes

@winoros
Copy link
Member

winoros commented Apr 24, 2019

@lzmhhh123 Don't forget to add label for the pull request.

statistics/builder.go Outdated Show resolved Hide resolved
statistics/sample.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
statistics/builder.go Outdated Show resolved Hide resolved
@zz-jason
Copy link
Member

please fix ci

executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
statistics/builder.go Outdated Show resolved Hide resolved
statistics/builder.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
statistics/sample.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
executor/analyze.go Show resolved Hide resolved
rowCount := mathutil.MinInt64(domain.GetDomain(e.ctx).StatsHandle().GetTableStats(e.tblInfo).Count, int64(e.rowCount))
// build CMSketch
var ndv, scaleRatio uint64
collector.CMSketch, ndv, scaleRatio = statistics.NewCMSketchWithTopN(defaultCMSketchDepth, defaultCMSketchWidth, data, uint32(len(data)), uint64(rowCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use len(data) as topN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea about the number of topN. So just keep the len(data) here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erjiaqing PTAL

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use a relative small number, maybe 20 or 40.
Finding element in TopN is not cheap, we should keep size of TopN relative small.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a constant ok? Or can we calculate it by the sample size?

statistics/builder.go Show resolved Hide resolved
@zhouqiang-cl
Copy link
Contributor

/rebuild

6 similar comments
@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

@zhouqiang-cl
Copy link
Contributor

/rebuild

1 similar comment
@zhouqiang-cl
Copy link
Contributor

/rebuild

executor/analyze.go Outdated Show resolved Hide resolved
statistics/sample.go Outdated Show resolved Hide resolved
executor/analyze.go Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Apr 26, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@45c0e51). Click here to learn what that means.
The diff coverage is 76.6666%.

@@             Coverage Diff             @@
##             master     #10258   +/-   ##
===========================================
  Coverage          ?   77.8228%           
===========================================
  Files             ?        410           
  Lines             ?      84578           
  Branches          ?          0           
===========================================
  Hits              ?      65821           
  Misses            ?      13842           
  Partials          ?       4915

@lzmhhh123
Copy link
Contributor Author

lzmhhh123 commented Apr 26, 2019

All comments have been addressed. Pls take a look @zz-jason @lamxTyler @eurekaka .

executor/analyze.go Outdated Show resolved Hide resolved
Copy link
Contributor

@alivxxx alivxxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alivxxx alivxxx added the status/LGT1 Indicates that a PR has LGTM 1. label Apr 26, 2019
@zhouqiang-cl
Copy link
Contributor

/rebuild

1 similar comment
@zhouqiang-cl
Copy link
Contributor

/rebuild

@mahjonp
Copy link
Contributor

mahjonp commented Apr 26, 2019

/rebuild

1 similar comment
@mahjonp
Copy link
Contributor

mahjonp commented Apr 26, 2019

/rebuild

@zz-jason zz-jason added priority/P1 The issue has P1 priority. priority/release-blocker This issue blocks a release. Please solve it ASAP. labels Apr 28, 2019
}
// build CMSketch
var ndv, scaleRatio uint64
collector.CMSketch, ndv, scaleRatio = statistics.NewCMSketchWithTopN(defaultCMSketchDepth, defaultCMSketchWidth, data, 20, uint64(rowCount))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make 20 a constant variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. We keep it constant now. And modify it in the future.

Copy link
Contributor

@eurekaka eurekaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eurekaka eurekaka added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Apr 28, 2019
@eurekaka
Copy link
Contributor

/run-all-tests

@zz-jason zz-jason merged commit 5fa16a8 into pingcap:master Apr 28, 2019
@lzmhhh123 lzmhhh123 deleted the dev/fast_analyze_build_stats branch July 24, 2019 05:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/statistics priority/P1 The issue has P1 priority. priority/release-blocker This issue blocks a release. Please solve it ASAP. sig/execution SIG execution status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants