Introduce new sampling algorithm for statistics collecting #27357

winoros · 2021-08-18T19:05:34Z

Feature Request

Is your feature request related to a problem? Please describe:

Reservoir sampling collects too many wasted samples.
We're using this one
In the distributed case, each sub-collector must collect as many samples as the root collector.
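The original link to the reservoir variant in use was lost, so the sketch below uses bottom-k sampling instead. This is one common mergeable form of reservoir sampling, and it is an assumption here. It shows why every sub-collector must return the full quota. Each row gets a uniform random key, and each region keeps its k smallest keys. The root then keeps the k smallest keys overall. The function names are hypothetical.

```python
import heapq
import random

def collect_region(rows, k, rng):
    """Hypothetical sub-collector: tag each row with a uniform random key
    and keep the k rows with the smallest keys (bottom-k sampling)."""
    keyed = ((rng.random(), row) for row in rows)
    return heapq.nsmallest(k, keyed)

def merge_at_root(parts, k):
    """Hypothetical root collector: the k smallest keys across all parts
    form a uniform sample of size k from the union of all regions."""
    return heapq.nsmallest(k, (s for part in parts for s in part))

rng = random.Random(42)
k = 5
regions = [range(r * 100, (r + 1) * 100) for r in range(4)]
# Each region must ship k samples: in the worst case, all k of the
# globally smallest keys fall in a single region.
parts = [collect_region(rows, k, rng) for rows in regions]
shipped = sum(len(p) for p in parts)
final = merge_at_root(parts, k)
print("shipped:", shipped, "kept:", len(final))
```

Here the root keeps 5 samples but receives 20. The shipped volume grows linearly with the number of regions, no matter how small the target is.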

So when the root collects 100K samples, we must also collect 100K samples from each region, and each region holds about 1 million rows under the default configuration. This means that to collect 100K samples from a table with 1 billion rows (the ideal sample rate here is 10^5/10^9 = 10^-4 = 0.01%), we actually collect 10^5/10^6 * 10^9 = 10^8 samples (a sample rate of 10%).

That is 0.01% versus 10%: we collect 1000 times more samples than we need.
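The arithmetic above can be checked with a few lines. The figures (10^5 target samples, about 10^6 rows per region, 10^9 rows in total) come from the example. The helper name is hypothetical.

```python
def shipped_samples(target, rows_per_region, total_rows):
    """Samples shipped to the root when every region returns `target` samples."""
    regions = total_rows // rows_per_region
    return regions * target

target = 10**5           # samples the root wants
rows_per_region = 10**6  # approximate rows per region (default configuration)
total_rows = 10**9       # table size in the example

ideal_rate = target / total_rows
shipped = shipped_samples(target, rows_per_region, total_rows)
actual_rate = shipped / total_rows
print(f"ideal {ideal_rate:.2%}, actual {actual_rate:.0%}, waste x{shipped // target}")
```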

Describe the feature you'd like:

We need a better sampling algorithm that does not waste so many samples, because the wasted samples increase memory, CPU, and network cost.
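One direction, suggested by the title of the linked PR ("statistics: introduce sampling by rate"), is Bernoulli sampling at a fixed rate. The sketch below is an assumption and not necessarily what the PR implements. It sets the rate to target / total_rows, and each region keeps every row independently with that probability. The root only concatenates the results. It assumes the total row count is known or estimated up front. The function names are hypothetical.

```python
import random

def sample_region_by_rate(rows, rate, rng):
    """Hypothetical sub-collector: keep each row independently with
    probability `rate` (Bernoulli sampling). No per-region quota is needed."""
    return [row for row in rows if rng.random() < rate]

def collect_by_rate(regions, target, total_rows, seed=0):
    rng = random.Random(seed)
    # Assumption: total_rows is known (or estimated) before sampling starts.
    rate = min(1.0, target / total_rows)
    samples = []
    for rows in regions:
        samples.extend(sample_region_by_rate(rows, rate, rng))
    return rate, samples

# 100 regions x 1000 rows = 10^5 rows; aim for about 100 samples.
regions = [range(r * 1000, (r + 1) * 1000) for r in range(100)]
rate, samples = collect_by_rate(regions, target=100, total_rows=100_000)
print(rate, len(samples))
```

The expected number of shipped samples equals the target, however many regions there are. The trade-off is that the final sample size is random, and the sampler needs a row-count estimate before it starts.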

Describe alternatives you've considered:

Teachability, Documentation, Adoption, Migration Strategy:

winoros · 2023-12-22T17:29:57Z

implemented

winoros added the type/feature-request Categorizes issue or PR as related to a new feature. label Aug 18, 2021

winoros self-assigned this Aug 18, 2021

winoros added the sig/planner SIG: Planner label Aug 18, 2021

winoros mentioned this issue Aug 18, 2021

statistics: introduce sampling by rate #27359

Closed

12 tasks

winoros closed this as completed Dec 22, 2023

github-project-automation bot added this to Feature Request Kanban Aug 28, 2024

github-project-automation bot moved this to Finished in Feature Request Kanban Aug 28, 2024
