[YSQL] Testing for pg_stats histogram bounds #17378

Open
1 task done
tanujnay112 opened this issue May 16, 2023 · 0 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

Comments

@tanujnay112
Contributor

tanujnay112 commented May 16, 2023

Jira Link: DB-6566

Description

There is currently no formal testing of the `histogram_bounds` column in `pg_stats`. This issue tracks adding it.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@tanujnay112 tanujnay112 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels May 16, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue and removed status/awaiting-triage Issue awaiting triage labels May 16, 2023
tanujnay112 added a commit that referenced this issue Sep 23, 2024
Summary:
This change adds tests for `histogram_bounds` in `pg_stats`. The error metric used to judge the quality of a histogram is based on the max relative error defined in "Random sampling for histogram construction: how much is enough?" by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya. PG cites this paper to justify its histogram error bound, so it was natural to use the same paper for the error metric.
More specifically, the error delta of a histogram is defined as the maximum absolute difference between any of its bucket sizes and the "perfect" bucket size `n/k`, where `n` is the total number of rows and `k` is the number of buckets. The relative error is defined as `delta/(perfect bucket size)`, which PG expects to be `<= 0.5` on columns with only distinct values. PG's discussion of this bound does not cover columns with duplicate values, so this change adjusts it: instead of expecting the max error, delta, to be less than `0.5*(perfect bucket size)`, we expect it to be less than `0.5*(perfect bucket size) + max(multiplicity(s_i) - 1)`, where `s_i` denotes the boundaries of the histogram being tested. This reduces to the original error bound for a column that has only distinct values.
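The adjusted bound can be illustrated with a small Python sketch. This is a hypothetical illustration, not the code in `TestPgAnalyze`. The helper names (`bucket_sizes`, `max_delta`, `relative_error`, `within_bound`) are made up. Bucket `i` is assumed to cover `[s_i, s_{i+1})`, with the last bucket closed. Every row is assumed to fall inside the histogram, so no most-common-values list removes rows from it. The comparison uses `<=`, matching the `<= 0.5` relative-error bound.

```python
from bisect import bisect_left, bisect_right
from collections import Counter
from typing import Sequence


def bucket_sizes(values: Sequence[int], bounds: Sequence[int]) -> list[int]:
    """Rows per bucket; bucket i is [bounds[i], bounds[i+1]), the last bucket is closed."""
    data = sorted(values)
    k = len(bounds) - 1
    sizes = []
    for i in range(k):
        lo = bisect_left(data, bounds[i])
        if i == k - 1:
            # The last bucket also includes rows equal to the upper boundary.
            hi = bisect_right(data, bounds[i + 1])
        else:
            hi = bisect_left(data, bounds[i + 1])
        sizes.append(hi - lo)
    return sizes


def max_delta(values: Sequence[int], bounds: Sequence[int]) -> float:
    """delta = max over buckets of |bucket size - n/k|."""
    perfect = len(values) / (len(bounds) - 1)
    return max(abs(size - perfect) for size in bucket_sizes(values, bounds))


def relative_error(values: Sequence[int], bounds: Sequence[int]) -> float:
    """delta divided by the perfect bucket size n/k."""
    perfect = len(values) / (len(bounds) - 1)
    return max_delta(values, bounds) / perfect


def within_bound(values: Sequence[int], bounds: Sequence[int]) -> bool:
    """Check delta <= 0.5 * n/k + max(multiplicity(s_i) - 1)."""
    perfect = len(values) / (len(bounds) - 1)
    counts = Counter(values)
    # Slack for duplicated boundary values; 0 when every boundary value occurs once.
    # Clamped at 0 in case a boundary value does not occur in `values`.
    slack = max(0, max(counts[b] - 1 for b in bounds))
    return max_delta(values, bounds) <= 0.5 * perfect + slack
```

As an example, take 40 copies of `5` followed by the 60 distinct values `100..159` with bounds `[5, 100, 120, 140, 159]`. The bucket sizes are `[40, 20, 20, 20]`, so delta is 15 against a perfect size of 25, a relative error of 0.6. That exceeds the plain `0.5 * 25 = 12.5` limit. However, the boundary `5` has multiplicity 40, which raises the adjusted limit to `12.5 + 39 = 51.5`, and the check passes.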
Jira: DB-6566

Test Plan:
Jenkins: test regex: .*TestPgAnalyze.*
./yb_build.sh --java-test 'org.yb.pgsql.TestPgAnalyze'

The above test fails before the fix in 36198ae

Reviewers: amartsinchyk, mtakahara

Reviewed By: mtakahara

Subscribers: yql

Differential Revision: https://phorge.dev.yugabyte.com/D25582
foucher pushed a commit that referenced this issue Sep 24, 2024
Summary:
 5d3e83e [PLAT-15199] Change TP API URLs according to latest refactoring
 a50a730 [doc][yba] YBDB compatibility (#23984)
 0c84dbe [#24029] Update the callhome diagnostics  not to send gflags details.
 b53ed3a [PLAT-15379][Fix PLAT-12510] Option to use UTC when dealing with cron exp. in backup schedule
 f0eab8f [PLAT-15278]: Fix DB Scoped XCluster replication restart
 344bc76 Revert "[PLAT-15379][Fix PLAT-12510] Option to use UTC when dealing with cron exp. in backup schedule"
 3628ba7 [PLAT-14459] Swagger fix
 bb93ebe [#24021] YSQL: Add --TEST_check_catalog_version_overflow
 9ab7806 [#23927] docdb: Add gflag for minimum thread stack size
 Excluded: 8c8adc0 [#18822] YSQL: Gate update optimizations behind preview flag
 5e86515 [#23768] YSQL: Fix table rewrite DDL before slot creation
 123d496 [PLAT-14682] Universe task should only unlock itself and make unlock aware of the lock config
 de9d4ad [doc][yba] CIS hardened OS support (#23789)
 e131b20 [#23998] DocDB: Update usearch and other header-only third-party dependencies
 1665662 Automatic commit by thirdparty_tool: update usearch to commit 240fe9c298100f9e37a2d7377b1595be6ba1f412.
 3adbdae Automatic commit by thirdparty_tool: update fp16 to commit 98b0a46bce017382a6351a19577ec43a715b6835.
 9a819f7 Automatic commit by thirdparty_tool: update hnswlib to commit 2142dc6f4dd08e64ab727a7bbd93be7f732e80b0.
 2dc58f4 Automatic commit by thirdparty_tool: update simsimd to tag v5.1.0.
 9a03432 [doc][ybm] Azure private link host (#24086)
 039c9a2 [#17378] YSQL: Testing for histogram_bounds in pg_stats
 09f7a0f [#24085] DocDB: Refactor HNSW wrappers
 555af7d [#24000] DocDB: Shutting down shared exchange could cause TServer to hang
 5743a03 [PLAT-15317]Alert emails are not in the correct format.
 8642555 [PLAT-15379][Fix PLAT-12510] Option to use UTC when dealing with cron exp. in backup schedule
 253ab07 [PLAT-15400][PLAT-15401][PLAT-13051] - Connection pooling ui issues and other ui issues
 57576ae [#16487] YSQL: Fix flakey TestPostgresPid test
 bc8ae45 Update ports for CIS hardened (#24098)
 6fa33e6 [#18152, #18729] Docdb: Fix test TestPgIndexSelectiveUpdate
 cc6d2d1 [docs] added and updated cves (#24046)
 Excluded: ed153dc [#24055] YSQL: fix pg_hint_plan regression with executing prepared statement

Test Plan: Jenkins: rebase: pg15-cherrypicks

Reviewers: jason, jenkins-bot

Differential Revision: https://phorge.dev.yugabyte.com/D38322