Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crdb_internal.statement_statistics returning wrong value #85958

Closed
maryliag opened this issue Aug 11, 2022 · 1 comment · Fixed by #85987
Closed

crdb_internal.statement_statistics returning wrong value #85958

maryliag opened this issue Aug 11, 2022 · 1 comment · Fixed by #85987
Assignees
Labels
A-sql-observability Related to observability of the SQL layer branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker

Comments

@maryliag
Copy link
Contributor

maryliag commented Aug 11, 2022

When there is the same fingerprint with more than one index recommendations the final result on the combined (memory and persisted) is double counting. Example:

CREATE TABLE t1 (k INT, i INT, f FLOAT, s STRING);
CREATE TABLE t2 (k INT, i INT, s STRING);
SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE t1.i > 3 AND t2.i > 3; (executed 6 times, flush and 3 more times)
> SELECT                                                                                                                          
aggregated_ts,                                                                                                                                     
fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                       
 plan_hash,                                                                                                                                        
 app_name,                                                                                                                                          
metadata,                                                                                                                                          
statistics,                                                                                                                                        
sampled_plan,                                                                                                                                      
aggregation_interval,                                                                                                                             
 index_recommendations                                                                                                                           
 FROM                                                                                                                                                
crdb_internal.cluster_statement_statistics WHERE fingerprint_id = '\x23f7891854867892';

-[ RECORD 1 ]
aggregated_ts              | 2022-08-10 20:00:00+00
fingerprint_id             | \x23f7891854867892
transaction_fingerprint_id | \x8c943454d287cf4d
plan_hash                  | \xf0d8ed4a1e2397f3
app_name                   | $ cockroach demo
metadata                   | {"db": "movr", "distsql": false, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 3, "firstAttemptCnt": 3, "lastExecAt": "2022-08-10T20:17:04.247211Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000005239000000000018, "sqDiff": 2.2394060000000434E-12}, "parseLat": {"mean": 0.00006279233333333333, "sqDiff": 2.724717286666665E-10}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0002552356666666667, "sqDiff": 1.0722118544266666E-7}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.0004819166666666667, "sqDiff": 3.793698434666664E-9}, "svcLat": {"mean": 0.0008051836666666668, "sqDiff": 1.5446868870866666E-7}}}
sampled_plan               | {"Children": [{"Children": [{"Children": [], "Estimated Row Count": "1 (100% of the table; stats collected 12 minutes ago)", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Estimated Row Count": "1", "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Estimated Row Count": "1 (100% of the table; stats collected 12 minutes ago)", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Estimated Row Count": "1", "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Estimated Row Count": "1", "Name": "hash join"}
aggregation_interval       | 01:00:00
index_recommendations      | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

This has statistics ->> 'cnt' = 3

Checking the same fingerprint on the persisted table

> SELECT                                                                                                                            
aggregated_ts,                                                                                                                                    
 fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                        
plan_hash,                                                                                                                                         
app_name,                                                                                                                                          
metadata,                                                                                                                                          
statistics,                                                                                                                                        
plan,                                                                                                                                              
agg_interval,                                                                                                                                      
index_recommendations                                                                                                                            
FROM                                                                                                                                                 
system.statement_statistics WHERE fingerprint_id = '\x23f7891854867892';
-[ RECORD 1 ]
aggregated_ts              | 2022-08-10 20:00:00+00
fingerprint_id             | \x23f7891854867892
transaction_fingerprint_id | \x8c943454d287cf4d
plan_hash                  | \xf0d8ed4a1e2397f3
app_name                   | $ cockroach demo
metadata                   | {"db": "movr", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
statistics                 | {"execution_statistics": {"cnt": 1, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 6, "firstAttemptCnt": 6, "lastExecAt": "2022-08-10T20:04:41.55328Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.000005479666666666607, "sqDiff": 1.0600553333329841E-12}, "parseLat": {"mean": 0.00007301849999999999, "sqDiff": 3.789622235000001E-10}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0010753486666666666, "sqDiff": 0.000027087948344031332}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.0003941843333333333, "sqDiff": 5.4381690773333294E-9}, "svcLat": {"mean": 0.0015480311666666666, "sqDiff": 0.000026748804043984833}}}
plan                       | {"Children": [{"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Name": "hash join"}
agg_interval               | 01:00:00
index_recommendations      | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

It has statistics ->> 'cnt' = 6

BUT when I do a group by to join those two into one thing (this is the statement being used on crdb_internal.statement_statistics):

> SELECT
aggregated_ts,
fingerprint_id, 
transaction_fingerprint_id,   
plan_hash,
app_name, 
max(metadata) AS metadata,
crdb_internal.merge_statement_stats(array_agg(statistics)),  
max(sampled_plan), 
aggregation_interval,  
array_remove(array_agg(index_rec), NULL) AS index_recommendations 
  FROM   (  
SELECT  
aggregated_ts,   
fingerprint_id, 
transaction_fingerprint_id,
 plan_hash, 
app_name,  
metadata, 
statistics,  
sampled_plan,     
aggregation_interval, 
index_recommendations 
 FROM crdb_internal.cluster_statement_statistics WHERE fingerprint_id = '\x23f7891854867892' 
UNION ALL   
 SELECT 
aggregated_ts,           
 fingerprint_id,      
transaction_fingerprint_id,   
plan_hash,   
app_name, 
metadata,  
statistics,                                                                                                                                        
plan,                                                                                                                                             
 agg_interval,                                                                                                                                     
 index_recommendations                                                                                                                           
 FROM                                                                                                                                                 
system.statement_statistics WHERE fingerprint_id = '\x23f7891854867892'                                                                      
)                                                                                                                                                  
LEFT JOIN LATERAL unnest(index_recommendations) AS index_rec ON true                                                                             
GROUP BY                                                                                                                                             
aggregated_ts,                                                                                                                                     
fingerprint_id,                                                                                                                                    
transaction_fingerprint_id,                                                                                                                        
plan_hash,                                                                                                                                        
 app_name,                                                                                                                                          
aggregation_interval;
-[ RECORD 1 ]
aggregated_ts                       | 2022-08-10 20:00:00+00
fingerprint_id                      | \x23f7891854867892
transaction_fingerprint_id          | \x8c943454d287cf4d
plan_hash                           | \xf0d8ed4a1e2397f3
app_name                            | $ cockroach demo
metadata                            | {"db": "movr", "distsql": true, "failed": false, "fullScan": true, "implicitTxn": true, "query": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k WHERE (t1.i > _) AND (t2.i > _)", "querySummary": "SELECT t1.k FROM t1 JOIN t2 ON t1.k = t2.k", "stmtTyp": "TypeDML", "vec": true}
crdb_internal.merge_statement_stats | {"execution_statistics": {"cnt": 4, "contentionTime": {"mean": 0, "sqDiff": 0}, "maxDiskUsage": {"mean": 0, "sqDiff": 0}, "maxMemUsage": {"mean": 9.216E+4, "sqDiff": 0}, "networkBytes": {"mean": 0, "sqDiff": 0}, "networkMsgs": {"mean": 0, "sqDiff": 0}}, "index_recommendations": ["creation : CREATE INDEX ON t1 (i) STORING (k);", "creation : CREATE INDEX ON t2 (i) STORING (k);"], 
"statistics": {"bytesRead": {"mean": 0, "sqDiff": 0}, "cnt": 18, "firstAttemptCnt": 18, "lastExecAt": "2022-08-10T20:17:04.247211Z", "maxRetries": 0, "nodes": [], "numRows": {"mean": 0, "sqDiff": 0}, "ovhLat": {"mean": 0.0000053994444444444104, "sqDiff": 6.830604444443682E-12}, "parseLat": {"mean": 0.00006960977777777777, "sqDiff": 1.721165843111111E-9}, "planGists": ["AgHgAQIAAwAAAAMB4gECAAMAAAADCQACAgAABQIGAg=="], "planLat": {"mean": 0.0008019776666666666, "sqDiff": 0.000057080680390024}, "rowsRead": {"mean": 0, "sqDiff": 0}, "rowsWritten": {"mean": 0, "sqDiff": 0}, "runLat": {"mean": 0.00042342844444444446, "sqDiff": 4.9251584272444445E-8}, "svcLat": {"mean": 0.0013004153333333332, "sqDiff": 0.000056013835098412006}}}
max                                 | {"Children": [{"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t1@t1_pkey"}], "Filter": "i > _", "Name": "filter"}, {"Children": [{"Children": [], "Missing Stats": "", "Name": "scan", "Spans": "FULL SCAN", "Table": "t2@t2_pkey"}], "Filter": "i > _", "Name": "filter"}], "Equality": "(k) = (k)", "Name": "hash join"}
aggregation_interval                | 01:00:00
index_recommendations               | {"creation : CREATE INDEX ON t1 (i) STORING (k);","creation : CREATE INDEX ON t2 (i) STORING (k);"}

I’m getting statistics ->> 'cnt' = 18 instead of the expected 9.

We need a different join that doesn't use the LEFT JOIN LATERAL unnest(index_recommendations) AS index_rec ON true because is joining on itself and double counting.

Jira issue: CRDB-18505

@maryliag maryliag added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-observability Related to observability of the SQL layer GA-blocker labels Aug 11, 2022
@maryliag maryliag self-assigned this Aug 11, 2022
@blathers-crl
Copy link

blathers-crl bot commented Aug 11, 2022

Hi @maryliag, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@maryliag maryliag added the branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 label Aug 11, 2022
maryliag added a commit to maryliag/cockroach that referenced this issue Aug 11, 2022
Previously, because we were using a join, we were double
counting statistics when we had the same fingerprint in
memory and persisted.
This commit adds a `DISTINCT` so we only count them once.

Fixes cockroachdb#85958

Release note: None
craig bot pushed a commit that referenced this issue Aug 12, 2022
85850: importer: support {} format for arrays in CSV r=ecwall a=rafiss

fixes #84631

Release note (sql change): Arrays can now be imported in a CSV file
using the {} format, similar to COPY FROM. Importing array expressions
(e.g. ARRAY[1, 2, 3]) is still supported as well.

85854: clusterversion,storage: remove 22.1 PebbleFormat version gates r=jbowens a=celiala

This commit removes the following 22.1 version gates:

- PebbleFormatBlockPropertyCollector
- PebbleFormatSplitUserKeysMarked

Cleanup was done following guidance from [21.2 cleanup](#74270 (comment)):

> For the most part, if the gates were just simple if !version.IsActive { return x } or something, I just removed the block, and even if it was a little more complicated, like args = [x]; if version { args = append(args, y) }; foo(args) I still tried to mostly inline it such that it looked natural (i.e. remove that append and make it args = [x, y]).

> However for just a couple more complicated cases that were referring to <21.2 versions that needed to be replaced when those were deleted, I added a placeholder clusterversion.TODOPre21_2 alias for 21.2. Replacing those calls with this alias shouldn't change their behavior -- it was already always true, since the code today should never run in a <21.2 cluster -- but means we can delete those older versions in the meantime and then the owners of these bits can decide how to update them.

Partially addresses #80663

Release note: none

85909: kvserver: instrument RaftTransport workers with pprof labels r=tbg,erikgrinaker a=pavelkalinnikov

The unused arguments in the method signature were used to identify goroutines
in traces. This no longer works after Go 1.17 started passing arguments via
registers.

This commit adds pprof labels when starting these goroutines, to have a cleaner
code, more readable traces, and to work around the new Go convention.

Release note: None

85977: grunning: add library for precise on-CPU time measurement r=irfansharif a=irfansharif

Package grunning is a library that's able to retrieve on-CPU running
time for individual goroutines. It relies on using a patched Go and
provides a primitive for fine-grained CPU attribution and control
through a single API:

    package grunning

    // Time returns the time spent by the current goroutine in the
    // running state.
    func Time() time.Duration

The motivating RFC is over at #82356. Informs #82625.
We build CRDB using use the patched Go runtime for all officially
supported platforms when built using Bazel (#84867). Engineers commonly
building CRDB also use happen to use two platforms we don't use a
patched Go for:
- FreeBSD (we don't have cross-compilers setup), and
- M1/M2 Macs (we don't have a code-signing pipeline, yet).
We use '(darwin && arm64) || freebsd || !bazel' as the build tag to
exclude such platforms. See #84867 for more details.

This package tests various properties we should expect over the running time
value. It does not make assertions given the CI environments we run these
under (CPU-starved, lot of OS thread pre-emption, dissimilar to healthy
CRDB deployments). This is also why they're skipped under stress. Still,
these tests are useful to understand the properties we expect running
time to have:

    === RUN   TestEquivalentGoroutines
        thread=03 expected≈10.00% got= 9.98% of on-cpu time
        thread=06 expected≈10.00% got=10.00% of on-cpu time
        thread=02 expected≈10.00% got=10.01% of on-cpu time
        thread=10 expected≈10.00% got=10.01% of on-cpu time
        thread=07 expected≈10.00% got= 9.99% of on-cpu time
        thread=04 expected≈10.00% got= 9.99% of on-cpu time
        thread=09 expected≈10.00% got=10.00% of on-cpu time
        thread=01 expected≈10.00% got= 9.99% of on-cpu time
        thread=08 expected≈10.00% got=10.02% of on-cpu time
        thread=05 expected≈10.00% got=10.02% of on-cpu time
    --- PASS: TestEquivalentGoroutines (0.56s)

    === RUN   TestProportionalGoroutines
        thread=01 got  1.82% of on-cpu time: expected≈ 1.00x got=1.00x
        thread=02 got  3.64% of on-cpu time: expected≈ 2.00x got=2.00x
        thread=03 got  5.47% of on-cpu time: expected≈ 3.00x got=3.00x
        thread=04 got  7.28% of on-cpu time: expected≈ 4.00x got=4.00x
        thread=05 got  9.09% of on-cpu time: expected≈ 5.00x got=4.99x
        thread=06 got 10.91% of on-cpu time: expected≈ 6.00x got=5.99x
        thread=07 got 12.73% of on-cpu time: expected≈ 7.00x got=6.99x
        thread=08 got 14.54% of on-cpu time: expected≈ 8.00x got=7.99x
        thread=09 got 16.36% of on-cpu time: expected≈ 9.00x got=8.99x
        thread=10 got 18.16% of on-cpu time: expected≈10.00x got=9.97x
    --- PASS: TestProportionalGoroutines (1.72s)

    === RUN   TestPingPongHog
        pinger/ponger expected≈1.00x got=0.96x
    --- PASS: TestPingPongHog (0.91s)

Release note: None

85987: sql: fix aggregation of statistics r=maryliag a=maryliag

Previously, because we were using a join, we were double
counting statistics when we had the same fingerprint in
memory and persisted.
This commit adds a `DISTINCT` so we only count them once.

Fixes #85958

Release note: None

86008: sql: logic test to inform maximum builtin function oid change r=chengxiong-ruan a=chengxiong-ruan

This commit adds a logic test to let engineers who added a
new builtin function know that the new builtin function is
constructed earlier than some existing builtin functions at
init time.

Release note: None

Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Celia La <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Chengxiong Ruan <[email protected]>
@craig craig bot closed this as completed in de478e1 Aug 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-observability Related to observability of the SQL layer branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant