Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/opt/exec/execbuilder: TestExecBuild failed #74933

Closed
cockroach-teamcity opened this issue Jan 18, 2022 · 16 comments · Fixed by #75304
Closed

sql/opt/exec/execbuilder: TestExecBuild failed #74933

cockroach-teamcity opened this issue Jan 18, 2022 · 16 comments · Fixed by #75304
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team

Comments

@cockroach-teamcity
Copy link
Member

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ 365b4da8bd02c06ee59d2130a56dec74ffc9ce21:

            • sort
            │ order: +k
            │
            └── • filter
                │ filter: st_intersects('01040000000200000001010000009A999999999901409A99999999990140010100000000000000000008400000000000000840', geom)
                │
                └── • index join
                    │ table: geo_table@geo_table_pkey
                    │
                    └── • inverted filter
                        │ inverted column: geom_inverted_key
                        │ num spans: 31
                        │
                        └── • scan
                              missing stats
                              table: geo_table@geom_index
                              spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUlFFP2zwUhu-_X2GdG1p9prWdlA1fFUbYMpWWpZ02NFcoa85YRLEz2506of73KQ0DAmogvnB7jv2ck_e17Ftwv5YgIfp6PjqKx6RzEk9n00-jLplGo-jdjFyT02RyRq7QXPr0-xLJlw9REhHnL3Pt0TpceNfZO_s8msXnk3g863RETxDRE11KOkGPkaDHut09Kd9Hk7NollzQstZNl0ySkyghxxfkGihok-E4vUEH8htwoCBgTqGwZoHOGVumb7eb4mwNklHIdbHyZXpOYWEsgrwFn_slgoSx2TdFfwAUMvRpvtxu21AwK_8AOZ9eIciDDX1UmDcXnpUGJJhmaPusVh7u_RmW6i5zneEaKEyLVDtJ-gqOlVr_yJRac6bUmr00wX5bhisgqc5IwIjxP9E62KWZt9Ec699oPWan-dKjRdvndeH_1qN1YYnRZMglcaVq4nxqvdyqCN4MlGKCKcXYSxMQ1FlbrBT_RD2FycpLMuQ7fRBtfPhocn139GLX0Rc2v0ntn4fWdCh2dg_adL93P6j3rvLy6XVknIWsGuLulzNe_Tk8OrwfjIfP4oedtfE2fBbvyce3eii6r_A8bKN6aqxH2w_rmof8_53lB7XyL7wTCbrCaIeveijYZk4Bsyus3iJnVnaB59Ystm2qcLLltokMna9WD6og1tVS-YGPYd4Ii2ZYNMJBMxw0wmEzHDbCg2Z40AizJ_B889_fAAAA___nMRL3
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • sort
            │ order: +k
            │
            └── • filter
                │ filter: st_intersects('01040000000200000001010000009A999999999901409A99999999990140010100000000000000000008400000000000000840', geom)
                │
                └── • index join
                    │ table: geo_table@geo_table_pkey
                    │
                    └── • inverted filter
                        │ inverted column: geom_inverted_key
                        │ num spans: 31
                        │
                        └── • scan
                              missing stats
                              table: geo_table@geom_index
                              spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUlFFP2zAQx9_3Kax7odW81naCNvxUGGHLVChLO21orlDW3FhEsTPbnTKhfvcpCQNa1ED84PbO_v3P_7OcW3C_lyAh-nY-PozPSO84ns6mn8d9Mo3G0fsZuSYnyeSUXKG59OmPJZKvH6MkIs5f5tqjdbjwrrd3-mU8i88n8dms1xMDQcRA9CnpBQNGggHr9_ek_BBNTqNZckErrZs-mSTHUUKOLsg1UNAmw7P0Bh3I78BhTqGwZoHOGVulbusNcVaCZBRyXax8lZ5TWBiLIG_B536JIGFWnTHBNEM7ZEAhQ5_my1r23sKoOsBlrjMsgcK0SLWTZKjgSKnyZ6ZUyZlSJXtugjddGa6ApDojASPG_0LrYL6mYFb-wY_z6RWC5Gv6cs-x_oPWY3aSLz1atEO-afz_elQWlhhNRlwSV7kmzqfWy9pF8HZfKSaYUow9NwFBnXXFKvNb7ilMVl6SEd_ZB9GlD59Mru-uXuy6-sLmN6n9-1CajsTO6kGX6vfdDzZrN3m5_WIYZyFrhrj75Yw3fw4OD-4H4-GT-GHnxngXPon35OOHNxL9F_Q87OJ6aqxHOww3PY_4653y-13kE3SF0Q435Hcps_WcAmZX2HwynFnZBZ5bs6jLNOGk5upEhs43q7wJYt0sVQd8DPNWWLTDohUO2uGgFQ7b4bAV3t-C5-tX_wIAAP__VL3gzA==
--- done: testdata/inverted_filter_geospatial_dist with config 5node: 13 tests, 1 failures
    logic.go:3370: 
        testdata/inverted_filter_geospatial_dist:206: error while processing
    logic.go:3370: testdata/inverted_filter_geospatial_dist:206: too many errors encountered, skipping the rest of the input
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (11.11s)
=== RUN   TestExecBuild/5node
    --- FAIL: TestExecBuild/5node (0.00s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

/cc @cockroachdb/sql-queries

This test on roachdash | Improve this report!

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Jan 18, 2022
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jan 18, 2022
@cockroach-teamcity
Copy link
Member Author

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ 365b4da8bd02c06ee59d2130a56dec74ffc9ce21:

            vectorized: true
            ·
            • filter
            │ filter: 'BOX(3 3,3 3)' && geom
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk9Fv2j4Qx99_f8XpXmglV9gJ_Db5ibGmGhODDpCGNKMqwzcWDezMNlMmxP8-Jdna0o5Q_HDS3flz5-9Zt0P_Y40Sk_nt8M1gBBfXg-ls-nF4CdNkmLydwXe4mYw_wIrsXUi_rAk-vUsmCbS44II_P687z_yWlP3xPLoGteU8-r-2ZcENMjRW0yjdkEf5GQUyjHDBMHd2Sd5bV4Z31aWBLlByhpnJt6EMLxgurSOUOwxZWBNKHNkrm7c7yFBTSLN1dW3P0G7DA-RDuiKU3T17VFg0F56VyieUanJtflAe7wfTKxXdZUZTgQyneWq8hLbCvlLFV61UIbhSBT9l8OpcRiiE1GiIOdjwjZzHY5rFOZoH5ie5QPomWwdy5NriUPjffFLkDqyBnpDgS9XgQ-qCrFTEr7pK8Ygrxfkpg0BGn4uV4p-oZzjeBgk9cXQO0TlzeG8z8-fro2Nfn7tsk7pfD61ZLzraPT6n-_3048PedVxCqz-eX8QQsxjiy5b856r1ohfMpHPwqhOLNiGfW-PpRZvG9wuGpFdUL7O3W7ekW2eXVZvaHVdcFdDkQ53t1s7A1KnygY9h0QhHzXDUCMfNcNwId5rhTiPMn8CL_X-_AwAA__8rRsTS
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • filter
            │ filter: 'BOX(3 3,3 3)' && geom
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk99v0zAQx9_5K0730k0yqh2XH_JTKctEUVlHW4lKuJpCfZSI1g62i4Kq_u8oCWx0o92SB0t3l89973uWdxh-rFFhOr8evRlewdnFcDqbfhydwzQdpW9n8B0uJ-MPsCJ3E7Mva4JP79JJCh0uuOAPv9e9B3FHqcF4nlyA3nKevGzOquEGGVpn6CrbUED1GQUuGBbeLSkE56vUrv5haEpUnGFui22s0guGS-cJ1Q5jHteECmfVcBPKDPkuR4aGYpav67a3s_cr0ZvcGiqR4bTIbFDQ1TjQuvxqtC4F17rkjx34vC0jNEJmDUgOLn4jH3CxZ-i28c5PiNmKUIk9e7rnof1JPpK5zNeRPPmuODT-t56WhQdnoS8UhMo1hJj5qGoX8tULrXnCteb8sQOBrGmLVebvuWc43kYFfXF0D0mbPbx3uf1z9cmxqy98vsn8rztp1k-Oqss26rfbl4faTV5BZzCen0mQTII876j_voZ-8oSd9NpMNaFQOBvoYKZjnfl-wZDMipo3F9zWL-nau2Ut04TjmqsThkJsqqIJhrYpVQP-C4uTcHIaTk7C8jQsT8K9e_Bi_-x3AAAA__81JZK7
--- done: testdata/inverted_filter_geospatial_dist with config 5node: 17 tests, 1 failures
    logic.go:3370: 
        testdata/inverted_filter_geospatial_dist:286: error while processing
    logic.go:3370: testdata/inverted_filter_geospatial_dist:286: too many errors encountered, skipping the rest of the input
--- done: testdata/inverted_filter_json_array_dist with config 5node: 30 tests, 0 failures
--- done: testdata/explain_analyze_plans with config 5node: 20 tests, 0 failures
--- total progress: 3525 statements/queries
--- total: 3525 tests, 0 failures
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (2.86s)
=== RUN   TestExecBuild/5node
    --- FAIL: TestExecBuild/5node (0.00s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-parallel=4

This test on roachdash | Improve this report!

@rytaft
Copy link
Collaborator

rytaft commented Jan 18, 2022

The one thing that's different between the expected and actual is cut off above. Expected is distribution: full, while actual is distribution: local.

@rytaft
Copy link
Collaborator

rytaft commented Jan 18, 2022

Both failures above occurred in inverted_filter_geospatial_dist, which has configuration # LogicTest: 5node

@cucaroach cucaroach self-assigned this Jan 18, 2022
@cucaroach
Copy link
Contributor

reproduced pretty readily like this:

❯ make stress PKG=github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder TESTTIMEOUT=40m0s GOFLAGS=-parallel=4 TAGS= 'STRESSFLAGS=-stderr -maxruns 100 -maxtime 1h0m0s -maxfails 1 -p 4'

Trying to run just one w/o stress using the correct rand seed doesn't though. Something stress specific about it?

@cockroach-teamcity
Copy link
Member Author

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ 5aefc070d1e7f5bfd860eae24dea2e7726dc6d8c:

            • sort
            │ order: +k
            │
            └── • filter
                │ filter: st_coveredby('01040000000200000001010000009A999999999901409A99999999990140010100000000000000000008400000000000000840', geom)
                │
                └── • index join
                    │ table: geo_table@geo_table_pkey
                    │
                    └── • inverted filter
                        │ inverted column: geom_inverted_key
                        │ num spans: 31
                        │
                        └── • scan
                              missing stats
                              table: geo_table@geom_index
                              spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUlFFP2zAQx9_3Kax7odVMazspG34qjLBlKi1LO21orlCobyyixJntoiLU7z6lYUBBDcQPbu_s3__yP8u-A_d3DhKin6eDg3hIWkfxeDL-NmiTcTSIPk3IFTlORifkEs25Ty_mSH58iZKIOH8-MzdoUV_ctnZOvg8m8ekoHk5aLdERRHREm5JW0GEk6LB2e0fKz9HoJJokZ7SUum6TUXIUJeTwjFwBhdxoHKbX6ED-Ag4UBEwpFNbM0Dljy_TdelOslyAZhSwvFr5MTynMjEWQd-AzP0eQMDS7puj2gIJGn2bz9bYVBbPwj5Dz6SWC3FvRJ8K8XnhS-k8w1Wi7bEMeHtrTL92dZ7nGJVAYF2nuJOkqOFRq-VsrteRMqSV7bYLdpgxXQNJck4AR4_-gdbDNM2_iOc5v0HrUx9nco0Xb5ZvG_69Hy8ISk5M-l8SVronzqfVy7SL40FOKCaYUY69NQDDXTbHS_DP3FEYLL0mfb-2DaNKHrybL749ebDv6wmbXqb19LE37Ymv1oEn1h-4Hm7WrvHx2GxlnIauGuP_ljFd_9g_2Hwbj4Yv4cefG-Bi-iHfk00vdF-03tDxsYnpsrEfbDTct9_n7rfK9DflXnokEXWFyh296J9hqSgH1JVZPkTMLO8NTa2brMlU4WnPrhEbnq9W9Kojzaqn8wKcwr4VFPSxq4aAeDmrhsB4Oa-FePdyrhdkzeLp69y8AAP__-8wR9Q==
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • sort
            │ order: +k
            │
            └── • filter
                │ filter: st_coveredby('01040000000200000001010000009A999999999901409A99999999990140010100000000000000000008400000000000000840', geom)
                │
                └── • index join
                    │ table: geo_table@geo_table_pkey
                    │
                    └── • inverted filter
                        │ inverted column: geom_inverted_key
                        │ num spans: 31
                        │
                        └── • scan
                              missing stats
                              table: geo_table@geom_index
                              spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUlFFP2zwUhu-_X2GdG1p9Xms7QRu-KoywZSqUpZ02NFco1GcsosSZ7aIg1P8-JWFAQQ3EF27PsZ_3-D2WcwfuzxIkRD9Ox_vxCekdxtPZ9Ou4T6bROPo4I1fkKJkck0s05z69WCL5_jlKIuL8-cLcoEV9cdvbOf42nsWnk_hk1uuJgSBiIPqU9IIBI8GA9fs7Un6KJsfRLDmjldR1n0ySwyghB2fkCijkRuNJeo0O5E_gMKdQWLNA54ytUnf1hliXIBmFLC9WvkrPKSyMRZB34DO_RJAwq46YYKrRDhlQ0OjTbFnLPjgYVQc4z3KNJVCYFmnuJBkqOFCq_KWVKjlTqmSvTfCuK8MVkDTXJGDE-N9oHczXFMzKP_pxPr1EkHxN3-45zm_QetRH2dKjRTvkm8b_rUdlYYnJyYhL4irXxPnUelm7CN7vKsUEU4qx1yYgmOuuWGX-mXsKk5WXZMS39kF06cMXk-X3Vy-2XX1hs-vU3j6WpiOxtXrQpfpD94PN2k1ePnswjLOQNUPc_3LGmz97-3sPg_HwRfy4c2N8CF_EO_LpuxuJ_htaHnYxPTXWox2Gm5ZH_P-t8rtd5BN0hckdbshvU2brOQXUl9h8MZxZ2QWeWrOoyzThpObqhEbnm1XeBHHeLFUHfArzVli0w6IVDtrhoBUO2-GwFd59Bs_X__0NAAD___MV38o=
--- done: testdata/inverted_filter_geospatial_dist with config 5node: 14 tests, 1 failures
    logic.go:3370: 
        testdata/inverted_filter_geospatial_dist:234: error while processing
    logic.go:3370: testdata/inverted_filter_geospatial_dist:234: too many errors encountered, skipping the rest of the input
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (9.01s)
=== RUN   TestExecBuild/5node
    --- FAIL: TestExecBuild/5node (0.00s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-json

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ d1eafdf56b42ca09a169c199916917a144c2c412:

        testdata/inverted_filter_geospatial_dist:238: EXPLAIN (DISTSQL)
        SELECT k FROM geo_table WHERE geom && 'POINT(3.0 3.0)'::geometry
        expected:
            distribution: full
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk29v0z4Qx5__XoV1T7pJ3mon7Q_kR-VPBkFdW9pKgHA1hfooEa0dbAcFVX3vKAlsS6emxA9Oujt_7vw96_bgfmxBQPRxNn4RT8jF63ixXLwfX5JFNI5eLcl3cjOf3pINmjuffNki-fA2mkelvyMyZyz4v7akN5vGk-VFeM1IeM0ue0K8iaa30XL-CShoo3CS7NCB-AwcKASwopBZs0bnjC3D--pSrAoQjEKqs9yX4RWFtbEIYg8-9VsEARNzZbL-ACgo9Em6ra4dKJjcP0DOJxsEMTzQR4V5e-FlKXCOiULbZ43ycK9_VCq_S7XCAigsskQ7QfoSXkpZfFVSFpxJWbBzBq66MlwCSbQiISPGf0Pr4JRm3kVzrH-i9ahu0q1Hi7bPm8L_5qMis8RoMuKCuFI1cT6xXlQqwmdDKVnApGTsnAGCWnXFSvFH6ilMcy_IiJ-cQ9BlDu9Mqv98fXDq6zOb7hL766E1HQUnu4ddut9PP2z2ruOCjIKjZWOccfb0PB888XuisYnnhjZoPPvMJs7RZUY7_KdVZIcVBVQbrLfdmdyucWbNumpTu9OKqwIKna-zw9qJdZ0qH_gY5q1w0A4HrXDYDoet8KAdHrTC7AheHf77HQAA__-eq8eX
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk02P0zAQhu_8itFcuiuZrZ2UD_lUPrIQ1G1LWwkQrlahHkpEawfbRUFV_ztKAru0q7YkB0szk8fzvmPNFv2PFUpMPo4HL9IhXLxOp7Pp-8ElTJNB8moG3-F6MrqBJdnbkH1ZEXx4m0ySKl6D2nAePW1O6IxH6XB2EV9xiK_4ZUfKN8noJplNPiFDYzUNszV5lJ9R4Jxh4eyCvLeuSm3rH1JdouQMc1NsQpWeM1xYRyi3GPKwIpQ4qzRMKNPkuhwZagpZvqqvvZPYr8Td5kZTiQynRWa8hK7Cl0qVX7VSpeBKlfzcgY_bMkIhZEZDzMGGb-Q8zncM7Sbc-_EhWxJKsWP_7zk1P8kF0tf5KpAj1xX7xv_Wk7JwYA30hQRfuQYfMhdk7SJ-9kQpHnGlOD93IJDRbbHK_IF7hqNNkNAXR-cQtZnDO5ubP08fHXv6wuXrzP26b8360dHucZvud9OP93s3eQn96GAfuOCCP_ye9x7EHbm3LOeG1msje0K-sMbTnuhjN_PdnCHpJTVL6e3GLWjs7KJu04SjmqsTmnxoqqIJUtOUKoH_wuIkHJ2Go5NwfBqOT8K9A3i-e_Q7AAD__5gYlYA=
--- done: testdata/inverted_filter_geospatial_dist with config 5node: 16 tests, 1 failures
    logic.go:3370: 
        testdata/inverted_filter_geospatial_dist:262: error while processing
    logic.go:3370: testdata/inverted_filter_geospatial_dist:262: too many errors encountered, skipping the rest of the input
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (3.86s)
=== RUN   TestExecBuild/5node
    --- FAIL: TestExecBuild/5node (0.00s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-parallel=4

This test on roachdash | Improve this report!

@irfansharif
Copy link
Contributor

Dropping in to say that I think this flaked in CI more frequently after #73876, I think (going off on the test history alone).

@cucaroach
Copy link
Contributor

Well I've chased it down to the span resolver sometimes putting the table reader on the gateway node causing the plan to be local instead of full. Its very flakey though and it only seems to happen when the machine is overloaded, I haven't figured out more than that and don't know if that has anything to do with the span config changes.

@cucaroach
Copy link
Contributor

I think @irfansharif is right, I can't get a failure if I do this:

diff --git a/pkg/sql/logictest/logic.go b/pkg/sql/logictest/logic.go
index 8b556681c2..c93554b93c 100644
--- a/pkg/sql/logictest/logic.go
+++ b/pkg/sql/logictest/logic.go
@@ -1413,8 +1413,9 @@ func (t *logicTest) newCluster(serverArgs TestServerArgs, opts []clusterOpt) {
 
        params := base.TestClusterArgs{
                ServerArgs: base.TestServerArgs{
-                       SQLMemoryPoolSize: serverArgs.maxSQLMemoryLimit,
-                       TempStorageConfig: tempStorageConfig,
+                       DisableSpanConfigs: true,
+                       SQLMemoryPoolSize:  serverArgs.maxSQLMemoryLimit,
+                       TempStorageConfig:  tempStorageConfig,
                        Knobs: base.TestingKnobs{
                                Store: &kvserver.StoreTestingKnobs{
                                        // The consistency queue makes a lot of noisy logs during logic tests.

That said I will create a new bug for looking at this tests, they are obviously confused, ie:

# Distributed.
query T
EXPLAIN (DISTSQL)
SELECT k FROM geo_table WHERE ST_Intersects('MULTIPOINT((2.2 2.2), (3.0 3.0))'::geometry, geom) ORDER BY k
----
distribution: local

@cucaroach
Copy link
Contributor

@irfansharif let me know if there's any I can do to help with this, I repro it in ~1m.

@irfansharif
Copy link
Contributor

Repro steps? I probably won't get to it until next week. I should clarify that a of the fallout from enabling span configs has been around synchrony assumptions made in test code that never held but was more easily masked over previously. I'm not sure if that's what's happening here. If you have any working theories, I'll happily take them.

@cucaroach
Copy link
Contributor

❯ make stress PKG=github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder TESTTIMEOUT=40m0s GOFLAGS=-parallel=4 TAGS= 'STRESSFLAGS=-stderr -maxruns 100 -maxtime 1h0m0s -maxfails 1 -p 4'

If I bump up p from 4 to 12 and hits on first run usually. It doesn't repro unless you run all of opt tests, like just the 5node or just the inverted_filter_geospatial_dist tests doesn't fail. Its a pretty simple test and it confirms the index has ranges on the right nodes. If you have a suspicion on where the randomness might be coming from I'd be happy to dig deeper.

@cockroach-teamcity
Copy link
Member Author

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ 506412ffdb6d0f6d4952d2599b81cb9294a123a1:

        /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:238: EXPLAIN (DISTSQL)
        SELECT k FROM geo_table WHERE geom && 'POINT(3.0 3.0)'::geometry
        expected:
            distribution: full
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk29v0z4Qx5__XoV1T7pJ3mon7Q_kR-VPBkFdW9pKgHA1hfooEa0dbAcFVX3vKAlsS6emxA9Oujt_7vw96_bgfmxBQPRxNn4RT8jF63ixXLwfX5JFNI5eLcl3cjOf3pINmjuffNki-fA2mkelvyMyZyz4v7akN5vGk-VFeM1IeM0ue0K8iaa30XL-CShoo3CS7NCB-AwcKASwopBZs0bnjC3D--pSrAoQjEKqs9yX4RWFtbEIYg8-9VsEARNzZbL-ACgo9Em6ra4dKJjcP0DOJxsEMTzQR4V5e-FlKXCOiULbZ43ycK9_VCq_S7XCAigsskQ7QfoSXkpZfFVSFpxJWbBzBq66MlwCSbQiISPGf0Pr4JRm3kVzrH-i9ahu0q1Hi7bPm8L_5qMis8RoMuKCuFI1cT6xXlQqwmdDKVnApGTsnAGCWnXFSvFH6ilMcy_IiJ-cQ9BlDu9Mqv98fXDq6zOb7hL766E1HQUnu4ddut9PP2z2ruOCjIKjZWOccfb0PB888XuisYnnhjZoPPvMJs7RZUY7_KdVZIcVBVQbrLfdmdyucWbNumpTu9OKqwIKna-zw9qJdZ0qH_gY5q1w0A4HrXDYDoet8KAdHrTC7AheHf77HQAA__-eq8eX
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk02P0zAQhu_8itFcuiuZrZ2UD_lUPrIQ1G1LWwkQrlahHkpEawfbRUFV_ztKAru0q7YkB0szk8fzvmPNFv2PFUpMPo4HL9IhXLxOp7Pp-8ElTJNB8moG3-F6MrqBJdnbkH1ZEXx4m0ySKl6D2nAePW1O6IxH6XB2EV9xiK_4ZUfKN8noJplNPiFDYzUNszV5lJ9R4Jxh4eyCvLeuSm3rH1JdouQMc1NsQpWeM1xYRyi3GPKwIpQ4qzRMKNPkuhwZagpZvqqvvZPYr8Td5kZTiQynRWa8hK7Cl0qVX7VSpeBKlfzcgY_bMkIhZEZDzMGGb-Q8zncM7Sbc-_EhWxJKsWP_7zk1P8kF0tf5KpAj1xX7xv_Wk7JwYA30hQRfuQYfMhdk7SJ-9kQpHnGlOD93IJDRbbHK_IF7hqNNkNAXR-cQtZnDO5ubP08fHXv6wuXrzP26b8360dHucZvud9OP93s3eQn96GAfuOCCP_ye9x7EHbm3LOeG1msje0K-sMbTnuhjN_PdnCHpJTVL6e3GLWjs7KJu04SjmqsTmnxoqqIJUtOUKoH_wuIkHJ2Go5NwfBqOT8K9A3i-e_Q7AAD__5gYlYA=
--- done: /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist with config 5node: 16 tests, 1 failures
    logic.go:3370: 
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:262: error while processing
    logic.go:3370: /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:262: too many errors encountered, skipping the rest of the input
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (3.88s)
=== RUN   TestExecBuild/5node
    --- FAIL: TestExecBuild/5node (0.00s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-parallel=4

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

sql/opt/exec/execbuilder.TestExecBuild failed with artifacts on master @ 03556f98c29c65e022d2c29cb03c99d0dc1b31b0:

    logic.go:2448: 
         
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:238: EXPLAIN (DISTSQL)
        SELECT k FROM geo_table WHERE geom && 'POINT(3.0 3.0)'::geometry
        expected:
            distribution: full
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUlFFv0zAQx9_5FNa9dJO81U5aQH4qjAyCura0lQDhagr1UaKldrAdFFT1u6MksK2dmq55sHRn_-78_8uXDbhfGQiIvkyGb-IROXsXz-azT8NzMouG0dWc3JHr6fiGrNDc-uR7huTzh2gaVfGayIKx4GWzks5kHI_mZ-ElI-ElO-8I8T4a30Tz6VegoI3CUbJGB-IbcKAQwIJCbs0SnTO2Sm_qQ7EqQTAKqc4LX6UXFJbGIogN-NRnCAJG5sLk3R5QUOiTNKuPbSmYwj9AzicrBNHf0keFeXvheSVwiolC22U75eFe_6BSfptqhSVQuDJZsdZOkDtaWwIUZnlSJboS3kpZ_lBSlpxJWbJjC1ycynAJJNGKhIwY_xOtg0M28FNsiPVvtB7VdZp5tGi7fNeL__tRmVtiNBlwQVylmjifWC9qFeGrvpQsYFIydmwBglqdilXi99RTGBdekAE_6ENwig8fTar_vYbg0GvIbbpO7J-H1nQQHOwentL93v1wt3eTF2QQ7M0f44yzp9_r3pO4I3aG85hpvZ1rHxnOKbrcaIfPmk62XVBAtcLmB-BMYZc4sWZZt2nCcc3VCYXON7v9Joh1s1Vd8DHMW-GgHQ5a4bAdDlvhXjvca4XZHrzYvvgbAAD__-uIzaE=
            
        but found (query options: "") :
            distribution: local
            vectorized: true
            ·
            • filter
            │ filter: geom && '010100000000000000000008400000000000000840'
            │
            └── • index join
                │ table: geo_table@geo_table_pkey
                │
                └── • inverted filter
                    │ inverted column: geom_inverted_key
                    │ num spans: 31
                    │
                    └── • scan
                          missing stats
                          table: geo_table@geom_index
                          spans: 31 spans
            ·
            Diagram: https://cockroachdb.github.io/distsqlplan/decode.html#eJyUk99v0zAQx9_5K6x72SaZ1U7KD_mpMDII6trSVgKEoynUR4mW2sF2UFDV_x0lgY12akryYOnu8vF97yvfFtyPHAREn2bjV_GEnL-JF8vFh_EFWUTj6GpJ7sj1fHpD1mhuffo1R_LxXTSP6nhDZMlY8Lw9ydlsGk-W5-ElI-EluzgT4m00vYmW889AQRuFk3SDDsQX4JBQKKxZoXPG1qlt80OsKhCMQqaL0tfphMLKWASxBZ_5HEHAstYwx1ShHTCgoNCnWd5cey9xVIu7zbTCCihcmbzcaCfIHW1UA4VFkdaJgYTXUlbflJQVZ1JW7NQBT_syXAJJtSIhI8Z_R-sg2VEwpX8Y0fl0jSD4jv6_DbH-idajus5yjxbtgO978bceVYUlRpMRF8TVUxPnU-tFM0X44pmULGBSMnbqAIJa9cXq4Q-mpzAtvSAjftSHoI8P702m_7yG4NhrKGy2Se2vh9Z0FBztHvbpfu9-uN-7zQsyCg5WhHHG2ePv5fBRfCb29ueUacM-sufoCqMd7ok-djPbJRRQrbHdU2dKu8KZNaumTRtOG65JKHS-rfI2iHVbqgX-C_NOOOiGg0447IbDTnh4ACe7J78DAAD__0Yzm4o=
--- done: /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist with config 5node: 16 tests, 1 failures
    logic.go:3379: 
        /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:262: error while processing
    logic.go:3379: /go/src/github.com/cockroachdb/cockroach/pkg/sql/opt/exec/execbuilder/testdata/inverted_filter_geospatial_dist:262: too many errors encountered, skipping the rest of the input
        --- FAIL: TestExecBuild/5node/inverted_filter_geospatial_dist (2.85s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • GOFLAGS=-parallel=4

This test on roachdash | Improve this report!

@irfansharif
Copy link
Contributor

I've chased it down to the span resolver sometimes putting the table reader on the gateway node causing the plan to be local instead of full [...] Its a pretty simple test and it confirms the index has ranges on the right nodes.

Still poking blindly, but #75208 at a surface has similar symptoms (also possibly my fault).

@irfansharif
Copy link
Contributor

irfansharif commented Jan 21, 2022

span resolver sometimes putting the table reader on the gateway node

Might pointing me to the code where this happens? Or what goes into it? Maybe this guy?

func (dsp *DistSQLPlanner) planTableReaders(
planCtx *PlanningCtx, p *PhysicalPlan, info *tableReaderPlanningInfo,
) error {

Can confirm it easily repros on my GCE worker:

dev test pkg/sql/opt/exec/execbuilder -f 'TestExecBuild/5node' -v --show-logs --stress --stress-args '-p 4'

For posterity, here's the physical plan than the test expects, and here's the one it actually gets.

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jan 22, 2022
Fixes cockroachdb#74933. This test asserts on physical plans of statements after
moving ranges around, i.e. whether distsql execution is "local" or
"distributed". This is determined by:
- where the distsql planner places processor cores,
- which in turn is a function of the span resolver,
- which delegates to the embedded distsender,
- which operates off of a range cache.

The range cache, like the name suggests, can hold stale data. This test
moved replicas of indexes around to induce distributed execution, but
was not accounting for the caches holding onto stale data. In my test
runs every so often I was seeing stale range descriptors from before the
relocate trip up the planner to generate a local execution, flaking the
test. Fortunately for us, all we need to do is slap on a retry. To
repro:

    # Took under a minute on my gceworker. Helped to trim down the test
    # file slightly.
    dev test pkg/sql/opt/exec/execbuilder \
      -f 'TestExecBuild/5node' -v --show-logs \
      --stress --stress-args '-p 4'

Release note: None
craig bot pushed a commit that referenced this issue Jan 25, 2022
75304: execbuilder: deflake TestExecBuild r=irfansharif a=irfansharif

Fixes #74933. This test asserts on physical plans of statements after
moving ranges around, i.e. whether distsql execution is "local" or
"distributed". This is determined by:
- where the distsql planner places processor cores,
- which in turn is a function of the span resolver,
- which delegates to the embedded distsender,
- which operates off of a range cache.

The range cache, like the name suggests, can hold stale data. This test
moved replicas of indexes around to induce distributed execution, but
was not accounting for the caches holding onto stale data. In my test
runs every so often I was seeing stale range descriptors from before the
relocate trip up the planner to generate a local execution, flaking the
test. Fortunately for us, all we need to do is slap on a retry. To
repro:

    # Took under a minute on my gceworker. Helped to trim down the test
    # file slightly.
    dev test pkg/sql/opt/exec/execbuilder \
      -f 'TestExecBuild/5node' -v --show-logs \
      --stress --stress-args '-p 4'

Release note: None

75436: kvserver: de-flake TestProtectedTimestamps r=irfansharif a=irfansharif

Fixed #75020. This test makes use of an "upsert-until-backpressure"
primitive, where it continually writes blobs to a scratch table
configured with a lower max range size (1<<18 from the default of
512<<20) until it experiences replica backpressure. Before #73876 (when
using the system config span), the splitting off of the scratch table's
ranges happened near instantly after the creation of the table itself.
This changed slightly with the span configs infrastructure, where
there's more of asynchrony built in and ranges may appear after a while
longer.

The test previously started upserting as soon as it created the table,
being able to implicitly rely on the tight synchrony of already having
the table's ranges carved out. This comes when deciding to record a
replica's largest previous max range size -- something we only do if the
replica's current size is larger than the new limit (see
(*Replica).SetSpanCnfig). When we weren't upserting until the range
applied the latest config, this was not possible. With span configs and
its additional asynchrony, this assumption no longer held. It was
possible then for the range to accept writes larger than the newest max
range size, which unintentionally (for this test at least) triggers an
escape hatch where we avoid backpressure when lowering a range's max
size below its current size (#46863).

The failure is easy to repro. This test runs in ~8s, and with the
following we were reliably seeing timeouts where the test was waiting
for backpressure to kick in (but it never did because of the escape
hatch).

    dev test pkg/kv/kvserver \
      -f TestProtectedTimestamps -v --show-logs \
      --timeout 300s --ignore-cache --count 10

De-flaking this test is simple -- we just wait for the table's replicas
to apply the latest config before issuing writes to it.

Release note: None

75443: ui: removed formatting to statements on the details pages r=THardy98 a=THardy98

**ui: removed formatting to statements on the details pages**

Previously, statements displayed on the statement/transaction/index
details pages were formatted (formatting was added to allow for better
readability of statements on these detail pages). However, statements
queries from the frontend were noticeably slower due to this
implementation. This change reverts the changes to statement formatting
(updates the fixtures to show the non-formatted statements), but keeps
the change that uses statement ID as an identifier in the URL instead of
the raw statement.

**Release note (ui change)**: removed formatting to statements on the
statement, transaction and index details pages, change to replace raw
statement with statement ID in the URL remained.

Co-authored-by: irfan sharif <[email protected]>
Co-authored-by: Thomas Hardy <[email protected]>
@craig craig bot closed this as completed in c538f2a Jan 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-sql-queries SQL Queries Team
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants