Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pg_stat_all_tables info not reliable (or direct execute problem). #92

Open
yazun opened this issue Jan 14, 2021 · 5 comments
Open

pg_stat_all_tables info not reliable (or direct execute problem). #92

yazun opened this issue Jan 14, 2021 · 5 comments

Comments

@yazun
Copy link
Contributor

yazun commented Jan 14, 2021

We rely on updating stats on the coordinators on a global view by comparing pg_stat_all_tables differences between datanode and coordinator.
Noticing however very worrisome, erratic behaviour of the pg_stat_all_tables values.
Below you can see that values returned are either ok or nulls - they query is executed in second intervals and seem randomy returning proper values or nothing. This basically breaks coordinators update (analyze(coordinator).

Do you have any idea why that could be happening? - Seems like a caching problem of some stale pool connection maybe?
How to fix it?

Thanks

Time: 57.723 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > execute direct on (datanode4)$$select * from pg_stat_all_tables where relname ~ 'tbl_partition_1' $$;
  relid  |    schemaname     |     relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          | last_autovacuum |         last_analyze         | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-------------------+--------------------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-----------------+------------------------------+------------------+--------------+------------------+---------------+-------------------
 6369512 | dr3_ops_cs36_part | tbl_partition_1 |        0 |            0 |  5813523 |       2928286 |         0 |         0 |         0 |             0 |   11498085 |          0 |                   0 | 2021-01-14 09:47:33.141461+01 | [null]          | 2021-01-14 09:49:35.76404+01 | [null]           |            3 |                0 |             3 |                 0
(1 row)

Time: 46.543 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > execute direct on (datanode4)$$select * from pg_stat_all_tables where relname ~ 'tbl_partition_1' $$;
  relid  |    schemaname     |     relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          | last_autovacuum |         last_analyze         | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-------------------+--------------------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-----------------+------------------------------+------------------+--------------+------------------+---------------+-------------------
 6369512 | dr3_ops_cs36_part | tbl_partition_1 |        0 |            0 |  5813523 |       2928286 |         0 |         0 |         0 |             0 |   11498085 |          0 |                   0 | 2021-01-14 09:47:33.141461+01 | [null]          | 2021-01-14 09:49:35.76404+01 | [null]           |            3 |                0 |             3 |                 0
(1 row)

Time: 58.182 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > execute direct on (datanode4)$$select * from pg_stat_all_tables where relname ~ 'tbl_partition_1' $$;
  relid  |    schemaname     |     relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-------------------+--------------------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 6369512 | dr3_ops_cs36_part | tbl_partition_1 |        0 |            0 |        0 |             0 |         0 |         0 |         0 |             0 |          0 |          0 |                   0 | [null]      | [null]          | [null]       | [null]           |            0 |                0 |             0 |                 0
(1 row)

Time: 50.242 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > execute direct on (datanode4)$$select * from pg_stat_all_tables where relname ~ 'tbl_partition_1' $$;
  relid  |    schemaname     |     relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-------------------+--------------------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-----------------+--------------+------------------+--------------+------------------+---------------+-------------------
 6369512 | dr3_ops_cs36_part | tbl_partition_1 |        0 |            0 |        0 |             0 |         0 |         0 |         0 |             0 |          0 |          0 |                   0 | [null]      | [null]          | [null]       | [null]           |            0 |                0 |             0 |                 0
(1 row)

Time: 48.042 ms
(dr3_ops_cs36@gaiadb12i:55431) [surveys] > execute direct on (datanode4)$$select * from pg_stat_all_tables where relname ~ 'tbl_partition_1' $$;
  relid  |    schemaname     |     relname     | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze |          last_vacuum          | last_autovacuum |         last_analyze         | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
---------+-------------------+--------------------------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------------------------+-----------------+------------------------------+------------------+--------------+------------------+---------------+-------------------
 6369512 | dr3_ops_cs36_part | tbl_partition_1 |        0 |            0 |  5813523 |       2928286 |         0 |         0 |         0 |             0 |   11498085 |          0 |                   0 | 2021-01-14 09:47:33.141461+01 | [null]          | 2021-01-14 09:49:35.76404+01 | [null]           |            3 |                0 |             3 |                 0
@yazun
Copy link
Contributor Author

yazun commented Jan 14, 2021

It happened only for a single dn so far, but from any coordinator.

@yazun
Copy link
Contributor Author

yazun commented Jan 14, 2021

And bouncing DB does not help.

@bethding-database
Copy link
Contributor

bethding-database commented Mar 23, 2021

@yazun I try to reproduce this promble, but it hasn't happend yet. I want to confirm some questions:
1、 Whether tbl_partition_1 is partition table or not?
2、Is there "corrupted statistics file" or "out of file descriptors" in datanode4's log?
3、Is it shows that stats collector process exitting when the promble occurs in datanode4's log?
And if you have a simple way to reproduce the promble, please let me know.
Thanks.

@beth-database
Copy link

@yazun
Have you solved the problem?Or more clues for it?
Do you have a way to reproduce this problem?

@kali-brandwatch
Copy link

Rescuing a very old thread just because I landed here in search for answers, I had a very similar issue and I got the erratic results resolved by means of issuing an ANALYZE VERBOSE my_table;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants