chore(queries): pg_stat_user_tables: skip tables with an AccessExclusiveLock #19

Merged: 2 commits, Dec 14, 2023

@chtitux chtitux commented Dec 14, 2023

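Functions that compute the size of tables or related objects (such as pg_total_relation_size or pg_table_size) acquire an AccessShareLock. This lock conflicts with the AccessExclusiveLock held while a materialized view is being refreshed or a table is being ALTER'ed, so the size query blocks until that operation finishes.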

A fix was implemented earlier to estimate (very precisely) the size of materialized views, so that the query does not wait while a materialized view is being refreshed. However, this did not fix the issue when a table is being ALTER'ed: the SQL query waits for the AccessExclusiveLock to be released, which can take several minutes. This is a problem because the exporter stops exporting any data while the ALTER is running.

To prevent that, we check the pg_locks view for an AccessExclusiveLock on each table. If one exists, we skip that table:

  • the exporter will compute sizes for tables with no AccessExclusiveLock
  • the exporter will not be blocked

This is not perfect: there is still a race condition where the lock is acquired just before the query is executed. However, we believe this should cover the common case where an ALTER runs for a long time (several minutes), and we consider having the exporter blocked for a single execution acceptable.
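
The check described above could look roughly like the following. This is a minimal sketch, not the exact query from this PR: the selected columns and their aliases (`total_size_bytes`, `table_size_bytes`) are assumptions chosen for illustration, and the sketch uses `NOT EXISTS` where the actual query may be structured differently.

```sql
-- Sketch only: column list and aliases are hypothetical, not taken from the PR.
SELECT
  s.schemaname,
  s.relname,
  pg_total_relation_size(s.relid) AS total_size_bytes,
  pg_table_size(s.relid)          AS table_size_bytes
FROM pg_stat_user_tables AS s
WHERE NOT EXISTS (
  SELECT 1
  FROM pg_locks AS l
  WHERE l.locktype = 'relation'
    AND l.relation = s.relid
    -- relation OIDs are only meaningful within the current database
    AND l.database = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
    -- no filter on l.granted: a lock request that is still waiting
    -- is treated the same as one that is already held
    AND l.mode = 'AccessExclusiveLock'
);
```

Rows are filtered before the select list is computed, so the size functions should only be called for relations that passed the lock check. The race condition mentioned above still applies: a lock taken between the check and the size call will block the query.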


chtitux commented Dec 14, 2023

#sre


chtitux commented Dec 14, 2023

#20 will fix the prometheus-rules test.

…ize of the materialized views without pg_XXX_size

Now that we skip tables with an AccessExclusiveLock, which would prevent pg_XXX_size from executing,
we can roll back the complex optimization and use the simple form with pg_XXX_size for materialized views.

Because we use JOIN, if an AccessExclusiveLock is held on a table/MatView, that table will be missing from the query result,
and its line will be missing from the export page.

We could use LEFT JOIN instead:
- the line for the table would be kept, with NULL values for the size fields.
For now, we prefer missing data over a reported NaN value.
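
For comparison, the LEFT JOIN alternative that was considered and not adopted might look like the sketch below. It is an illustration under assumptions, not code from this PR: the derived table `locked`, the `CASE` guard, and the column alias are hypothetical.

```sql
-- Sketch of the rejected LEFT JOIN variant; names are hypothetical.
SELECT
  s.schemaname,
  s.relname,
  -- CASE keeps the size function from running on locked relations,
  -- so their rows are kept with a NULL size instead of blocking
  CASE WHEN locked.relation IS NULL
       THEN pg_total_relation_size(s.relid)
  END AS total_size_bytes
FROM pg_stat_user_tables AS s
LEFT JOIN (
  SELECT DISTINCT l.relation
  FROM pg_locks AS l
  WHERE l.locktype = 'relation'
    AND l.database = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
    AND l.mode = 'AccessExclusiveLock'
) AS locked ON locked.relation = s.relid;
```

With this form every table keeps its line, and a locked table reports NULL for the size, which is the NaN case the commit message prefers to avoid.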
@chtitux chtitux force-pushed the fix-access-exclusive-lock-size branch from cf3e1f6 to b7c175e Compare December 14, 2023 14:04
@chtitux chtitux merged commit 03e48a6 into main Dec 14, 2023
5 checks passed