compress_chunk() blocks other queries on the table for a long time #2732

Open
phemmer opened this issue Dec 10, 2020 · 8 comments

phemmer commented Dec 10, 2020

Relevant system information:

  • OS: Debian 9
  • PostgreSQL version (output of postgres --version): postgres (PostgreSQL) 12.4 (Debian 12.4-1.pgdg90+1)
  • TimescaleDB version (output of \dx in psql): timescaledb | 2.0.0-rc4
  • Installation method: apt from packagecloud.io

Describe the bug
When compressing a chunk with compress_chunk(), after a certain point in the process, any query that uses chunks_detailed_size() blocks (possibly other metadata queries as well; I'm not sure).

To Reproduce
Steps to reproduce the behavior:

  1. Have a multi-node cluster (not sure if necessary, but that's what I'm using)
  2. On access node: select compress_chunk('mychunk');
  3. On access node: While running, periodically do select * from chunks_detailed_size('mytable')

Expected behavior
Successful response in a short amount of time.

Actual behavior
On the data node:

Dec 10 03:48:49 ded4077 postgres[29867]: [38-1] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab LOG:  process 29867 still waiting for AccessShareLock on relation 3424460 of database 16386 after 1000.110 ms
Dec 10 03:48:49 ded4077 postgres[29867]: [38-2] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab DETAIL:  Process holding the lock: 14771. Wait queue: 29867.
Dec 10 03:48:49 ded4077 postgres[29867]: [38-3] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab CONTEXT:  PL/pgSQL function _timescaledb_internal.chunks_local_size(name,name) line 3 at RETURN QUERY
Dec 10 03:48:49 ded4077 postgres[29867]: [38-4] pid=29867 db=edgestats user=postgres rhost=1.1.1.70 app=timescaledb tid=4/3050 sid=5fcfdb10.74ab STATEMENT:  SELECT * from _timescaledb_internal.chunks_local_size( 'public', 'mytable'  );

...eventually timing out due to statement_timeout (in my case, 5 minutes).
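To see which backend is holding the lock on the data node, a standard PostgreSQL query like this can be run alongside (just a diagnostic sketch, nothing TimescaleDB-specific):

-- Run on the data node while compress_chunk() is in progress:
-- lists each waiting backend, the PID blocking it, and both queries.
SELECT waiting.pid    AS waiting_pid,
       waiting.query  AS waiting_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity AS waiting
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(waiting.pid))
WHERE waiting.wait_event_type = 'Lock';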

Additional context
While I doubt it would be possible to not block at all, I think the blocking time should be reduced to a few seconds at most.


phemmer commented Dec 10, 2020

Ok, so this issue is actually much worse than originally described: the blocking also affects normal read queries on the table.

Looks like #2669 might be related to this. Though that one is for decompress, not compress.

phemmer changed the title from "compress_chunk() blocks chunks_detailed_size() for a long time" to "compress_chunk() blocks other queries on the table for a long time" on Dec 10, 2020

ebreijo commented Mar 18, 2022

I'm also having this issue when using the backfill script on a large data set. The compress/decompress operations lock read queries on the hypertable that contains the compressed chunks.

Currently running TimescaleDB 2.4.2 on PostgreSQL 13.

Is there a way to prevent this lock on read queries somehow?
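The only stopgap I can think of is to make the read sessions fail fast instead of queueing behind the compression lock, e.g. with lock_timeout (only a sketch; it turns the long block into a quick error rather than removing the lock, and reporting_user is just a placeholder):

-- Per session: give up after 2 seconds instead of waiting out the compression.
SET lock_timeout = '2s';
SELECT * FROM chunks_detailed_size('mytable');

-- Or per role, so every session of that user gets it:
ALTER ROLE reporting_user SET lock_timeout = '2s';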


akaashp commented Apr 5, 2024

Having this issue when my policy-defined compression job runs. It blocks all writes to the table until the job is done, even though the chunk being compressed is an old one that is not being written to. This makes compression unusable for high-throughput workloads...
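The closest thing to a workaround seems to be pushing the policy job into an off-peak window, which only moves the problem rather than removing the lock. Roughly (sketch only; job_id 1002 and the timestamp are placeholders):

-- Find the compression policy job for the hypertable.
SELECT job_id, hypertable_name, schedule_interval
FROM timescaledb_information.jobs
WHERE proc_name = 'policy_compression';

-- Reschedule it to run during a low-traffic window.
SELECT alter_job(1002, next_start => '2024-04-06 03:00:00+00');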

@sseveran

I am experiencing the same thing. How is compression deployed in the real world? Is concurrently compressing a chunk and being able to read from the table a paid feature?


phemmer commented Jun 27, 2024

At this point, I might recommend that Timescale change their official recommendation, and the default (7 days), for chunk size. A few reasons for this:

  1. This issue. Bigger chunks mean longer times where all operations are frozen waiting for the chunk to compress. On high volume inserts, if the ingestion is blocked long enough, it can cause the data submitters to start timing out and dropping data.
  2. The main argument against smaller chunks is that they increase planning time, but that does not actually seem significant. I keep 45 days of data and typically use 4h chunks, and planning time is insignificant.
  3. With the option compress_chunk_time_interval, multiple smaller chunks can be rolled into larger ones (see the sketch after this list).
  4. It's easy to migrate from small chunks to large ones, but impossible to migrate from large chunks to smaller ones.
  5. With issues like [Bug]: query planner not using index with subquery & multiple chunks #7068, large uncompressed chunks cause a major performance hit to queries.
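
For reference, the two knobs involved here (the chunk interval itself and compress_chunk_time_interval) look roughly like this (sketch only; mytable and the intervals are examples, not a recommendation):

-- Use smaller chunks going forward (affects newly created chunks only).
SELECT set_chunk_time_interval('mytable', INTERVAL '4 hours');

-- Roll small chunks into bigger ones at compression time (needs TimescaleDB >= 2.9).
ALTER TABLE mytable SET (
  timescaledb.compress,
  timescaledb.compress_chunk_time_interval = '1 day'
);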

So it seems like the recommended/default chunk size was established back when Timescale was new, and has not kept up with the state of things.

And honestly for me, due to all of the above, I'm considering shrinking our chunks down to 15 minutes.


iliastsa commented Aug 13, 2024

We are experiencing the same thing. We have some ETL pipelines that insert into hypertables and also need to update/delete some data. If a compression job is running, the update/delete query is blocked until compression is complete. It should be noted that the rows being updated/deleted do not reside in the chunk being compressed; they are in the uncompressed "tip" of the table.

@iliastsa

Are there any updates on this front? We'd like to backfill some data at our company, but the locking troubles described above are a huge blocker, as they prevent inserts at the head of the table.

@zaita-llc

I am also experiencing significant disruption due to this locking behavior. I can tolerate the delay on new writes, because I can ensure I'm not compressing the latest chunk where all the writes are going. However, the read lock is causing deadlocks and connection exhaustion in Ignition SCADA, lately even forcing me to cycle the DB connection, as the deadlock is never released and the connection goes into a permanent fault state. That said, compression is almost essential for getting good query performance on my huge dataset, so I can't easily do without it. I wouldn't consider this just a performance enhancement, but rather an essential requirement for realtime systems. Appreciate your consideration.
