[Bug]: query planner not using index with subquery & multiple chunks #7068
Comments
@phemmer the main issue is that it's not able to push the materialized subquery down to the filter level for the scans on the chunks to be effective. This might be a core PostgreSQL planner limitation.
Ok, I've managed to create a reproducer. It has to do with compression. If the first of the 2 chunks is compressed, then a seq scan is used on the second chunk. But if the first chunk is decompressed, an index scan is used on both.
@phemmer care to share the reproducer script here?
I already did. It's in the description.
Yeah, it seems to be unable to generate a parameterized path on compressed chunks. This is the closest I can get, with some modifications to nudge the hypertable to the inner side of the nested loop:
What type of bug is this?
Performance issue
What subsystems and features are affected?
Query planner
What happened?
When I execute a query that uses a subquery filter & multiple chunks, the wrong (or no) index is used, causing a large performance degradation. If I don't use a subquery, or if I only query a single chunk at a time, it works fine.
Here's an example showing the issue:
https://explain.dalibo.com/plan/a5bh7372bgcg0ee8#raw
2_chunks_subquery.txt
We can see that on _timescaledb_internal._hyper_2427_264588_chunk, it's doing a seq scan without using an index, which takes 10 seconds and returns 27,604,988 rows, causing a ton of work for the higher operations.
I have an index on both the tag_id and time columns, which would result in a much faster query. This is why I'm using a materialized CTE here: I was trying to strongly encourage postgres to use the index containing the tag_id column. Whether I use a normal subquery, a join, etc., none of them result in the correct index being used.
If I manually take that subquery (the CTE), evaluate it, and copy/paste the results into the where clause, it goes much faster:
https://explain.dalibo.com/plan/54h8h7b5ee5b8gd6#raw
2_chunks_copypaste.txt
We can see now that the correct index was used (_hyper_2427_264588_chunk_haproxy_server_tag_id_time_idx), which returned only 2,400 rows and completed in 3.2ms.
Both the above queries spanned 2 chunks. If I reduce to just the second (chronologically) of the two chunks (the one that resulted in the performance difference in the above 2 queries), though still using the subquery, the plan again uses the correct index:
https://explain.dalibo.com/plan/556dh3acg5f3173g#raw
1_chunk_subquery.txt
And just for comparison, when using copy/paste instead of subquery, it has similar plan & performance:
https://explain.dalibo.com/plan/825h52f73d389f9h#raw
1_chunk_copypaste.txt
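The manual copy/paste step described above can be automated on the application side by running the subquery first and then passing its results into the main query. The following is only a minimal sketch of that idea, not the query from this issue: the haproxy_server_tag table, its host column, and the execute callable are hypothetical, and only the tag_id and time columns and the haproxy_server hypertable name (taken from the index name) come from the report. Whether the planner treats a bound array the same way as pasted literals has not been verified here.

```python
from typing import Any, Callable, Sequence

# Hypothetical lookup query standing in for the CTE/subquery from the report.
TAG_QUERY = "SELECT id FROM haproxy_server_tag WHERE host = %s"

# Main query with the tag ids supplied as a bound array instead of a subquery.
DATA_QUERY = (
    "SELECT time, tag_id FROM haproxy_server "
    "WHERE tag_id = ANY(%s) AND time >= %s AND time < %s"
)

# Assumed signature: execute(sql, params) runs the statement and returns all rows.
Execute = Callable[[str, Sequence[Any]], list]


def fetch_rows(execute: Execute, host: str, start: Any, end: Any) -> list:
    """Evaluate the tag lookup first, then query the hypertable with its results."""
    tag_ids = [row[0] for row in execute(TAG_QUERY, (host,))]
    if not tag_ids:
        # No matching tags, so the main query cannot return anything.
        return []
    return execute(DATA_QUERY, (tag_ids, start, end))
```

With a driver such as psycopg, execute could be a small wrapper that calls cursor.execute(sql, params) followed by cursor.fetchall(). This costs one extra round trip, but it hands the planner concrete values, as in the copy/paste variant.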
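So basically:
2 chunks + subquery: seq scan on the second chunk (slow)
2 chunks + copy/paste: correct index used (fast)
1 chunk + subquery: correct index used (fast)
1 chunk + copy/paste: correct index used (fast)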
TimescaleDB version affected
2.14.2
PostgreSQL version used
16.2
What operating system did you use?
Debian 16
What installation method did you use?
Deb/Apt
What platform did you run on?
On prem/Self-hosted
Relevant log output and stack trace
No response
How can we reproduce the bug?
The above will perform a seq scan on the second chunk. But you can then decompress the chunk and watch it perform an index scan on both chunks.
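When comparing the plans before and after decompressing, it can help to extract which scan type each chunk received from the text-format EXPLAIN output. The helper below is a rough sketch for that purpose. The function name is made up, and the regular expression assumes the usual "Seq Scan on rel" and "Index Scan using idx on rel" line shapes of PostgreSQL's text plans, so it may miss other node types.

```python
import re

# Matches heap-level scan nodes in text-format EXPLAIN output.
# "Bitmap Index Scan on <index>" is skipped because it names an index, not a relation.
SCAN_RE = re.compile(
    r"(?<!Bitmap )((?:Parallel )?(?:Seq|Index Only|Index|Bitmap Heap) Scan)"
    r"(?: Backward)?(?: using \S+)? on (\S+)"
)


def scans_by_relation(plan_text: str) -> dict[str, str]:
    """Map each scanned relation name to the scan node type found in the plan."""
    scans = {}
    for line in plan_text.splitlines():
        match = SCAN_RE.search(line)
        if match:
            scans[match.group(2)] = match.group(1)
    return scans
```

Running it on the plan captured while the first chunk is compressed, and again after decompressing it, should make the switch from a seq scan to an index scan on the second chunk easy to spot.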