`current_setting` can lead to bad performance when used on RLS #609
Comments
Hello Steve, what about an index on the uuid column?
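A minimal sketch of the suggested index, assuming the filtered column is the `workspace` uuid used by the policies later in this thread (the index name mirrors the one that shows up in the plans below):

```sql
-- Index the uuid column that the RLS policy compares against
create index chat_messages_workspace_idx on chat_messages (workspace);
```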
This is certainly not because of LEAKPROOF or RLS. You can see it easily by manually executing the constructed query:

```sql
SELECT COUNT(*) FROM chat_messages WHERE workspace = (app_metadata() ->> 'workspace')::uuid;
```

This is equally slow, even without RLS.
But in this case, your function is inside the RLS policy. If you're leaking data there, that's all your own responsibility.
Yes, this is the right solution.
We encourage using
Thanks dwagin, yeah that works. But the thing is, we didn't have this issue with the legacy GUCs that were available before v9.0. So in the above example, by doing this:

```sql
create or replace function workspace()
returns uuid as $$
  select (current_setting('request.jwt.claims.app_metadata.workspace', true))::uuid;
$$ language sql stable;

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = workspace()
);
```

```
begin;
set local role postgrest_test_anonymous;
select set_config('request.jwt.claims.app_metadata.workspace', '2156a8d9-d06f-40cd-adf8-411ae8c0bc76', true);
explain analyze SELECT COUNT(*) FROM chat_messages;

                                                       QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=87937.00..87937.01 rows=1 width=8) (actual time=1252.412..1252.413 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..87937.00 rows=1 width=0) (actual time=1252.409..1252.409 rows=0 loops=1)
         Filter: (workspace = (current_setting('request.jwt.claims.app_metadata.workspace'::text, true))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.292 ms
 Execution Time: 1252.451 ms
```

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.
My proposal would be to have an allowlist config that brings back the legacy GUCs for certain keys, to avoid the JSON parsing problem. Could be something like:

```
http-context-shortcuts = "request.jwt.claims.app_metadata.workspace, request.headers.user_agent, request.cookies.session_id"
```

This seems related to PostgREST/postgrest#1941; maybe the config proposed there could be merged with this one.
I still think it's only a problem when you parse it 3 million times, like in your example with a seq scan. But that's not the GUC's fault.

You can easily do this in a pre-request function. Just set your custom GUC to the extracted value from the JSON. I'd rather wait for schema variables to finally come to PG, because those will solve this nicely.
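A minimal sketch of that pre-request approach, assuming PostgREST's `db-pre-request` config points at this function (the function and GUC names here are made up for illustration):

```sql
-- Hypothetical pre-request function: parse the JWT claims JSON once per
-- request and stash the claim in a plain scalar GUC.
create or replace function public.set_workspace_guc()
returns void as $$
begin
  perform set_config(
    'app.workspace',  -- hypothetical custom GUC name
    current_setting('request.jwt.claims', true)::jsonb
      -> 'app_metadata' ->> 'workspace',
    true  -- local: reset at transaction end
  );
end;
$$ language plpgsql;
```

The RLS policy can then read the cheap scalar GUC, e.g. `using (workspace = current_setting('app.workspace', true)::uuid)`, instead of re-parsing the JSON per row.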
Yeah, that would work for avoiding the JSON GUC, but it would also come with a perf decrease because we don't have pipeline mode. Last time I checked, adding a
True. Have you reviewed that feature by any chance? I wonder if it will land on pg16 and if indeed we can use it as we intend to.
That's a downside I guess; it will be almost a year until pg16.

As I mentioned above, the example is contrived. But there are some real-world queries on big tables where filtering down some millions of rows is legitimate. I do think the JSON GUC is a problem here, since the perf problem doesn't happen with the legacy GUCs.
In my environment, only x3 (PG 13).
Hm, I've tried on pg13 (with

And speed up when
Circling back on this. How about this config:

```
# this is the default
http-context = "request.method, request.path, request.jwt.claims, request.headers, request.cookies"
# we also set `search_path` and `role` for every request, but those cannot be
# modified as they're core to our security model
```

GUCs can be removed and shortcuts can be added:

```
http-context = "request.jwt.claims.app_metadata.workspace, request.headers"
```

For every sub level on
This will solve this issue and PostgREST/postgrest#1941. I think it will be useful even if we can do it with
Just found out a plain SQL solution to the above issue: we can force computing the claim value only once by using a materialized CTE.

```sql
drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);
```

Now the policy is still correct and the COUNT is fast:

```
begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"[email protected]","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);

SELECT COUNT(*) FROM chat_messages;
  count
---------
 1500000

explain analyze SELECT COUNT(*) FROM chat_messages;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=241687.00..241687.01 rows=1 width=8) (actual time=775.210..775.211 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..237937.00 rows=1500000 width=0) (actual time=0.145..728.419 rows=1500000 loops=1)
         Filter: (SubPlan 3)
         Rows Removed by Filter: 1500000
         SubPlan 3
           ->  Result  (cost=0.05..0.06 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.054..0.056 rows=1 loops=1)
                 InitPlan 2 (returns $1)
                   ->  CTE Scan on cached  (cost=0.00..0.02 rows=1 width=16) (actual time=0.058..0.060 rows=1 loops=1)
 Planning Time: 0.238 ms
 Execution Time: 775.264 ms
```

This should be added to the docs.
This is epic! Also works wonders for things like postgres realtime too!
Note the above policy with the materialized CTE is not always a win. If the column is indexed, the above policy results in a noticeable slowdown (almost x6 slower):

```sql
create index on chat_messages(workspace);

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);
```

```
begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"[email protected]","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
                                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=209086.43..209086.44 rows=1 width=8) (actual time=817.665..817.666 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.43..205336.43 rows=1500000 width=0) (actual time=0.098..776.944 rows=1500000 loops=1)
         Filter: (SubPlan 2)
         Rows Removed by Filter: 1500000
         Heap Fetches: 0
         SubPlan 2
           ->  CTE Scan on cached  (cost=0.03..0.05 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.055..0.057 rows=1 loops=1)
 Planning Time: 0.240 ms
 Execution Time: 817.720 ms
```

While the original policy without the materialized CTE from above:

```sql
drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);
```

```
begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"[email protected]","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
BEGIN
SET
                                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35164.45..35164.46 rows=1 width=8) (actual time=124.649..124.650 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.45..31414.95 rows=1499800 width=0) (actual time=0.082..78.337 rows=1500000 loops=1)
         Index Cond: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Heap Fetches: 0
 Planning Time: 0.353 ms
 Execution Time: 124.694 ms
(6 rows)
```

So maybe we can just boil down to "always use an index when comparing against
There is a better way to force pg to execute the function only once. Instead of this:

```sql
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);
```

You'll need to wrap the right side in a scalar subquery:

```sql
create policy sel on chat_messages for select
using(
  workspace = (select (app_metadata() ->> 'workspace')::uuid)
);
```

Since the subquery does not depend on the outer query, it will result in an InitPlan. This will run it exactly once and then put the result into the main query, which can use an index or not; it doesn't matter, it will be fast in both cases.

First policy explained:

Second policy explained:

Second policy + index:
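A self-contained sketch that demonstrates the per-row vs. InitPlan evaluation difference by counting function calls with a sequence (all names here are made up for illustration; requires PG 13+ for `gen_random_uuid()`):

```sql
-- Marking the function stable while it calls nextval() is a deliberate
-- cheat: it mirrors how stable current_setting() wrappers behave.
create sequence calls;

create or replace function tracked_value() returns uuid as $$
  select nextval('calls');  -- executed on every call, result discarded
  select '2156a8d9-d06f-40cd-adf8-411ae8c0bc76'::uuid;
$$ language sql stable;

create table t as
  select gen_random_uuid() as workspace from generate_series(1, 1000);

select count(*) from t where workspace = tracked_value();
select currval('calls');  -- ~1000: the filter evaluated it once per row

select setval('calls', 1, false);
select count(*) from t where workspace = (select tracked_value());
select currval('calls');  -- ~1: the InitPlan ran it once at execution
```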
Interesting, so it should be the same performance as the materialized CTE but a bit less verbose to write?
No, it has better performance than a materialized CTE.
I did a quick test in https://github.com/orgs/supabase/discussions/9311#discussioncomment-5732511 and got the same result with either method; I'll do more. It makes sense they would be the same, as they set a constant value to compare rows against and get run once. BUT I believe this method is the one scattered throughout my original table https://github.com/orgs/supabase/discussions/9311#discussioncomment-3815869 which did things like:
I did just run a 2nd test of the original auth.uid() = user_id on 25K rows, and get

So the initplan way is faster in this case.
The difference you see between initplan and without, for an indexed query, is negligible. In theory it's a very tiny overhead of running an additional initplan node in the query. But doing a few more runs gives me the following results:

Bare query using index, but no initplan:

Scalar subquery using index and initplan:

There's a lot more noise here than any noticeable difference between the two.
Added some more tests with a realistic two-part RLS policy, with code, here: https://github.com/orgs/supabase/discussions/9311#discussioncomment-5734798

Edit... Adding
Edit: solution below.
Having:
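(The code block here didn't survive extraction; based on the policy and the query plans shown later in this thread, it was presumably close to this sketch:)

```sql
-- Reconstructed for illustration: a helper that pulls app_metadata out of
-- the JWT claims GUC, plus a policy comparing the workspace column to it.
create or replace function app_metadata()
returns jsonb as $$
  select current_setting('request.jwt.claims', true)::jsonb -> 'app_metadata';
$$ language sql stable;

create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);
```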
Doing a count is fast without RLS:
When RLS is enabled, the count is awfully slow:
This happens because:

- The cast (`::jsonb`) is not LEAKPROOF (the catalog check after this list shows how to verify that); this means the casting will be done for each row of the query, PostgreSQL doesn't cache this static value.
- The `json` value inside the `request.jwt.claims` is somewhat lengthy. We have no control over this, as the JWT can come from an external system. A shorter `json` claims will lead to less time:
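A quick way to confirm the LEAKPROOF part is to look at the `pg_proc` catalog; a sketch (the names queried are the catalog entries backing `current_setting` and the jsonb `->>` operator):

```sql
-- proleakproof = false means the planner must evaluate the expression
-- behind the RLS barrier instead of treating it as a safe constant.
select proname, proleakproof
from pg_proc
where proname in ('current_setting', 'jsonb_object_field_text');
```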
The `count` is a contrived example (for which there are workarounds), but more real queries will lead to the same problem.

References: