Support filtering top-level resource based on embedded resource filter #1949
Hints used like `select=projects.client_id(*)` were already deprecated. '!' should be used from now on `select=projects!client_id(*)`
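To make the `!` form concrete, here is a minimal sketch of how a single embed item could be split into its name, hints, and column list. This is not PostgREST's parser (which is written in Haskell); the regular expression and the `parse_embed` name are assumptions made purely for illustration, and nested embeds are not handled.

```python
import re
from typing import List, Tuple

# Hypothetical pattern for one embed item such as "projects!client_id(*)":
# a name, zero or more "!hint" parts, then a parenthesised column list.
_EMBED = re.compile(r"^(?P<name>\w+)(?P<hints>(?:!\w+)*)\((?P<cols>.*)\)$")


def parse_embed(item: str) -> Tuple[str, List[str], str]:
    """Split an embed item into (name, hints, columns).

    Illustrative only: the deprecated dot form ("projects.client_id(*)")
    does not match the pattern and is rejected.
    """
    match = _EMBED.match(item)
    if match is None:
        raise ValueError(f"not a '!'-style embed: {item!r}")
    # "!a!b".split("!") yields ["", "a", "b"]; drop the leading empty string.
    hints = match.group("hints").split("!")[1:]
    return match.group("name"), hints, match.group("cols")
```

Under these assumptions, `parse_embed("projects!client_id(*)")` yields the name `projects`, the single hint `client_id`, and the column list `*`.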
Sorry for not bringing this up earlier in the issue, but looking at the example ... GET /projects?select=id,clients!inner(id)&clients.id=eq.2 ... it suddenly strikes me as odd to have something in the Since we already have embedded filters as Maybe something like this could make it an inner join: GET /projects?select=id,clients(id)&clients.id=eq.2&clients=inner |
@wolfgangwalther True. I was mostly going from what we discussed in #915 (comment). Mainly that the embedding was a function and that the syntax for parameters would be
I like the direction. Maybe we could have something like: GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join.inner In fact, the hints could also be expressed on the query param: GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join(inner).hint1,hint2 (The That keeps the GET ?select=avg&avg=args.amount
I guess one drawback of that syntax is that it's more verbose (both for the REST interface and probably for libraries).
I see. The "embedding as a function" pov only holds true for the
Hm, I think it's enough to specify the
Assuming we'd keep the ability to put hints (and therefore the embedding specification) in only one place (either query param or select), this would be a breaking change. Once we're at that stage, where we implement such a big breaking change to the query syntax, I'd have a few other points to raise to change in the query syntax. Two biggest things that come to mind are correct quoting / escaping and moving the operator to the left of the equal sign like e.g. So, I suggest to just go with the non-breaking change for this PR - and then open another issue where we can discuss a "new query syntax" that could get a few things straight. We could implement both at the same time and have them configurable via config option, to be able to slowly deprecate the old syntax. |
It would be too inconsistent at the parsers level, I'm not even sure if it can be done cleanly. All of our filters have a
Hm, if this is meant for the
I was thinking to have both actually, so no breaking change. I'm still not set on either though. For now I'll continue with the
No, I propose to have all filters on the left side.
Imho, that would make it quite complicated to parse. You could have conflicting targets/hints/... in both parts, which would make it very hard to create a good query efficiently.
The I propose to implement resource embedding with correlated join, which is essentially the same as the current approach of correlated subquery, but with the extensibility to support left/inner/anti-joins. Specifically, instead of translating the left-join resource embedding of
with pg_source as (
select projects.id, coalesce(
(select json_agg(clients.*)
from (select clients.id
from clients
where clients.project_id = projects.id and clients.id = 2
) clients),
'[]') as clients
from projects
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;
a correlated join version could be
with pg_source as (
select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
from projects
left join lateral (select json_agg(clients) clients
from (select clients.id
from clients
where clients.project_id = projects.id and clients.id = 2
) clients
) clients_clients
on clients_clients.clients is not null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;
An inner-join resource embedding is then
with pg_source as (
select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
from projects
join lateral (select json_agg(clients) clients
from (select clients.id
from clients
where clients.project_id = projects.id and clients.id = 2
) clients
) clients_clients
on clients_clients.clients is not null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;
And an anti-join resource embedding is
with pg_source as (
select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
from projects
join lateral (select json_agg(clients) clients
from (select clients.id
from clients
where clients.project_id = projects.id and clients.id = 2
) clients
) clients_clients
on clients_clients.clients is null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;
The correlated-join approach doesn't rely on pk inference, works on view and Does that make any sense?
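For readers who want to check the row-level semantics of the three variants without a database, here is a small in-memory model in Python. It is only a sketch: the `embed` function, its parameters, and the sample rows are invented for illustration, and it mirrors the `clients.project_id = projects.id and clients.id = 2` condition from the queries above rather than anything PostgREST actually executes.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical sample data, shaped like the tables in the queries above.
projects: List[Row] = [{"id": 1}, {"id": 2}, {"id": 3}]
clients: List[Row] = [
    {"id": 1, "project_id": 1},
    {"id": 2, "project_id": 2},
    {"id": 3, "project_id": 2},
]


def embed(parents: List[Row], children: List[Row], fk: str,
          child_filter: Callable[[Row], bool], join: str = "left") -> List[Row]:
    """Model left/inner/anti embedding of children into parents."""
    if join not in ("left", "inner", "anti"):
        raise ValueError(f"unknown join mode: {join}")
    result: List[Row] = []
    for parent in parents:
        # Plays the role of the lateral subquery: children correlated with
        # this parent, with the embedded filter applied.
        matched = [{"id": c["id"]} for c in children
                   if c[fk] == parent["id"] and child_filter(c)]
        if join == "inner" and not matched:
            continue  # inner: drop parents without a matching child
        if join == "anti" and matched:
            continue  # anti: keep only parents without a matching child
        result.append({"id": parent["id"], "clients": matched})
    return result


def only_client_2(client: Row) -> bool:
    return client["id"] == 2
```

With this data, `embed(projects, clients, "project_id", only_client_2, "inner")` keeps only project 2, the left variant keeps all three projects (two of them with an empty `clients` list), and the anti variant keeps projects 1 and 3.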
BTW, the ability to do aggregation and flatten join seems to be a popular requested feature (#1233, #211, #1126, #915 and others). Do we have any plan to support explicit join/group by? Say, something like
Looks like it will take a huge effort.
@Iced-Sun Awesome! Your approach is much better since it doesn't rely on GROUP BY and it also works on views 🎉 🎉 So to understand it better, taking my approach on #1075 (comment), the query for inner joins would now be
WITH pg_source AS (
SELECT "test"."clients"."id",
"test"."clients"."name",
"projects_projects"."projects" AS "projects"
FROM "test"."clients"
INNER JOIN LATERAL (
SELECT json_agg("projects") "projects" -- important change here
FROM(
SELECT "test"."projects"."id", "test"."projects"."name"
FROM "test"."projects"
WHERE "test"."projects"."client_id" = "test"."clients"."id") "projects"
) AS "projects_projects" ON "projects_projects"."projects" IS NOT NULL -- important change here
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t;
Where the main changes are the correlated join (which avoids the need for GROUP BY) plus the As you pointed out, this practically has the same cost that our correlated subquery has( (The above is just my simplistic view, I think you've determined that GROUP BY is actually worse in perf) So I'll proceed with the correlated join approach for this feature.
Yes, but besides the syntax, we still need a way to protect against a potentially expensive GROUP BY; this was discussed in #915.
I did some simple performance tests in #1075 (comment). It suggests that the I have no solid facts on grouping by full-size keys (in the case that group-by-pk is not applicable), but I do have some not-so-good experiences when I have to group by multiple text columns to work around bad schema design. As you said, correlated join doesn't break things or bring surprises, and it unifies the left/inner/anti-join resource embedding. Thanks for the great piece of software. |
@wolfgangwalther Been thinking about this, why is it "odd" to change the shape of the output on GET /projects?select=count
[{"count":5}] And of course in SQL the same thing happens - not only WHERE modifies the rows, SELECT also does. |
That's odd for me, already. For me, using an aggregate function without a |
@wolfgangwalther I see. I've also always found it odd that these queries work:
select p.json_agg from test.projects p;
select p.count from projects p;
select p.array_agg from projects p;
-- same through postgrest
(I think we discussed this somewhere before) Edit: Also, using a different
Yeah. So considering this, WDYT about just going with the |
Alright, this is now ready for a final review. I've added tests for views (with m2o/o2m/m2m cases), rpc and embedding with hints. I've also added a
This is enabled by adding `!inner` to the embedded resource:
/projects?select=*,clients!inner(*)&clients.id=eq.12
This behavior can be enabled by default with the config option db-embed-default-join='inner', which saves the need for specifying `!inner` on every request.
If this is enabled, the previous behavior can be restored per request by specifying `!left` on the embedded resource:
/projects?select=*,clients!left(*)&clients.id=eq.12
Tested on M2O/O2M/M2M relationships, views, RPC.
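The interplay between the per-request hint and the configured default can be sketched as follows. This is a client-side illustration in Python, not PostgREST code: `effective_join` and `embed_url` are hypothetical helper names, and the default join is modeled as a plain function argument standing in for the `db-embed-default-join` option described above.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def effective_join(request_hint: Optional[str], default_join: str = "left") -> str:
    """Return the join that applies: an explicit hint wins over the default."""
    if request_hint is None:
        return default_join
    if request_hint not in ("inner", "left"):
        raise ValueError(f"unsupported join hint: {request_hint}")
    return request_hint


def embed_url(table: str, embedded: str, filters: Dict[str, str],
              hint: Optional[str] = None) -> str:
    """Build a request path that embeds `embedded` with an optional join hint."""
    target = embedded if hint is None else f"{embedded}!{hint}"
    query = {"select": f"*,{target}(*)", **filters}
    # Keep the characters used by the select syntax unescaped for readability.
    return f"/{table}?" + urlencode(query, safe="*,!()")
```

For example, `embed_url("projects", "clients", {"clients.id": "eq.12"}, "inner")` produces the first request path shown above, and `effective_join("left", default_join="inner")` returns `"left"`, matching the per-request override.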
How does this work with joins that are already using i.e. if I have this:
How should it switch to an |
supplier_id int references suppliers(id),
trade_union_id int references trade_unions(id),
primary key (supplier_id, trade_union_id)
);
Missing a newline, FYI. Some editors struggle with opening such files, or so I have heard
@colearendt Yes, this should work like postgrest/test/Feature/EmbedInnerJoinSpec.hs Lines 215 to 218 in 627c3c3
Thanks! That's perfect! I totally missed that example 🙈 Found a little bug in #1977, but this is really awesome!! Well done! |
Closes #1075. Allow filtering the top-level resource based on the embedded filter with `!inner`.
With the LATERAL/GROUP BY approach mentioned on #1075 (comment), this PR will only cover tables; views will need more work and smartness from our schema cache, mainly because when doing GROUP BY, pg is able to infer the pk on a table but not a view (more details here).
Edit: Thanks to Iced-Sun's query below (#1949 (comment)), this will now work for both tables and views.
This behavior can also be enabled by default with the following config option: db-embed-default-join="inner", which saves the need for specifying `!inner` on every request. In this case, you can go back to the previous behavior per request by specifying `!left` on the embedded resource, e.g. /projects?select=*,clients!left(*)&clients.id=eq.12
Edit: The db-embed-default-join config was removed in #2034.
Steps
This PR won't cause a breaking change. Embedding will still use subqueries by default; only on `!inner` will it use LATERAL.