Support filtering top-level resource based on embedded resource filter #1949

steve-chavez · 2021-09-15T18:18:48Z

Closes #1075. Allow filtering the top-level resource based on the embedded filter ~~with the LATERAL/GROUP BY approach mentioned on #1075 (comment),~~ like so:

GET /projects?select=id,clients!inner(id)&clients.id=eq.2

[{"id":3,"clients":{"id":2}},
 {"id":4,"clients":{"id":2}}]

# the usual left join would give
GET /projects?select=id,clients(id)&clients.id=eq.2
[{"id":1,"clients":null},
{"id":2,"clients":null},
{"id":3,"clients":{"id":2}},
{"id":4,"clients":{"id":2}},
{"id":5,"clients":null}]
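To make the difference between the two requests concrete, here is a minimal Python sketch that mimics both behaviors on in-memory data. It is not how PostgREST implements embedding; the `client_id` values for projects 1, 2 and 5 and the helper name `embed_clients` are assumptions chosen so that the output matches the responses above.

```python
# Hypothetical in-memory data; client_id values for projects 1, 2 and 5
# are assumptions picked to reproduce the example responses.
projects = [
    {"id": 1, "client_id": 1},
    {"id": 2, "client_id": 1},
    {"id": 3, "client_id": 2},
    {"id": 4, "client_id": 2},
    {"id": 5, "client_id": None},
]
clients = {1: {"id": 1}, 2: {"id": 2}}


def embed_clients(rows, client_filter, inner=False):
    """Embed the filtered client into each project row.

    inner=False keeps every parent row and leaves the embed null when the
    embedded filter does not match; inner=True drops those parent rows.
    """
    result = []
    for row in rows:
        client = clients.get(row["client_id"])
        if client is not None and not client_filter(client):
            client = None  # the embedded filter removed the only match
        if inner and client is None:
            continue  # "!inner": a null embed filters out the parent row
        result.append({"id": row["id"], "clients": client})
    return result


left = embed_clients(projects, lambda c: c["id"] == 2)
inner = embed_clients(projects, lambda c: c["id"] == 2, inner=True)
print(left)   # five rows, "clients" is None for ids 1, 2 and 5
print(inner)  # only the rows with ids 3 and 4
```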

This PR will only cover tables; views will need more work and smartness from our schema cache, mainly because when doing GROUP BY, pg is able to infer the pk on a table but not on a view, more details here.

Edit: Thanks to Iced-Sun's query below (#1949 (comment)), this will now work for both tables and views.

~~This behavior can also be enabled by default with the following config option~~

~~db-embed-default-join="inner"~~

Which saves the need for specifying !inner on every request. In this case, you can go back to the previous behavior per request by specifying !left on the embedded resource, e.g. /projects?select=*,clients!left(*)&clients.id=eq.12

Edit: The db-embed-default-join config was removed in #2034

Steps

This PR won't cause a breaking change. Embedding will still use subqueries by default, only on !inner it will use LATERAL.

Hints used like `select=projects.client_id(*)` were already deprecated. '!' should be used from now on `select=projects!client_id(*)`

wolfgangwalther · 2021-09-16T13:12:27Z

Sorry for not bringing this up earlier in the issue, but looking at the example ...

GET /projects?select=id,clients!inner(id)&clients.id=eq.2

... it suddenly strikes me as odd to have something in the select= part change the number of rows. Imho, the select= part should only change the shape of the output / each row - but should not filter anything.

Since we already have embedded filters as <alias>.<column>= - how about extending that to support a special operator on only <alias>=?

Maybe something like this could make it an inner join:

GET /projects?select=id,clients(id)&clients.id=eq.2&clients=inner

steve-chavez · 2021-09-16T18:11:51Z

> ... it suddenly strikes me as odd to have something in the select= part change the number of rows. Imho, the select= part should only change the shape of the output / each row - but should not filter anything.

@wolfgangwalther True. I was mostly going from what we discussed in #915 (comment). Mainly that the embedding was a function and that the syntax for parameters would be func!param1!param2 so embed!join_type!hint(..).

> Since we already have embedded filters as <alias>.<column>= - how about extending that to support a special operator on only <alias>=?

I like the direction. Maybe we could have something like:

GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join.inner

In fact, the hints could also be expressed on the query param:

GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join(inner).hint1,hint2

(The join(inner) syntax would be similar to our full text search with the lang)

That keeps the select param cleaner. It could also provide a path for doing the future function syntax:

GET ?select=avg&avg=args.amount

steve-chavez · 2021-09-16T20:54:26Z

GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join(inner).hint1,hint2

I guess one drawback of that syntax is that it's more verbose (both for the REST interface and probably for libraries).

wolfgangwalther · 2021-09-20T12:16:38Z

> Mainly that the embedding was a function and that the syntax for parameters would be func!param1!param2 so embed!join_type!hint(..).

I see. The "embedding as a function" pov only holds true for the SELECT part, though - at least if we think of "function" as a "PostgreSQL FUNCTION". This is because we can't replicate an inner join with a computed column function.

> I like the direction. Maybe we could have something like:
> GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join.inner

Hm, I think it's enough to specify the inner only. That's assuming join is the only operator anyway. I can imagine we could support inner and anti (for an ANTI-JOIN) down the road. Not sure about whether right and/or full make any sense. But I don't see anything other than join, so we might as well omit it.

> In fact, the hints could also be expressed on the query param:
> GET /projects?select=id,clients(id)&clients.id=eq.2&clients=join(inner).hint1,hint2
> (The join(inner) syntax would be similar to our full text search with the lang)

> That keeps the select param cleaner. It could also provide a path for doing the future function syntax:
> GET ?select=avg&avg=args.amount

Assuming we'd keep the ability to put hints (and therefore the embedding specification) in only one place (either query param or select), this would be a breaking change.

Once we're at that stage, where we implement such a big breaking change to the query syntax, I'd have a few other points to raise about the query syntax. The two biggest things that come to mind are correct quoting / escaping and moving the operator to the left of the equal sign, e.g. column.eq=value. This would make things a fair bit easier for plain HTML forms, to support no-JavaScript environments. Also, the column + operator used tends to be quite static, while the value used for the query is not - separating the static part from the dynamic part via = (which is supported by all the client libraries for query params) seems the logical choice.

So, I suggest to just go with the non-breaking change for this PR - and then open another issue where we can discuss a "new query syntax" that could get a few things straight. We could implement both at the same time and have them configurable via config option, to be able to slowly deprecate the old syntax.
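As an illustration of the "operator on the left" idea, the following Python sketch rewrites a query string from the current `column=op.value` form into the proposed `column.op=value` form. This is only a sketch of the proposal, not anything PostgREST implements; the operator list, the set of reserved parameters and the function name `operator_to_left` are assumptions, and negated filters such as `not.eq.2` are deliberately left untouched.

```python
from urllib.parse import parse_qsl, urlencode

# Assumed subset of filter operators, for illustration only.
OPERATORS = {"eq", "neq", "lt", "lte", "gt", "gte", "like", "ilike", "is", "in"}
# Assumed set of non-filter parameters that must not be rewritten.
RESERVED = {"select", "order", "limit", "offset"}


def operator_to_left(query: str) -> str:
    """Rewrite `column=op.value` pairs as `column.op=value`."""
    pairs = []
    for key, value in parse_qsl(query, keep_blank_values=True):
        op, sep, operand = value.partition(".")
        if key not in RESERVED and sep and op in OPERATORS:
            # static part (column + operator) on the left, dynamic value on the right
            pairs.append((f"{key}.{op}", operand))
        else:
            pairs.append((key, value))
    # keep the characters used by the select syntax readable
    return urlencode(pairs, safe="(),*!:")


print(operator_to_left("select=id,clients(id)&clients.id=eq.2&id=lt.5"))
# select=id,clients(id)&clients.id.eq=2&id.lt=5
```

With this shape, a plain HTML form could use `name="id.lt"` on an input and submit the user's value without any client-side string manipulation.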

steve-chavez · 2021-09-20T22:17:48Z

> Hm, I think it's enough to specify the inner only. That's assuming join is the only operator anyway. I can imagine we could support inner and anti (for an ANTI-JOIN) down the road. But I don't see anything other than join, so we might as well omit it.

It would be too inconsistent at the parser level; I'm not even sure it can be done cleanly. All of our filters have a `prefix.`, with the sole exception of function args, which were originally meant to be called with the arg. prefix (arg1=arg.value), but that was decided against because of verbosity. This is done by checking whether the rpc route is being used (all filters without a prefix are args).

> and moving the operator to the left of the equal sign like e.g. column.eq=value. This would make things a fair bit easier for plain-html-forms, to support no javascript environments.

Hm, if this is meant for the eq filter, we could have a similar rule as the one mentioned above regarding the function args: for table routes, default to eq for filters with no prefix (so id=1 would work). This has been asked before on an issue IIRC, also on reddit. There would be no big breaking change for this.

> Assuming we'd keep the ability to put hints (and therefore the embedding specification) in only one place (either query param or select), this would be a breaking change.

I was thinking to have both actually, so no breaking change. I'm still not set on either though. For now I'll continue with the select=..!inner; doing the queries is the hardest part of this PR.

wolfgangwalther · 2021-09-21T12:10:46Z

> Hm, if this is meant for the eq filter,

No, I propose to have all filters on the left side. id.lt=5, id.gt=5, ...

> I was thinking to have both actually, so no breaking change.

Imho, that would make it quite complicated to parse. You could have conflicting targets/hints/... in both parts - this would make it very hard to efficiently create a good query.

Iced-Sun · 2021-09-21T17:16:19Z

The LATERAL/GROUP BY approach for resource embedding with inner join depends on the fact that pg could group by pk even when the group keys include other columns (group by pk, c1, c2, c3 => group by pk). But the inference doesn't happen for a view, nor a not null unique key, hence a potentially expensive full group-by must be applied.

I propose to implement resource embedding with correlated join, which is essentially the same as the current approach of correlated subquery, but with the extensibility to support left/inner/anti-joins.

Specifically, instead of translating the left-join resource embedding of GET /projects?select=id,clients(id)&clients.id=eq.2 to

with pg_source as (
  select projects.id, coalesce(
           (select json_agg(clients.*)
              from (select clients.id 
                      from clients
                     where clients.project_id = projects.id and clients.id = 2
              ) clients), 
         '[]') as clients
    from projects
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;

a correlated join version could be

with pg_source as (
     select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
       from projects
  left join lateral (select json_agg(clients) clients 
                       from (select clients.id 
                               from clients
                              where clients.project_id = projects.id and clients.id = 2
                            ) clients
                    ) clients_clients
         on clients_clients.clients is not null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;

An inner-join resource embedding is then

with pg_source as (
     select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
       from projects
       join lateral (select json_agg(clients) clients 
                       from (select clients.id 
                               from clients
                              where clients.project_id = projects.id and clients.id = 2
                            ) clients
                    ) clients_clients
         on clients_clients.clients is not null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;

And an anti-join resource embedding is

with pg_source as (
     select projects.id, coalesce(nullif(clients_clients.clients::text, '[null]'), '[]')::json AS clients
       from projects
       join lateral (select json_agg(clients) clients 
                       from (select clients.id 
                               from clients
                              where clients.project_id = projects.id and clients.id = 2
                            ) clients
                    ) clients_clients
         on clients_clients.clients is null
)
select coalesce(json_agg(_postgrest_t), '[]')::character varying as body
from (select * from pg_source) _postgrest_t;

The correlated-join approach doesn't rely on pk inference, and works on views and on not null unique keys. And the performance is essentially the same as the subquery (#1075 (comment)).

Does that make any sense?
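The three queries above differ only in the join keyword and in the `is [not] null` predicate on the aggregated embed. A minimal Python sketch of that idea, using made-up data and a hypothetical helper `embed_clients` (this mirrors the SQL semantics only, not PostgREST's query builder):

```python
def embed_clients(projects, clients, client_filter, join="left"):
    """Mimic the correlated join: aggregate children per parent, then
    keep or drop the parent based on whether the aggregate is NULL."""
    if join not in ("left", "inner", "anti"):
        raise ValueError(f"unknown join type: {join!r}")
    rows = []
    for project in projects:
        matched = [
            {"id": c["id"]}
            for c in clients
            if c["project_id"] == project["id"] and client_filter(c)
        ]
        # json_agg over zero rows yields NULL, modelled here as None
        agg = matched if matched else None
        if join == "inner" and agg is None:
            continue  # join lateral ... on clients is not null
        if join == "anti" and agg is not None:
            continue  # join lateral ... on clients is null
        # coalesce(..., '[]') in the outer select
        rows.append({"id": project["id"], "clients": agg if agg is not None else []})
    return rows


# Hypothetical data: each client row points at one project.
projects = [{"id": 1}, {"id": 2}, {"id": 3}]
clients = [
    {"id": 1, "project_id": 1},
    {"id": 2, "project_id": 2},
    {"id": 3, "project_id": 2},
]
only_client_2 = lambda c: c["id"] == 2

left = embed_clients(projects, clients, only_client_2, join="left")
inner = embed_clients(projects, clients, only_client_2, join="inner")
anti = embed_clients(projects, clients, only_client_2, join="anti")
```

Here `left` keeps all three projects (with `[]` for projects 1 and 3), `inner` keeps only project 2, and `anti` keeps only projects 1 and 3.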

Iced-Sun · 2021-09-21T17:28:47Z

BTW, the ability to do aggregation and flattened joins seems to be a popularly requested feature (#1233, #211, #1126, #915 and others). Do we have any plan to support explicit join/group by?

Say, something like

GET /projects,projects_traits?select=projects.id,projects_traits.category
GET /projects?select=count(*)&groupby=zone
GET /projects,projects_traits?select=count(*),projects_traits.category&groupby=projects_traits.category

Looks like it will take a huge effort.

steve-chavez · 2021-09-23T05:23:20Z

@Iced-Sun Awesome! Your approach is much better since it doesn't rely on GROUP BY and it also works on views 🎉 🎉

So to understand it better, taking my approach on #1075 (comment), the query for inner joins would now be

WITH pg_source AS (
   SELECT "test"."clients"."id",
          "test"."clients"."name",
          "projects_projects"."projects" AS "projects"
   FROM "test"."clients"
   INNER JOIN LATERAL (
     SELECT json_agg("projects") "projects" -- important change here
     FROM(
       SELECT "test"."projects"."id", "test"."projects"."name"
       FROM "test"."projects"
       WHERE "test"."projects"."client_id" = "test"."clients"."id") "projects"
   ) AS "projects_projects" ON "projects_projects"."projects" IS NOT NULL -- important change here
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t;

Where the main changes are the correlated join (which avoids the need for GROUP BY) plus the LATERAL ... ON <target> IS NOT NULL.

As you pointed out, this practically has the same cost that our correlated subquery has(cost=2229.33..2229.34). The correlated join cost(2261.08..2261.09) is higher than the GROUP BY(125.21..125.22) query but of course it doesn't have any of the downsides - which is definitely good enough.

(The above is just my simplistic view, I think you've determined that GROUP BY is actually worse in perf)

So I'll proceed with the correlated join approach for this feature.

> Do we have any plan to support explicit join/group by?

Yes, but besides the syntax, we still need a way to protect against a potentially expensive group by, this was discussed on #915.

Iced-Sun · 2021-09-23T09:45:23Z

> As you pointed out, this practically has the same cost that our correlated subquery has(cost=2229.33..2229.34). The correlated join cost(2261.08..2261.09) is higher than the GROUP BY(125.21..125.22) query but of course it doesn't have any of the downsides - which is definitely good enough.
>
> (The above is just my simplistic view, I think you've determined that GROUP BY is actually worse in perf)

I did some simple performance tests in #1075 (comment). They suggest that the GROUP BY approach is much more performant in the absence of a proper index, while with create index on project (client_id), the correlated join (and correlated subquery) overtakes it marginally (but not so marginally for a small result set). So I think GROUP BY may be preferable, if we didn't have the pk problem to consider, because it is more stable (on perf).

I have no solid facts on grouping by full-size keys (in the case that group-by-pk is not applicable), but I do have some not-so-good experiences when I have to group by multiple text columns to work around bad schema design.

As you said, correlated join doesn't break things or bring surprises, and it unifies the left/inner/anti-join resource embedding.

Thanks for the great piece of software.

steve-chavez · 2021-09-28T02:33:24Z

> ... it suddenly strikes me as odd to have something in the select= part change the number of rows. Imho, the select= part should only change the shape of the output / each row - but should not filter anything.

@wolfgangwalther Been thinking about this, why is it "odd" to change the shape of the output on select? Because when using count (and we'd have more aggregates later) this happens:

GET /projects?select=count

[{"count":5}]

And of course in SQL the same thing happens - not only WHERE modifies the rows, SELECT also does.

wolfgangwalther · 2021-09-28T09:28:50Z

> Been thinking about this, why is it "odd" to change the shape of the output on select?
> [...]
> [...] in SQL the same thing happens - not only WHERE modifies the rows, SELECT also does.

That's odd for me, already. For me, using an aggregate function without a GROUP BY basically adds an implicit GROUP BY with a constant value. I would have liked that to be more explicit in SQL, too. But that ship has sailed for a long time, I guess.
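To illustrate the point about aggregates changing the row count, here is a small Python sketch using SQLite as a stand-in (an assumption made for convenience; PostgREST targets PostgreSQL, and the `zone` column and its values are invented). On a non-empty table, an aggregate without GROUP BY returns the same single row as grouping by a constant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, zone TEXT)")
conn.executemany(
    "INSERT INTO projects (id, zone) VALUES (?, ?)",
    [(1, "a"), (2, "a"), (3, "b"), (4, "b"), (5, "b")],
)

# No aggregate: one output row per table row.
plain = conn.execute("SELECT id FROM projects").fetchall()

# Aggregate without GROUP BY: the five rows collapse into one.
implicit = conn.execute("SELECT count(*) FROM projects").fetchall()

# Same result when the single group is spelled out with a constant key.
explicit = conn.execute(
    "SELECT count(*) FROM projects GROUP BY 'all'"
).fetchall()

# An explicit, non-constant GROUP BY gives one row per group.
by_zone = conn.execute(
    "SELECT zone, count(*) FROM projects GROUP BY zone ORDER BY zone"
).fetchall()

print(len(plain), implicit, explicit, by_zone)
```

One difference worth keeping in mind: on an empty table the version without GROUP BY still returns one row, while the grouped version returns none, so the equivalence is only approximate.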

steve-chavez · 2021-09-28T16:06:57Z

> That's odd for me, already. For me, using an aggregate function without a GROUP BY basically adds an implicit GROUP BY with a constant value

@wolfgangwalther I see. I've also always found it odd that these queries work

select p.json_agg from test.projects p;
select p.count from projects p;
select p.array_agg from projects p;
-- same through postgrest

(I think we discussed this somewhere before)

Edit: Also, using a different !hint in select can change the number of rows.

> I would have liked that to be more explicit in SQL, too. But that ship has sailed for a long time, I guess.

Yeah. So considering this, WDYT about just going with the select=*,embed!inner(*) syntax? Considering select is not consistent, having an embed=join.inner syntax only adds more verbosity.

steve-chavez · 2021-09-29T23:29:49Z

Alright, this is now ready for a final review. I've added tests for views (with m2o/o2m/m2m cases), rpc and embedding with hints.

I've also added a db-embed-default-join config option (mentioned above, #1949 (comment)) for users that want this join type by default.

CHANGELOG.md

test/io-tests/configs/expected/no-defaults-with-db-other-authenticator.config

test/io-tests/configs/expected/no-defaults-with-db.config

test/io-tests/configs/expected/no-defaults.config

src/PostgREST/Request/Types.hs

CHANGELOG.md

This is enabled by adding `!inner` to the embedded resource: `/projects?select=*,clients!inner(*)&clients.id=eq.12`. This behavior can be enabled by default with the config option `db-embed-default-join='inner'`, which saves the need for specifying `!inner` on every request. If this is enabled, the previous behavior can be restored per request by specifying `!left` on the embedded resource: `/projects?select=*,clients!left(*)&clients.id=eq.12`. Tested on M2O/O2M/M2M relationships, views, and RPC.

colearendt · 2021-10-13T08:44:02Z

How does this work with joins that are already using ! for the foreign key in use? Is it possible to specify a foreign key and an inner join?

i.e. if I have this:

select=*,alias:clients!fk_main_clients(*)

How should it switch to an inner join short of changing the default? Is this in a test case?

colearendt · 2021-10-13T08:45:37Z

test/fixtures/schema.sql

+  supplier_id int references suppliers(id),
+  trade_union_id int references trade_unions(id),
+  primary key (supplier_id, trade_union_id)
+);


Missing a newline, FYI. Some editors struggle with opening such files, or so I have heard

steve-chavez · 2021-10-13T17:40:02Z

> How should it switch to an inner join short of changing the default? Is this in a test case?

@colearendt Yes, this should work like

postgrest/test/Feature/EmbedInnerJoinSpec.hs

Lines 215 to 218 in 627c3c3

    
    it "works when using hints" $ do
      get "/projects?select=id,clients!client!inner(id)&clients.id=eq.2" `shouldRespondWith`
        [json| [{"id":3,"clients":{"id":2}}, {"id":4,"clients":{"id":2}}] |]
        { matchHeaders = [matchContentTypeJson] }
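For readers wondering how a hint and a join type can share the `!` separator, here is a rough Python sketch that splits an embed item into alias, relation, hint, join type and columns. It is a hypothetical parser written for illustration, not PostgREST's actual parser (which is written in Haskell); the names `Embed`, `parse_embed` and `JOIN_TYPES` are made up, and the sketch assumes any `!` parameter other than `inner` or `left` is a hint.

```python
import re
from typing import NamedTuple, Optional

# Assumption: only these two words are treated as join types.
JOIN_TYPES = {"inner", "left"}


class Embed(NamedTuple):
    alias: Optional[str]
    relation: str
    hint: Optional[str]
    join: Optional[str]
    columns: str


# [alias:]relation[!param]...(columns)
_EMBED = re.compile(
    r"^(?:(?P<alias>\w+):)?(?P<relation>\w+)(?P<params>(?:!\w+)*)\((?P<columns>.*)\)$"
)


def parse_embed(spec: str) -> Embed:
    match = _EMBED.match(spec)
    if match is None:
        raise ValueError(f"not an embed: {spec!r}")
    hint = join = None
    for param in filter(None, match["params"].split("!")):
        if param in JOIN_TYPES:
            join = param   # e.g. "inner"
        else:
            hint = param   # e.g. a foreign key or column name
    return Embed(match["alias"], match["relation"], hint, join, match["columns"])


print(parse_embed("clients!client!inner(id)"))
print(parse_embed("alias:clients!fk_main_clients!inner(*)"))
```

Under these assumptions, the first call yields relation `clients`, hint `client` and join `inner`, and the second additionally carries the alias `alias` with hint `fk_main_clients`.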

colearendt · 2021-10-14T02:36:36Z

Thanks! That's perfect! I totally missed that example 🙈 Found a little bug in #1977, but this is really awesome!! Well done!

Drop support for embed hints used as '.'

1593b0c

Hints used like `select=projects.client_id(*)` were already deprecated. '!' should be used from now on `select=projects!client_id(*)`

wolfgangwalther mentioned this pull request Sep 20, 2021

Resource Embedding - Enable filtering on a parent table of an embedded child tables properties #1954

Closed

steve-chavez changed the title ~~Support embedding with inner join(tables only)~~ Support embedding with inner join Sep 24, 2021

steve-chavez force-pushed the inner-join branch 3 times, most recently from 1adedef to b1a42c0 Compare September 29, 2021 23:23

steve-chavez changed the title ~~Support embedding with inner join~~ Support filtering top-level resource based on embedded resource filter Sep 29, 2021

steve-chavez marked this pull request as ready for review September 29, 2021 23:27

wolfgangwalther reviewed Oct 4, 2021

soedirgo mentioned this pull request Oct 4, 2021

Idea: parse select queries to give better types supabase/postgrest-js#217

Closed

wolfgangwalther approved these changes Oct 4, 2021

steve-chavez force-pushed the inner-join branch from 9d9892c to 41ef1cb Compare October 4, 2021 18:22

steve-chavez merged commit ee56dd5 into PostgREST:main Oct 4, 2021

This was referenced Oct 4, 2021

filtering on main table with embedded table criteria(inner join) #1075

Closed

Filter source table based on the embedded table(inner join) supabase/postgrest-js#197

Closed

wolfgangwalther mentioned this pull request Oct 5, 2021

Swagger: GET record from table with uuid primary key: failed to parse filter #1970

Open

colearendt reviewed Oct 13, 2021

steve-chavez mentioned this pull request Oct 14, 2021

new inner join cannot handle more than one one-to-many matches #1977

Closed

laurenceisla mentioned this pull request Oct 15, 2021

Allow top-level resource with embed filter PostgREST/postgrest-docs#442

Merged

wolfgangwalther mentioned this pull request Nov 6, 2021

Partitioned tables are not supported by everything related to schema cache #1783

Closed

steve-chavez mentioned this pull request Nov 29, 2021

Inaccurate total record count returned when top-level filtering with embedded resource filters #2009

Closed

Iced-Sun mentioned this pull request Dec 4, 2021

Order parent by child's column (for to-one mapping) #1414

Closed

wolfgangwalther mentioned this pull request Feb 10, 2022

feat: max-changes prefer header to limit mutations #2164

Closed

5 tasks

steve-chavez mentioned this pull request Aug 6, 2022

refactor: correlated subquery for o2m query #2409

Merged

steve-chavez mentioned this pull request Aug 15, 2022

When embedding with top-level filtering (inner join), empty parentheses should be allowed #2340

Closed

steve-chavez mentioned this pull request Dec 12, 2022

feat: null filters on embedded resources #2584

Merged

steve-chavez mentioned this pull request Sep 19, 2023

fix: Bug when Null Filtering on embedded resources #2951

Merged
