-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filtering on main table with embedded table criteria(inner join) #1075
Comments
This has been requested many times(check also #994 (comment) and this stackoverflow question) and all current workarounds seem lacking to me. The feature seems doable, basically the current queries need to use explicit JOINs instead of subselects, for example: -- GET "/clients?select=id,name,projects(id,name)"
-- currently generates this query
-- cost=21434.90..21434.92
WITH pg_source AS (
SELECT "test"."clients"."id",
"test"."clients"."name",
COALESCE((
SELECT json_agg("projects".*)
FROM (
SELECT "test"."projects"."id", "test"."projects"."name"
FROM "test"."projects"
WHERE "test"."projects"."client_id" = "test"."clients"."id") "projects"), '[]') AS "projects"
FROM "test"."clients")
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t;
-- can be changed to this new query
-- cost=267.04..267.06
WITH pg_source AS (
SELECT "test"."clients"."id",
"test"."clients"."name",
coalesce(nullif(json_agg("projects_projects")::text, '[null]'), '[]')::json AS "projects"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT "test"."projects"."id", "test"."projects"."name"
FROM "test"."projects"
WHERE "test"."projects"."client_id" = "test"."clients"."id"
) AS "projects_projects" ON TRUE
GROUP BY "test"."clients"."id"
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t; Some things to notice:
The many to many embeds work with an analogous query that includes a Once that it's done we can parametrize the GET /clients?select=id,name,projects+inner(id,name)
GET /clients?select=id,name,projects+left(id,name) -- this is the default Hopefully we can use |
how about 3-4 levels deep, and 3-4 child types at the same time |
@ruslantalpa Here's a more complex example that includes an M2M relationship. This request(tasks-users is M2M): curl "localhost:3000/clients" -G \
-d select="id,name,projects(id,name,tasks(id,name,users(id,name)))" Would generate: WITH pg_source AS (
SELECT
"test"."clients"."id",
"test"."clients"."name",
coalesce(nullif(json_agg("projects_projects")::text, '[null]'), '[]')::json AS "projects"
FROM "test"."clients"
LEFT JOIN LATERAL (
SELECT
"test"."projects"."id",
"test"."projects"."name",
coalesce(nullif(json_agg("tasks_tasks")::text, '[null]'), '[]')::json AS "tasks"
FROM "test"."projects"
LEFT JOIN LATERAL (
SELECT
"test"."tasks"."id",
"test"."tasks"."name",
coalesce(nullif(json_agg("users_users")::text, '[null]'), '[]')::json AS "users"
FROM "test"."tasks"
LEFT JOIN LATERAL (
SELECT
"test"."users"."id",
"test"."users"."name"
FROM
"test"."users_tasks",
"test"."users"
WHERE
"test"."users"."id" = "test"."users_tasks"."user_id" AND
"test"."tasks"."id" = "test"."users_tasks"."task_id"
) AS "users_users" ON TRUE
WHERE
"test"."tasks"."project_id" = "test"."projects"."id"
GROUP BY "test"."tasks"."id"
) AS "tasks_tasks" ON TRUE
WHERE "test"."projects"."client_id" = "test"."clients"."id"
GROUP BY "test"."projects"."id"
) AS "projects_projects" ON TRUE
GROUP BY "test"."clients"."id"
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t; |
One problem with the GROUP BY. When the query references a table, we can just GROUP BY the PKs of the table for the aggregate( |
Main idea for generating the above query would be to use the |
3 problems with this direction
|
@ruslantalpa Regarding 2, I think this would greatly simplify the code that does the JOINs and the resulting query is easier to understand. About 1, here #1075 (comment) I mentioned that the LATERAL JOIN query plan cost was much lower than the SUBSELECT one. It's likely that were making the query faster. Still, more tests could be useful. |
New queries with LATERAL instead of SUBSELECT are much faster on big workloads, here are my results using pgbench and the same idea as in #978 (comment): https://gist.github.com/steve-chavez/f79b5c3e777a435d024d44cebb8ac8f4.
|
how about selecting 1k-10k rows? |
@ruslantalpa I'd really appreciate your help with that. It'd be better if you could replicate my results to be extra sure the queries are fast. |
Hey @steve-chavez and @ruslantalpa , what is the status on that enhancement? I actually don't see any workaround when the filtering can be done on several embedded fields with various operators. There might be a quick fix if you allow this kind of syntax : Simple syntax, the SQL postgREST will generate will most likely not be efficient, but that's better than nothing ! |
@Remi-C The problem right now is that PostgREST only does LEFT JOINs when embedding childs, so the parents always remain in the result, syntax wise there isn't an issue. There is a workaround with computed columns though, see PostgREST/postgrest-docs#211 (comment). Edit: Copy/pasted the wrong link. |
@steve-chavez thanks for the answer. Computed columns/views are not going to work with 4 levels of nesting, and filtering happening on various columns with various operators. So far the only workaround with realistic performances I can come up with is to go from my nice relational model with dozen of tables to a one table document oriented model. That is having only one big table with a json type and abuse the jsonb indexing capabilites of postgres. You know postgres very well, you can understand why this backup plan feels wrong. I'm assuming there is no way to inject some of my own SQL into postgREST generated queries right? |
Per the comment of #1075 and the comment of #1018, the to-1 relationship may be a simlper but also demanded case. Taking the example in comment, would it be possible to generate the following query: WITH pg_source AS (
SELECT "test"."clients"."id",
"test"."clients"."name",
COALESCE((
SELECT json_agg("projects".*)
FROM (
SELECT "test"."projects"."id", "test"."projects"."name"
FROM "test"."projects"
WHERE "test"."projects"."client_id" = "test"."clients"."id") "projects"), '[]') AS "projects"
FROM "test"."clients")
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t
-- add this line
WHERE projects IS NOT NULL; Is it possible? Seems that it doesn't change much. |
Further investigation on subquery vs. joinThe new-*-query in https://gist.github.com/steve-chavez/f79b5c3e777a435d024d44cebb8ac8f4 uses SELECT items.*, subitems_subitems.*
FROM items
LEFT JOIN LATERAL (
SELECT id, name FROM subitems WHERE subitems.item_id = items.id
) AS subitems_subitems ON TRUE is basically equivalent to SELECT items.*, subitems.id, subitems.name
FROM items
LEFT JOIN subitems ON subitems.item_id = items.id That leaves more opportunity for postgres to evaluate execution plans, but the vanilla Another approach is to leverage the correlated join, which resembles the correlated subquery approach (but with the correlated parts in the range table instead of in the target list). The child query is: WITH pg_source AS (
SELECT
"new_m2m_child_queries"."items"."id",
"new_m2m_child_queries"."items"."name",
"new_m2m_child_queries"."items"."c1",
"new_m2m_child_queries"."items"."c2",
"new_m2m_child_queries"."items"."c3",
"new_m2m_child_queries"."items"."c4",
coalesce(nullif(subitems::text, '[null]'), '[]')::json AS "subitems"
FROM "new_m2m_child_queries"."items"
LEFT JOIN LATERAL (
SELECT json_agg(_) subitems FROM (
SELECT
"new_m2m_child_queries"."subitems"."id",
"new_m2m_child_queries"."subitems"."name"
FROM "new_m2m_child_queries"."subitems"
WHERE
"new_m2m_child_queries"."subitems"."item_id" = "new_m2m_child_queries"."items"."id") _
) AS "subitems_subitems" ON TRUE
WHERE
"new_m2m_child_queries"."items"."id" > :rid AND
"new_m2m_child_queries"."items"."id" <= (:rid + 100)
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t; while the m2m-query is: WITH pg_source AS (
SELECT
"new_m2m_child_queries"."subitems"."id",
"new_m2m_child_queries"."subitems"."name",
"new_m2m_child_queries"."subitems"."c1",
"new_m2m_child_queries"."subitems"."c2",
"new_m2m_child_queries"."subitems"."c3",
"new_m2m_child_queries"."subitems"."c4",
"pieces"
FROM "new_m2m_child_queries"."subitems"
LEFT JOIN LATERAL (
SELECT json_agg(_) as pieces from (
SELECT
"new_m2m_child_queries"."pieces"."id",
"new_m2m_child_queries"."pieces"."name"
FROM
"new_m2m_child_queries"."subitems_pieces",
"new_m2m_child_queries"."pieces"
WHERE
"new_m2m_child_queries"."subitems"."id" = "new_m2m_child_queries"."subitems_pieces"."subitem_id" AND
"new_m2m_child_queries"."pieces"."id" = "new_m2m_child_queries"."subitems_pieces"."piece_id"
)_) AS "pieces_pieces" ON TRUE
WHERE
"new_m2m_child_queries"."subitems"."id" > :rid AND
"new_m2m_child_queries"."subitems"."id" <= (:rid + 100)
)
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t; These queries force Postgres to do the correlated join, hence the aggregation is performed per loop instead of an overal Here are some benchmarks. child query (TPS)Index:
m2m query (TPS)Index:
It seems to me that |
Do we have any further update on this matter? |
As mentioned by @sunbing81 here, for to-1 relations it would be sufficient to be able to filter whether an embedded row is Example use case:
Only If you use the foreign key column to configure embedding (e.g. |
Perhaps other types of filters could be useful if we add support for flattening: #1233 (comment) So what we need is a way to filter the top level embeds. The query that @sunbing81 proposed can work if we remove the coalesce in WITH pg_source AS (
SELECT "test"."clients"."id",
"test"."clients"."name",
(
SELECT json_agg("projects".*)
FROM (
SELECT "test"."projects"."id", "test"."projects"."name"
FROM "test"."projects"
WHERE "test"."projects"."client_id" = "test"."clients"."id") "projects") AS "projects"
FROM "test"."clients")
SELECT coalesce(json_agg(_postgrest_t), '[]')::character varying AS BODY
FROM (SELECT * FROM pg_source) _postgrest_t
-- add this line
WHERE projects IS NOT NULL
-- otherwise we'd have to do something like
-- WHERE json_array_length(projects) > 0; This approach certainly looks easier than implementing the INNER JOIN. Haven't thought all the details though. Edit: The problem with this approach is that it's not composable for recursive embeds. The subquery must be in an upper level to use it in a WHERE clause(see here). I guess we could create many artificial levels but I'm sure that would heavily impact the perf of the query. |
Did you ever solve this? |
@volkandkaya no 😕 |
@steve-chavez so as I can see so far, you planning to filter main table by "sweeping" from it null-ed embeddable resource. Am I right? Right now I want to create a simple server, that would get postgrest-format requests from client, ask Postgrest server, sweep results with null embeds and give response back to client. All I want to ask - will you bring "sweeping" logic right now to existed Postgrest queries? Thank you very much. |
@drmnk IIUYC, the second one, it will be possible to keep null embeddings or discard them through the url: Many users are asking for filtering nulls by default, so maybe there will be a config option to do it like that( |
oh, great, config option is must-have! |
Any update or timing expectation on this one? I just hit this issue for performance reasons (too much data in the original table) and the possible workarounds are much less than ideal (build a stored procedure). I don't know haskell, but if there is anything that ignorant not-yet-contributors can do to multiply or accelerate efforts (help with test coverage, use a prototype build for feedback, help with docs, etc.), then I would be happy to help!! (I imagine others would be as well 😄 ) |
Awesome news @steve-chavez, congrats! Can't wait to use this in Supabase, we've got several scenarios where this is going to simplify things A LOT. Amazing work, huge thanks!! 🎉🎉🎉 |
Just want to pop in and say this is working great in my testing! Each time I have a new use case for this feature and I implement it, it makes me smile 😄 Thanks so much for the awesome work here! |
Per issues like #696 and #1067 it seems there's no way to apply all filters of embedded tables to the main table -- though there would be a way to do that via SQL.
Thanks for any advice!
The text was updated successfully, but these errors were encountered: