Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for ordering by distance (using <-> operator) #2989

Open
mkleczek opened this issue Oct 4, 2023 · 4 comments
Open

Support for ordering by distance (using <-> operator) #2989

mkleczek opened this issue Oct 4, 2023 · 4 comments
Labels
idea Needs of discussion to become an enhancement, not ready for implementation

Comments

@mkleczek
Copy link
Contributor

mkleczek commented Oct 4, 2023

I have an interesting case: we have a huge (one partitioned table with 10G rows) database that needs to be searched effectively using various criteria. It appears a single GIST index would meet all our demands as it is multi-column and supports multiple datatypes in a single index.
The problem is that GIST only supports ordering by distance: ORDER BY column <-> [value].
It is not a big deal in SQL as for btree_gist operations we can change from ORDER BY column to ORDER BY column <-> [very_small_value]
It would be great if we could use it from PostgREST without resorting to custom functions.

I guess it would also be very nice addition for all full text and vector searching needs (as most of the time you don't want to filter but rather to sort by rank).

@steve-chavez steve-chavez added the idea Needs of discussion to become an enhancement, not ready for implementation label Oct 4, 2023
@wolfgangwalther
Copy link
Member

Syntax could maybe be something like this:

?order=column.asc&order.column=dist.value

@steve-chavez
Copy link
Member

steve-chavez commented Dec 6, 2023

It's getting too hard to extend the URL query grammar. Another case of a common query that uses the order by distance is this one for vector search:

  select
    id,
    content,
    metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  where metadata @> filter
  order by documents.embedding <=> query_embedding;

(from https://python.langchain.com/docs/integrations/vectorstores/supabase)

Instead of trying to shove these special cases into query params, I'm thinking we should take advantage of content types. Something like:

SEARCH /table
Content-Type: application/vnd.pgrst.vector-search; distance-order;col1;col2; query=xx; filter=yy
Accept: application/json

{query: "..", filter: ".."}

Then that would then execute that common query.

(Another case that also fits here is the order by random PostgREST/postgrest-docs#671)

If we can make this play nicely with #2826 then that would be ideal.

I also think it'd be much clearer for users to have properly named content-types with parameters than to use more magic syntax in the URL grammar.

@mkleczek
Copy link
Contributor Author

mkleczek commented Dec 8, 2023

I am not really sure if ordering by distance specifically is not too limiting. In SQL it is more general ORDER BY [expr] DESC/ASC.

It is just that distance operator is most commonly used and supported by index access methods.

In my case adding ORDER BY col <-> [constant] is just a workaround for lack of ORDER BY column support in GIST access method.

I would actually envision syntax resembling horizontal filtering: ?order=[column].[operator].[argument] where argument might be required to be a constant value.

@wolfgangwalther
Copy link
Member

Instead of trying to shove these special cases into query params, I'm thinking we should take advantage of content types. Something like:

In #2066 (comment) I argued, that we should:

Move select and order to parameters of the mimetype in the Accept header. Both parameters only shape the response - but the underlying entity is the same.

I'm not sure anymore whether that's really the case. order does not only shape the response differently.. it also affects which rows are selected, if you combine it with limit. So it does change the entity that is returned.

So... I don't think this is a good idea after all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
idea Needs of discussion to become an enhancement, not ready for implementation
Development

No branches or pull requests

3 participants