The tantivy search API requires me to set a limit on the number of search results.
At query time, I have no idea which of these results (or how many) the requesting user is allowed to view.
So what I currently do is filter the unauthorized resources out of the results list.
This means that the list can contain fewer than limit items.
This can confuse users. Imagine searching for document and finding 3 results when you know there are more. Where are the others?
So how can I deal with this?
Default to a higher limit
Easy, but it makes searches slower, and it will not work when there are many users and many private resources.
We could still cap the number of items we return and respect the client's params here.
That would also help keep performance acceptable.
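A minimal sketch of this option, assuming the search and the authorization check can be passed in as closures. The names `overfetch_and_filter`, `factor`, `search` and `is_authorized` are hypothetical and not part of tantivy:

```rust
/// Hypothetical helper: asks the index for `limit * factor` hits, drops the ones
/// the user may not see, then cuts the list back to the client's `limit`.
/// `search(n)` stands in for a query that returns at most `n` hits.
fn overfetch_and_filter<T, S, A>(limit: usize, factor: usize, search: S, is_authorized: A) -> Vec<T>
where
    S: Fn(usize) -> Vec<T>,
    A: Fn(&T) -> bool,
{
    search(limit.saturating_mul(factor))
        .into_iter()
        .filter(|hit| is_authorized(hit))
        .take(limit) // never return more than the client asked for
        .collect()
}
```

If fewer than `limit` of the over-fetched hits are authorized, the result is still short, which is the weakness described above.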
Let the front-end deal with this
The client could request some number of results and, if it gets back too few, try again with a larger limit.
Very slow, very ugly.
Use tantivy::TopDocs::and_offset
I think we can use this to run the query again (skipping the n items already covered by the first one) if we haven't got enough authorized resources yet.
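The re-query idea can be sketched roughly as follows. This is an illustration under assumptions, not a tested tantivy integration: `collect_authorized`, `page_size`, `search` and `is_authorized` are hypothetical names, and `search(page_size, offset)` stands in for a query built with something like `TopDocs::with_limit(page_size).and_offset(offset)`.

```rust
/// Collects up to `limit` authorized hits by re-running the search with a growing offset.
/// `search(page_size, offset)` is a hypothetical stand-in for a query that skips
/// the first `offset` hits and returns at most `page_size` of the following ones.
fn collect_authorized<T, S, A>(limit: usize, page_size: usize, search: S, is_authorized: A) -> Vec<T>
where
    S: Fn(usize, usize) -> Vec<T>,
    A: Fn(&T) -> bool,
{
    let mut authorized = Vec::with_capacity(limit);
    if page_size == 0 {
        return authorized;
    }
    let mut offset = 0;
    while authorized.len() < limit {
        let page = search(page_size, offset);
        let fetched = page.len();
        for hit in page {
            if authorized.len() == limit {
                break;
            }
            if is_authorized(&hit) {
                authorized.push(hit);
            }
        }
        if fetched < page_size {
            break; // a short page means the index has no more hits
        }
        offset += fetched;
    }
    authorized
}
```

Each extra round trip runs the query again, so a user who is authorized for very few resources still pays for several searches.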
I was wondering whether tantivy can return an Iterator as an alternative to setting a limit when creating TopDocs. Basically TopDocs::without_limit, or something. Should I use TopDocs::and_offset instead?
In my use case, I want to authorize the user before sending the results. This means I don't know in advance whether my limit is big enough. Having an Iterator would fix this problem.
Got this reply:
TopDocs is really efficient when you put a hard limit, like 10, 20, 30 docs. Let's call the number of docs you want to retrieve N and the offset offset. During collection of docs at the segment level, we have to retain at most N + offset and we store them in a heap. Once the heap is full, we can skip documents with a score lower than the lowest score of the docs in the heap. We can even use block-max WAND to skip blocks of documents to be faster (block-max WAND is for term queries only).
From what I understand, you want to retrieve possibly all the docs, so I would implement a dedicated collector. BTW there is a very simple collector that returns the set of DocAddress values that match the query: DocSetCollector. It's not ordered by score though.
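To illustrate why a hard limit is cheap, here is a small sketch of the bounded-heap idea from the reply. It is not tantivy's implementation: `top_n` is a hypothetical function, hits are plain `(score, doc_id)` pairs, and scores are integers as a simplifying assumption. With an offset, `capacity` would be N + offset.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Keeps only the `capacity` highest-scoring `(score, doc_id)` pairs, best first.
/// Wrapping entries in `Reverse` turns the max-heap into a min-heap, so the
/// lowest retained score is always at the top and cheap to compare against.
fn top_n(hits: impl IntoIterator<Item = (u32, u32)>, capacity: usize) -> Vec<(u32, u32)> {
    let mut heap: BinaryHeap<Reverse<(u32, u32)>> = BinaryHeap::with_capacity(capacity);
    for hit in hits {
        if heap.len() < capacity {
            heap.push(Reverse(hit));
        } else if let Some(mut lowest) = heap.peek_mut() {
            // Heap is full: a hit only gets in if it beats the lowest retained score.
            if hit.0 > (lowest.0).0 {
                *lowest = Reverse(hit);
            }
        }
    }
    let mut best: Vec<(u32, u32)> = heap.into_iter().map(|Reverse(hit)| hit).collect();
    best.sort_by(|a, b| b.cmp(a)); // highest score first
    best
}
```

Memory stays proportional to `capacity` no matter how many documents match, which is what an unbounded iterator over all hits would give up.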