Pagination performance does not meet requirements #352
Comments
Probably resolvable via #353
Yes, you are right @alexdunnjpl. I believe the performance is good for non-members requests, so I will re-evaluate the criticality of the bug to medium.
Actually, I put it back to high. The query is slow on other requests as well, for example /products with pagination.
@tloubrieu-jpl do you have an example of a specific request or set of requests (and the relevant host) which are not performant? It's not immediately clear to me from the provided information that the API is the culprit as opposed to, say, the Python client.
@tloubrieu-jpl following up on this (see question in previous comment)
Hi @alexdunnjpl, sorry I missed answering that earlier. The request I was evaluating the performance of is in this notebook: https://github.com/NASA-PDS/search-api-notebook/blob/main/notebooks/ovirs/part1/explore-a-collection.ipynb, cell "Request specific properties of all the observational products of the collection". To run that notebook, you need to clone the repository and follow the instructions in the README to deploy it locally.
@tloubrieu-jpl what I'm suggesting is that, given that this is the first instance of this kind of performance problem we've seen (that I know of, at least), this probably isn't an API bug. So unless there's a curl request to reproduce the behaviour outside of the notebook, I'd suggest closing/suspending this ticket and treating it as a bug in the notebook or one of its dependencies (which Al or I could still look into, for sure).
No, this bug has come and gone a couple of times (feels like dozen dozens or bunches). It has been a glob of things like cluster nodes slow to respond, overhead in cluster communications, and more. It was supposed to all be fixed with multi-tenancy or some word like that. The notebook request is a members request, which is hard to turn into a direct opensearch curl for measuring opensearch vs registry-api timing. In the past, it was shown that the lag was at opensearch and not in registry-api additional processing. I guess I need to make that first step in the binary search persistent. Are you working it or do you want me to do it?
@al-niessner ahh, gotcha. In that case, seems like a probable fix via #353 to avoid the need to search anything? Happy to work it or yield it. I think Jordan/Thomas were looking to divvy up tasks between us at breakout today, now that the cognito work is being pushed back as a priority.
Let's take it to the breakout. If the new ancestry field gets rid of the double lookup, then let's run in that direction. It will make finding the delays so much simpler, if nothing else.
Not sure how I mistook this as being related to /members, but it isn't. Still investigating.
@tloubrieu-jpl @jordanpadams the bug is in the pds python api client. API logs reveal that the client is paging without regard to the limit value (see qparams
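For illustration only, a paging loop that does respect the caller's limit might look roughly like the sketch below. This is not the actual client code; the host, endpoint, query parameters (q/start/limit), and response shape are assumptions made for the example.

```python
import requests

API_BASE = "https://pds.example.nasa.gov/api/search/1"  # hypothetical host


def fetch_products(query, limit=500, page_size=100):
    """Collect at most `limit` hits, stopping as soon as they are gathered
    rather than paging through the entire result set."""
    hits, start = [], 0
    while len(hits) < limit:
        size = min(page_size, limit - len(hits))
        resp = requests.get(
            f"{API_BASE}/products",
            params={"q": query, "start": start, "limit": size},
            timeout=60,
        )
        resp.raise_for_status()
        page = resp.json().get("data", [])  # response key is an assumption
        if not page:  # server has no more results
            break
        hits.extend(page)
        start += len(page)
    return hits
```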
Who's the most familiar with the Python client codebase? @tloubrieu-jpl or @jjacob7734 perhaps? If either of y'all can think of where in the client package code the page iteration occurs, that'd save me some time/effort; otherwise I'll just look for it myself. If you could point me to the repo, that'd be helpful at least.
I've just tested at 5000doc/page, and found the following:
So for full docs, that's 27ms/doc at 50doc/page, 4.3ms/doc at 500doc/page, 1.8ms/doc at 5000doc/page.
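For what it's worth, per-document figures like these can be reproduced with a simple timing harness along the lines below. This is a sketch only; the endpoint, the limit parameter, and the response shape are assumptions, not the exact requests used for the numbers above.

```python
import time

import requests

API_URL = "https://pds.example.nasa.gov/api/search/1/products"  # hypothetical


def ms_per_doc(page_size):
    """Time a single page request and normalise by the number of docs returned."""
    start = time.perf_counter()
    resp = requests.get(API_URL, params={"limit": page_size}, timeout=300)
    resp.raise_for_status()
    n_docs = len(resp.json().get("data", []))  # response key is an assumption
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms / max(n_docs, 1)


for size in (50, 500, 5000):
    print(f"{size:>5} doc/page: {ms_per_doc(size):.1f} ms/doc")
```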
Thanks @alexdunnjpl. Regarding the <1s requirement, we will need to refine which requests it applies to. The motivation behind the requirement is to have very responsive web pages when the API is used as a back-end for the web search. It might even mean that we need faster responses for a subset of requests to provide fluent navigation, but for notebook searches a longer response time can be acceptable.
Deeper investigation yields the following, given 5000doc page size:
Implementation of
@tloubrieu-jpl @jordanpadams on the user side, they would need to supply a The way
Thoughts on all that @jordanpadams @tloubrieu-jpl ?
Hi @alexdunnjpl, I started a spreadsheet to evaluate how best to implement the pagination in the API, regarding what is intuitive for our users and what is easiest and most performant to implement. See https://docs.google.com/spreadsheets/d/1NWE6xWT_OOGb6VIPclbx4r7JtyTy5cWJazqkVi1TjpM/edit?usp=sharing
Quick resource note regarding PIT: https://opensearch.org/docs/latest/search-plugins/point-in-time/ PIT does not require sorting because (I assume) the opensearch-internal row ID functions as a de-facto sort in that use case.
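For concreteness, here is a minimal sketch of what PIT usage looks like against the OpenSearch REST API directly, based on the documented /_search/point_in_time endpoint. The host, index name, and credentials below are placeholders, not the real registry deployment.

```python
import requests

OS_HOST = "https://opensearch.example.nasa.gov"  # placeholder cluster
INDEX = "registry"                               # placeholder index
AUTH = ("user", "password")                      # placeholder credentials

# 1. Open a point-in-time view of the index, kept alive for 2 minutes.
pit_resp = requests.post(
    f"{OS_HOST}/{INDEX}/_search/point_in_time",
    params={"keep_alive": "2m"},
    auth=AUTH,
)
pit_id = pit_resp.json()["pit_id"]

# 2. Search against the PIT; the index is implied by the PIT id, so the
#    search goes to the bare /_search endpoint.
body = {
    "size": 500,
    "query": {"match_all": {}},
    "pit": {"id": pit_id, "keep_alive": "2m"},
}
page = requests.post(f"{OS_HOST}/_search", json=body, auth=AUTH).json()
print(len(page["hits"]["hits"]))
```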
After discussing with @alexdunnjpl, we decided to implement the pagination with URL parameters:
To be checked: whether we use _ or - in the parameter names. This is similar to what CMR proposes (see https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#search-after). It is also the option preferred by OpenSearch (see https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#scroll-search), especially with Point In Time (coming later). For the implementation we will always sort the results by
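At the OpenSearch level, that sort-plus-search-after approach boils down to a query loop of roughly the following shape. This is a sketch under assumptions: the host, index, credentials, and the sort field are placeholders (the actual tiebreaker field is not named above), and the sort field must be unique per document, or paired with a unique tiebreaker, so that search_after neither skips nor repeats hits.

```python
import requests

OS_URL = "https://opensearch.example.nasa.gov/registry/_search"  # placeholder
AUTH = ("user", "password")                                      # placeholder


def pages(page_size=500, sort_field="harvest_date_time"):
    """Yield successive pages of hits using sort + search_after."""
    body = {
        "size": page_size,
        "query": {"match_all": {}},
        "sort": [{sort_field: "asc"}],  # placeholder field; must be unique
    }
    while True:
        hits = requests.post(OS_URL, json=body, auth=AUTH).json()["hits"]["hits"]
        if not hits:
            return
        yield [h["_source"] for h in hits]
        # The sort value(s) of the last hit become the cursor for the next page.
        body["search_after"] = hits[-1]["sort"]


for page in pages():
    pass  # process each page here
```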
Alex next steps:
Status: @alexdunnjpl in work, probably not going to make the sprint.
Status: basic functionality almost complete.
Outstanding design questions:
We prefer the option: ?sort=key1,key2&search-after=val1,val2 (a usage sketch follows below). In the documentation the advised sorting parameter should be
More research is needed to see:
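From the caller's side, the preferred option above would be used roughly as follows. This is a hypothetical sketch: the host and endpoint are placeholders, the sort key is an arbitrary example, and the exact spelling of the parameters is still open per the note above.

```python
import requests

API_URL = "https://pds.example.nasa.gov/api/search/1/products"  # placeholder


def iterate_products(limit=500, sort_key="harvest_date_time"):
    """Walk a result set using the proposed sort / search-after URL parameters."""
    params = {"sort": sort_key, "limit": limit}
    while True:
        resp = requests.get(API_URL, params=params, timeout=60)
        resp.raise_for_status()
        products = resp.json().get("data", [])  # response key is an assumption
        if not products:
            break
        yield from products
        # Use the sort-key value of the last product as the cursor; this
        # assumes the sort key is present in the returned properties.
        params["search-after"] = products[-1].get(sort_key)
```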
Initial benchmarks against prod using a local API with page size 5000 suggest that a custom sort does not slow down query time in any perceptible manner (strangely, but we can look at that later).
Without sort: 32 sec (84 MB transferred). With harvest-time sort: 27 sec (64 MB transferred). With harvest-time sort and minimal properties returned: (
Without culling properties, that's 5-7 ms/product, and ~9 hrs to page 5M products. Pages of 100 run ~750 ms (7.5 ms/product), which actually isn't that much less efficient than paging large, which is a good thing.
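As a sanity check on the 9-hour figure, simple arithmetic on the numbers quoted above:

```python
# Back-of-envelope check of the "~9 hrs to page 5M products" estimate.
ms_per_product = 6.5          # midpoint of the observed 5-7 ms/product
n_products = 5_000_000
total_hours = n_products * ms_per_product / 1000 / 3600
print(f"{total_hours:.1f} hours")  # ~9.0 hours
```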
To-do:
Performance has markedly increased as a result of OpenSearch resource improvements. New start/limit benchmark using a specific query, full span (~275k records), page_size=50.
[Charts: 50-record rolling average of page response time as a function of page index, shown with a 1000ms low-pass filter and with a 500ms low-pass filter.]
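For reference, the rolling average described above can be reproduced with something like the following, assuming the per-page response times have already been collected (in page order) into a list:

```python
import statistics


def rolling_average(response_times_ms, window=50):
    """Rolling mean of page response times over a 50-page window."""
    return [
        statistics.mean(response_times_ms[i - window + 1 : i + 1])
        for i in range(window - 1, len(response_times_ms))
    ]


# times[i] = response time in ms of the i-th page of the benchmark run
# smoothed = rolling_average(times)
```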
New benchmark completed, 50doc pages start-limit: average 140ms/pg, linearly increasing to avg 220ms/pg.
While attempting to confirm correctness, additional issues have arisen. Attempting to page over all 274749 hits with start-limit only appears to return 197636 unique doc ids. Search-after behaves as expected. @jordanpadams @tloubrieu-jpl do you want me to investigate what's going on with the start-limit pagination, or is it completely moot given that we're moving to search-after pagination?
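A check along these lines is enough to expose the discrepancy (sketch only; the endpoint, the start/limit parameter names on the wire, and the identifier key are stand-ins for whatever the benchmark actually used):

```python
import requests

API_URL = "https://pds.example.nasa.gov/api/search/1/products"  # placeholder


def unique_ids_via_start_limit(total_hits=274749, page_size=50):
    """Page the full result set with start/limit and count distinct ids."""
    seen = set()
    for start in range(0, total_hits, page_size):
        resp = requests.get(
            API_URL, params={"start": start, "limit": page_size}, timeout=60
        )
        resp.raise_for_status()
        for product in resp.json().get("data", []):
            seen.add(product.get("id"))  # identifier key is an assumption
    return len(seen)


# A count well below total_hits (197636 vs 274749 above) means start/limit
# paging is skipping and/or duplicating documents.
```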
We also expected more improvement on the initial use case thanks to the use of the new ancestry fields in the API; see #353.
Tightly coupled with #353, transitioning to that ticket for completion.
@gxtchen, you can use the Jupyter notebook in this repository: https://github.com/NASA-PDS/search-api-notebook
Checked for duplicates
No - I haven't checked
Describe the bug
When I do requests over the members of one collection, 500 products at a time, we pull on average 2 products per second.
See screenshot
Expected behavior
We have a requirement for a single request to be executed in less than one second. In this example we want to pull 500 products per request (500 is an example here which gives an idea of the size of a request; typically 100 to a few hundred products).
So I expect the performance to be around 1/100 s per product or less.
To Reproduce
Jupyter notebook to be provided.
Environment Info
No response
Version of Software Used
No response
Test Data / Additional context
No response
Related requirements
#16
Engineering Details
No response