
Pagination performance does not meet requirements #352

Closed
tloubrieu-jpl opened this issue Jun 20, 2023 · 38 comments · Fixed by #397
Labels
B14.0, B14.1, bug (Something isn't working), open.v1.3.1, s.high (High severity)

Comments

@tloubrieu-jpl
Member

tloubrieu-jpl commented Jun 20, 2023

Checked for duplicates

No - I haven't checked

πŸ› Describe the bug

When I request the members of one collection, 500 products at a time, we pull on average 2 products per second.

See screenshot

πŸ•΅οΈ Expected behavior

We have a requirement for a single request to be executed in less than one second. In this example we want to pull 500 products per request (500 is just an example that gives an idea of the size of a request, typically 100 to a few hundred products).

So I expect the performance to be around 1/100 s (10 ms) per product or less.

πŸ“œ To Reproduce

Jupyter notebook to be provided.

πŸ–₯ Environment Info

No response

πŸ“š Version of Software Used

No response

🩺 Test Data / Additional context

No response

πŸ¦„ Related requirements

πŸ¦„ #16

βš™οΈ Engineering Details

No response

@tloubrieu-jpl tloubrieu-jpl added B14.0 bug Something isn't working s.high High severity labels Jun 20, 2023
@jordanpadams jordanpadams changed the title pagination is slow Pagination performance does not meet requirements Jun 20, 2023
@alexdunnjpl
Contributor

Probably resolvable via #353

@tloubrieu-jpl
Member Author

Yes, you are right @alexdunnjpl. I believe the performance is good for non-members requests, so I will re-evaluate the criticality of the bug to medium.

@tloubrieu-jpl tloubrieu-jpl added s.medium Medium level severity and removed s.high High severity labels Jun 21, 2023
@tloubrieu-jpl
Member Author

Actually I put it back to high. The query is not quick on other requests either, for example /products with pagination.

@tloubrieu-jpl tloubrieu-jpl added s.high High severity and removed s.medium Medium level severity labels Jun 21, 2023
@alexdunnjpl
Contributor

@tloubrieu-jpl do you have an example of a specific request or set of requests (and the relevant host) which are not performant? It's not immediately clear to me from the provided information that the api is the culprit as opposed to, say, the python client.

@alexdunnjpl
Contributor

@tloubrieu-jpl following up on this (see question in previous comment)

@tloubrieu-jpl
Member Author

Hi @alexdunnjpl, sorry I missed answering that earlier.

The request I was evaluating the performance of is in this notebook: https://github.com/NASA-PDS/search-api-notebook/blob/main/notebooks/ovirs/part1/explore-a-collection.ipynb, cell "Request specific properties of all the observational products of the collection".

To run that notebook, you need to clone the repository and follow the instructions in the README to deploy it locally.

@alexdunnjpl
Contributor

@tloubrieu-jpl what I'm suggesting is that, given that this is the first instance of this kind of performance problem we've seen (that I know of, at least), this probably isn't an API bug.

So unless there's a curl request to reproduce the behaviour outside of the notebook, I'd suggest closing/suspending this ticket and treating it as a bug in the notebook or one of its dependencies (which Al or I could still look into, for sure)

@al-niessner
Contributor

al-niessner commented Aug 3, 2023

@alexdunnjpl

No, this bug has come and gone a couple of times (feels like dozens). It has been a glob of things like cluster nodes slow to respond, overhead in cluster communications, and more. It was supposed to all be fixed with multi-tenancy or some word like that. The notebook request is a members request, which is hard to turn into a direct OpenSearch curl for measuring OpenSearch vs registry-api timing. In the past, it was shown that the lag was at OpenSearch and not in the registry-api's additional processing. I guess I need to make that first step in the binary search persistent.

Are you working it or do you want me to do it?

@alexdunnjpl
Contributor

@al-niessner ahh, gotcha. In that case, it seems like a probable fix via #353 to avoid the need to search anything?

Happy to work it or yield it, I think Jordan/Thomas were looking to divvy up tasks between us at breakout today, now that the cognito work is being pushed back as a priority.

@al-niessner
Contributor

@alexdunnjpl

Let's take it to the breakout. If the new ancestry field gets rid of the double lookup, then let's run in that direction. It will make finding the delays so much simpler, if nothing else.

@alexdunnjpl
Contributor

Not sure how I mistook this as being related to /members, but it isn't. Still investigating.

@alexdunnjpl
Contributor

alexdunnjpl commented Aug 10, 2023

@tloubrieu-jpl @jordanpadams the bug is in the pds python api client.

API logs reveal that the client is paging without regard to the limit value (see qparams limit and start)

2023-08-10T16:15:28.960-07:00 DEBUG 1328923 --- [nio-8080-exec-7] o.s.web.servlet.DispatcherServlet        : GET "/products?fields=orex:Spatial.orex:latitude&fields=orex:Spatial.orex:longitude&fields=ref_lid_instrument&fields=orex:Spatial.orex:target_range&fields=ops:Data_File_Info.ops:file_ref&limit=500&q=%20%28%20%28%20ref_lid_target%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Atarget%3Aasteroid.101955_bennu%22%20%29%20and%20%28%20ref_lid_instrument%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Ainstrument%3Aovirs.orex%22%20%29%20%29&start=28000", parameters={masked}
2023-08-10T16:15:28.960-07:00 DEBUG 1328923 --- [nio-8080-exec-7] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to gov.nasa.pds.api.registry.controller.SwaggerJavaTransmuter#productList(List, List, Integer, String, List, Integer)
2023-08-10T16:15:42.862-07:00 DEBUG 1328923 --- [nio-8080-exec-7] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Using 'application/json', given [application/json] and supported [application/json, */*]
2023-08-10T16:15:42.876-07:00 DEBUG 1328923 --- [nio-8080-exec-7] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Writing [class PdsProducts {<EOL>    summary: class Summary {<EOL>        q:  ( ( ref_lid_target eq "urn:nasa:pds:con (truncated)...]
2023-08-10T16:15:42.890-07:00 DEBUG 1328923 --- [nio-8080-exec-7] o.s.web.servlet.DispatcherServlet        : Completed 200 OK
2023-08-10T16:15:42.961-07:00 DEBUG 1328923 --- [nio-8080-exec-8] o.s.web.servlet.DispatcherServlet        : GET "/products?fields=orex:Spatial.orex:latitude&fields=orex:Spatial.orex:longitude&fields=ref_lid_instrument&fields=orex:Spatial.orex:target_range&fields=ops:Data_File_Info.ops:file_ref&limit=500&q=%20%28%20%28%20ref_lid_target%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Atarget%3Aasteroid.101955_bennu%22%20%29%20and%20%28%20ref_lid_instrument%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Ainstrument%3Aovirs.orex%22%20%29%20%29&start=28500", parameters={masked}
2023-08-10T16:15:42.962-07:00 DEBUG 1328923 --- [nio-8080-exec-8] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to gov.nasa.pds.api.registry.controller.SwaggerJavaTransmuter#productList(List, List, Integer, String, List, Integer)
2023-08-10T16:15:56.673-07:00 DEBUG 1328923 --- [nio-8080-exec-8] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Using 'application/json', given [application/json] and supported [application/json, */*]
2023-08-10T16:15:56.689-07:00 DEBUG 1328923 --- [nio-8080-exec-8] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Writing [class PdsProducts {<EOL>    summary: class Summary {<EOL>        q:  ( ( ref_lid_target eq "urn:nasa:pds:con (truncated)...]
2023-08-10T16:15:56.701-07:00 DEBUG 1328923 --- [nio-8080-exec-8] o.s.web.servlet.DispatcherServlet        : Completed 200 OK
2023-08-10T16:15:56.768-07:00 DEBUG 1328923 --- [nio-8080-exec-9] o.s.web.servlet.DispatcherServlet        : GET "/products?fields=orex:Spatial.orex:latitude&fields=orex:Spatial.orex:longitude&fields=ref_lid_instrument&fields=orex:Spatial.orex:target_range&fields=ops:Data_File_Info.ops:file_ref&limit=500&q=%20%28%20%28%20ref_lid_target%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Atarget%3Aasteroid.101955_bennu%22%20%29%20and%20%28%20ref_lid_instrument%20eq%20%22urn%3Anasa%3Apds%3Acontext%3Ainstrument%3Aovirs.orex%22%20%29%20%29&start=29000", parameters={masked}
2023-08-10T16:15:56.769-07:00 DEBUG 1328923 --- [nio-8080-exec-9] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped to gov.nasa.pds.api.registry.controller.SwaggerJavaTransmuter#productList(List, List, Integer, String, List, Integer)
2023-08-10T16:16:11.571-07:00 DEBUG 1328923 --- [nio-8080-exec-9] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Using 'application/json', given [application/json] and supported [application/json, */*]
2023-08-10T16:16:11.586-07:00 DEBUG 1328923 --- [nio-8080-exec-9] o.s.w.s.m.m.a.HttpEntityMethodProcessor  : Writing [class PdsProducts {<EOL>    summary: class Summary {<EOL>        q:  ( ( ref_lid_target eq "urn:nasa:pds:con (truncated)...]
2023-08-10T16:16:11.599-07:00 DEBUG 1328923 --- [nio-8080-exec-9] o.s.web.servlet.DispatcherServlet        : Completed 200 OK

Who's the most familiar with the python client codebase? @tloubrieu-jpl or @jjacob7734 perhaps? If either of y'all can think of where in the client package code the page iteration occurs, that'd save me some time/effort, otherwise I'll just look for it myself. If you could point me to the repo that'd be helpful, at least - pds-api-client doesn't seem to be it.
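
For context, the start/limit contract behind those qparams corresponds roughly to the loop below. This is a hedged sketch for illustration only, not the actual client code: the base URL, the query, and the "data" key used to read products out of the JSON payload are assumptions.

```python
# Sketch of start/limit paging against the /products endpoint, mirroring the
# qparams visible in the logs above. Placeholder URL/query/response handling.
import requests

BASE_URL = "https://pds.nasa.gov/api/search/1"  # placeholder deployment URL
QUERY = '(ref_lid_target eq "urn:nasa:pds:context:target:asteroid.101955_bennu")'
PAGE_SIZE = 500  # the "limit" qparam

def iterate_products():
    start = 0
    while True:
        resp = requests.get(
            f"{BASE_URL}/products",
            params={"q": QUERY, "limit": PAGE_SIZE, "start": start},
            headers={"Accept": "application/json"},
        )
        resp.raise_for_status()
        products = resp.json().get("data", [])  # response key assumed for illustration
        if not products:
            break
        yield from products
        # Each request makes OpenSearch skip `start` documents server-side, so
        # later pages get progressively more expensive (the deep-paging problem).
        start += PAGE_SIZE
```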

@alexdunnjpl
Contributor

I've just tested at 5000doc/page, and found the following:

  • Query time of 1300ms irrespective of field subsetting
  • Download time of 7600ms for full documents
  • Download time of 600ms for heavily-subset documents

So for full docs, that's 27ms/doc at 50doc/page, 4.3ms/doc at 500doc/page, 1.8ms/doc at 5000doc/page
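
As a back-of-envelope check, the 5000doc/page figure follows directly from the two measured components above (the 50 and 500 doc/page figures come from analogous earlier measurements):

```python
# Rough derivation of the 1.8 ms/doc figure for full documents at 5000 doc/page:
# (query time + download time) / page size.
query_ms = 1300      # measured query time, independent of field subsetting
download_ms = 7600   # measured download time for full documents
page_size = 5000

print((query_ms + download_ms) / page_size)  # ~1.78 ms per document
```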

@tloubrieu-jpl
Member Author

Thanks @alexdunnjpl. Regarding the <1s requirement, we will need to refine which requests it applies to. The motivation behind the requirement is to have very responsive web pages when the API is used as a back-end for the web search. It might even mean that we need even faster responses for a subset of requests to provide fluid navigation.

But for notebook searches, longer responses can be acceptable.

@alexdunnjpl
Contributor

Deeper investigation yields the following, given 5000doc page size:

  • client.search() increases linearly with page "index" at about 3000ms per page
  • getHits() is a constant 300ms +/-50ms, as expected
  • API adds a constant 1500ms +/-500ms

Implementation of search_after requires the following:

  • implementation of _id sort
  • implementation of search_after qparam support (this may exist as-is, but need to confirm)
  • calling request.source().searchAfter(new Object[] {"_id"}) in RequestAndResponseContext

@tloubrieu-jpl @jordanpadams on the user side, they would need to supply a search_after qparam equal to the _id (i.e. lidvid) of the last element returned in the previous query. i.e. the python client would need/want to abstract this away from the user, and any raw requests not using this functionality would fail to benefit from the update.

The way search_after works is that the db applies a sort to the full dataset, and then uses the requested search_after value to quickly access the desired page. Presumably the query is also cached for some period of time, too.

  • this conflicts with the currently-not-implemented(?) requirement to allow the user to provide sorting instructions via the API. Sorting by _id in all cases is simplest, but will need to be dealt with in a more complex manner if user-specified sorting is ever implemented.
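
For illustration, here is what the mechanism looks like as raw OpenSearch _search calls; this is a sketch of the technique with placeholder host/index/query, not the registry-api change itself (which would go through request.source().searchAfter(...) as noted above):

```python
# Sketch of search_after paging against OpenSearch's _search REST endpoint.
# The point is the sort + search_after round trip, which avoids the deep-skip
# cost of start/limit paging. Host, index and query are placeholders.
import requests

OPENSEARCH = "https://localhost:9200"  # placeholder
INDEX = "registry"                     # placeholder

def scan(query, page_size=500):
    search_after = None
    while True:
        body = {
            "size": page_size,
            "query": query,
            # A total order is mandatory; _id is what's proposed above (a document
            # field such as lidvid would serve the same purpose).
            "sort": [{"_id": "asc"}],
        }
        if search_after is not None:
            body["search_after"] = search_after  # sort values of the last hit seen
        hits = requests.post(f"{OPENSEARCH}/{INDEX}/_search", json=body).json()["hits"]["hits"]
        if not hits:
            return
        yield from hits
        search_after = hits[-1]["sort"]  # cursor for the next page

# e.g.: for hit in scan({"match_all": {}}): ...
```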

Thoughts on all that @jordanpadams @tloubrieu-jpl ?

@tloubrieu-jpl
Member Author

Hi @alexdunnjpl ,

I started a spreadsheet to evaluate how best to implement pagination in the API, considering what is intuitive for our users and what is easiest and most performant to implement. See https://docs.google.com/spreadsheets/d/1NWE6xWT_OOGb6VIPclbx4r7JtyTy5cWJazqkVi1TjpM/edit?usp=sharing

@alexdunnjpl
Contributor

alexdunnjpl commented Aug 28, 2023

Regarding PIT:

https://opensearch.org/docs/latest/search-plugins/point-in-time/

PIT does not require sorting because (I assume) the opensearch-internal row ID functions as a de-facto sort in that use-case.


Quick resource note
https://stackoverflow.com/questions/68127892/how-does-search-after-work-in-elastic-search

@tloubrieu-jpl
Member Author

tloubrieu-jpl commented Aug 28, 2023

After discussing with @alexdunnjpl, we decided to implement the pagination with the URL parameters:

  • search_after
  • page_size

To be checked: whether we use _ or - in the names.

It is similar to what CMR proposes (see https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#search-after).

This is also the option preferred by OpenSearch (see https://opensearch.org/docs/latest/search-plugins/searching-data/paginate/#scroll-search), especially combined with Point In Time (to come later).

For the implementation we will always sort the results by lidvid (see https://stackoverflow.com/a/68535811/22141093), as the lowest-priority sorting property (for when we implement user-specified sorting).

@alexdunnjpl
Contributor

Alex next steps:

  • Benchmark effect of sorting by lidvid vs unsorted
  • If necessary, benchmark search-after paging in python or Postman

@jordanpadams
Member

Status: @alexdunnjpl in work, probably not going to make the sprint

@alexdunnjpl
Contributor

Status: basic functionality almost complete.
Outstanding:

  • incorporation into Pagination/PaginationLidvidBuilder

Outstanding design questions:

  • Should sort qparam be removed? (Yes, as-is, as sort is a mandatory side-effect of searchAfter)
  • Should the exposed interface be ?search-after=key1=val1,key2=val2 or ?sort=key1,key2&search-after=val1,val2? The former may be clearer to users, but the latter is more representative of what's being done, allows specification of multiple sort keys without search-after, and may allow a cleaner implementation of asc/desc specification (both forms are illustrated in the sketch after this list).
  • First-page behaviour is kinda terrible. It's necessary to specify sort so that the correct first page is populated. Currently, this looks like ?search-after=somekey=. Need to check whether quotes are used in Spring to wrap null strings... but either way the whole thing is relatively fiddly and users will need to read a documentation snippet to have any hope of accessing multi-page data.
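
To make the two candidate forms (and the first-page case) concrete, a quick illustration using a made-up lidvid as the cursor value:

```python
# Illustration of the two candidate query-string interfaces discussed above.
# The lidvid below is a placeholder, not a real product.
from urllib.parse import urlencode

cursor = "urn:nasa:pds:example:collection:product::1.0"  # placeholder lidvid

# Form 1: search-after carries the sort keys and cursor values together
form_1 = "?" + urlencode({"search-after": f"lidvid={cursor}"})

# Form 2: sort keys and cursor values split across two qparams
form_2 = "?" + urlencode({"sort": "lidvid", "search-after": cursor})

# First page under form 2: sort given, no search-after yet
first_page = "?" + urlencode({"sort": "lidvid"})

print(form_1)
print(form_2)
print(first_page)
```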

@tloubrieu-jpl
Member Author

tloubrieu-jpl commented Nov 7, 2023

We prefer the option: ?sort=key1,key2&search-after=val1,val2

In the documentation the advised sorting parameter should be harvest_time.

More research is needed to see:

  • @tloubrieu-jpl @alexdunnjpl how to handle default sort by looking at other implementations of search-after (e.g. CMR...)
  • @alexdunnjpl will look at the relevancy (score) sort in opensearch to confirm it works with search-after
  • we need to consider adding a next-key attribute in the summary of the response that the user can use to get their next page (see alternate implementations)

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 9, 2023

Initial benchmarks against prod using local API with page size 5000 suggest that custom sort does not slow down query time in any perceptible manner (strangely, but we can look at that later).

Without sort: 32 sec (84 MB transferred). With harvest-time sort: 27 sec (64 MB transferred). With harvest-time sort and minimal properties returned (fields=ops%3AHarvest_Info%2Fops%3Aharvest_date_time): 13 MB transferred. N.B. this conflicts with previous benchmarks, which were much faster.

Without culling properties, that's 5-7ms/product, and ~9hrs to page 5M products.

Pages of 100 run ~750ms (7.5ms/product), which actually isn't much less efficient than large pages; that's a good thing.

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 9, 2023

To-do:

  • integrate Pagination/PaginationLidvidBuilder
  • re-run the test from the Jupyter notebook to confirm the expected speed-up and that response time no longer increases linearly with page index

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 9, 2023

Performance has markedly increased as a result of OpenSearch resource improvements.

New start/limit benchmark using specific query, full-span (~275k records), page_size=50. 50-record rolling average applied

(plot: page response time as a function of page index, with a 1000 ms low-pass filter applied)

(plot: the same, with a 500 ms low-pass filter)

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 16, 2023

New benchmark completed, 50doc pages

start-limit: average 140 ms/page, increasing linearly to an average of 220 ms/page.
search-after: average 116 ms/page, constant, so roughly 190% of start-limit throughput (presumably comparing against the late-run start-limit rate: 220 ms / 116 ms ≈ 1.9), with the given query at least.

@alexdunnjpl
Contributor

alexdunnjpl commented Nov 16, 2023

While attempting to confirm correctness, additional issues have arisen.

Attempting to page over all 274749 hits with start-limit only appears to return 197636 unique doc ids.

Search-after behaves as expected. @jordanpadams @tloubrieu-jpl do you want me to investigate what's going on with the start-limit pagination, or is it completely moot given that we're moving to search-after pagination?

@tloubrieu-jpl
Member Author

We also expect more improvement on the initial use case from the use of the new ancestry fields in the API, see #353.

@alexdunnjpl
Contributor

Tightly coupled with #353, transitioning to that ticket for completion.

@tloubrieu-jpl
Member Author

@gxtchen, you can use the Jupyter notebook in this repository: https://github.com/NASA-PDS/search-api-notebook
Use the ovirs/part1/explore_a_collection notebook, cell "Request specific properties of all the observational products of the collection".
