PSGallery Search() Function doesn't page unless the top is 100 above the result count #241

Open
3 tasks done
JustinGrote opened this issue Oct 27, 2022 · 2 comments

Comments


JustinGrote commented Oct 27, 2022

Prerequisites

  • Write a descriptive title.
  • Make sure you are able to repro it on the latest released version
  • Search the existing issues.

Steps to reproduce

So I noticed this while experimenting with some high-performance queries against PSGallery. Maybe someone can clarify what's going on, or whether this is in fact an actual issue.

Say you run Find-PSResource 'Az*', which results in the following query according to Fiddler:

https://preview.pwsh.gallery/api/v2/Search()?$filter=IsLatestVersion&searchTerm='Az%2A'&targetFramework=''&includePrerelease=false&$skip=0&$top=6000&semVerLevel=2.0.0
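For anyone who wants to replay the request with different $skip and $top values, here is a minimal Python sketch that rebuilds the same query string. The function name build_search_url is hypothetical; the endpoint and parameter names are copied from the captured URL above, and nothing here is taken from PowerShellGet's own code.

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the captured request above.
BASE_URL = "https://preview.pwsh.gallery/api/v2/Search()"


def build_search_url(term: str, skip: int = 0, top: int = 6000) -> str:
    """Rebuild the Search() URL with the same parameters as the capture.

    Hypothetical helper for experimenting with $skip and $top.
    """
    params = {
        "$filter": "IsLatestVersion",
        "searchTerm": f"'{term}'",
        "targetFramework": "''",
        "includePrerelease": "false",
        "$skip": skip,
        "$top": top,
        "semVerLevel": "2.0.0",
    }
    # Keep '$' and the single quotes unescaped so the output matches the capture;
    # '*' is still percent-encoded as %2A.
    return BASE_URL + "?" + urlencode(params, quote_via=quote, safe="$'")


if __name__ == "__main__":
    print(build_search_url("Az*"))
    print(build_search_url("Az*", top=1230))
```

Changing only the top argument should be enough to compare the single-request case with the paged case described in this report.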

This query actually returns 1131 objects, but it also has an embedded nextLink that skips only 100:
image
image

If you set $top to 1230 or lower (1131 + 99), everything is returned in a single request with no nextLink. If you set it to 1132, you get the paged result and what looks to be mostly duplicates in the data. If you set it to 6000, you seem to get mostly duplicate data 6 times; at 4000, you seem to get it 4 times. Since the default in PowerShellGet is 6000, this is why Find-PSResource Az* is so dog slow. By contrast, if I make a custom cmdlet that calls SearchAsync with a smaller Maxresult size, it's reasonably fast (there's some sort of artificial delay in v2 SearchAsync that doesn't exist in v3):
image

Change the maxcount to 4000 and it takes nearly 3 seconds to run, with each nextLink skipping only 100 but each response returning 1132 - skip results:
image

So something is broken in the server-side logic for the nextLink. I would expect the server to have a predefined batch size it is willing to operate with (since you cannot specify this in the Search OData query).

It only happens with large queries; it's hard to tell, but the threshold seems to be more than 500 results. Anything with fewer results always returns correctly. A query with 597 results (Be*) is affected.

Temporary Workaround

Limit the PSGetv3 search calls to 500 results and warn if exactly that number is returned, since more results may be present. Also add a configurable -MaxResults parameter, similar to how Exchange works with its -ResultSize parameter.
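To make the workaround concrete, here is a rough Python sketch of the cap-and-warn idea. The names capped_search, fetch, and max_results are all hypothetical; fetch stands in for whatever actually issues the search request, and this is not how PowerShellGet implements anything.

```python
import warnings
from typing import Callable, List


def capped_search(
    fetch: Callable[[int], List[str]], max_results: int = 500
) -> List[str]:
    """Run a search capped at max_results and warn when the cap is hit exactly.

    fetch(top) is a hypothetical callable that returns up to `top` results.
    """
    results = fetch(max_results)[:max_results]
    if len(results) == max_results:
        # Hitting the cap exactly suggests the result set was truncated.
        warnings.warn(
            f"Exactly {max_results} results returned; more may exist. "
            "Raise max_results to see them.",
            stacklevel=2,
        )
    return results


if __name__ == "__main__":
    # Fake data source with 597 entries, the size of the Be* query in this report.
    names = [f"Module{i}" for i in range(597)]
    found = capped_search(lambda top: names[:top])
    print(len(found))
```

The warning fires only on an exact match, so a query that genuinely has 500 results would also trigger it; that seems like an acceptable trade-off for a temporary workaround.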

Expected behavior

The query would return results in batches of a predetermined server limit, e.g. 1000, and each nextLink would skip at that interval.

Actual behavior

Skip always advances in intervals of 100, but each response still returns the full remaining record set, leading to massive data duplication and slow queries. The effect is magnified the higher the result size is set.
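To put rough numbers on the difference between the expected and actual behavior, here is a toy Python model. It is built only from the description in this report, not from the Gallery's server code, and it assumes the client keeps following nextLink until skip passes the end of the result set; it does not model how $top limits that chain. The function name records_transferred is hypothetical.

```python
from typing import Tuple


def records_transferred(total: int, page_step: int, overlapping: bool) -> Tuple[int, int]:
    """Return (requests, records transferred) needed to walk `total` results.

    overlapping=False: each page holds at most page_step records (expected).
    overlapping=True: each page holds every record from skip to the end while
    skip still advances by only page_step (toy model of the reported behavior).
    """
    requests = transferred = skip = 0
    while skip < total:
        remaining = total - skip
        transferred += remaining if overlapping else min(page_step, remaining)
        requests += 1
        skip += page_step
    return requests, transferred


if __name__ == "__main__":
    # Expected: batches of 1000 over 1131 results.
    print(records_transferred(1131, 1000, overlapping=False))
    # Reported: skip advances by 100, each page returns everything after skip.
    print(records_transferred(1131, 100, overlapping=True))
```

Under these assumptions the disjoint case moves 1131 records in 2 requests, while the overlapping case moves 6972 records in 12 requests, roughly six times the data for the same 1131 unique results.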

Error details

No response

Environment data

7.2.3

Visuals

No response

@JustinGrote
Author

@SydneyhSmith I created this in the wrong repo; please move it to Powershell/PowershellGallery. Thanks

@alerickson
Member

@JustinGrote we're working on moving off of the NuGet client APIs (see: PowerShell/PowerShellGet#653), and I think that should resolve this issue on the PowerShellGet side. We'll look into why the server is returning these results, specifically the duplicates. I'll move this over to the Gallery repo so we can track the issue there.
