The term 'aws' should be ignored by the search algorithm #828

Open
addihorowitz opened this issue Dec 26, 2021 · 4 comments
Labels
bug, priority/p2, question, stale

Comments

@addihorowitz
Contributor

Visitors who search for 'AWS codepipeline' get 688 results, most of which are not relevant.

If you search for 'codepipeline' instead, you get 27 relevant results.

@gabewomble
Contributor

I'm not sure we can implement this without breaking a lot of other title-based searches, such as aws-cdk or aws-solutions. The results at the top for "aws codepipeline" are still the most relevant ones. I have a PR that explores the suggested solution, but in testing I've found the search experience to be much worse.

gabewomble added the priority/p2 and question labels and removed the priority/p0 label on Dec 29, 2021
gabewomble self-assigned this on Jan 4, 2022
@addihorowitz
Contributor Author

addihorowitz commented Jan 9, 2022

I believe there's a difference between "aws-X" and "aws X". The first is a single term, "aws-X"; the second is two terms, "aws" and "X". If a user types "aws" as a standalone term (a term that equals "aws", not one that merely starts with the prefix "aws"), we should ignore it.
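
To make the distinction concrete, here is a minimal sketch of the proposal, not the project's actual code. It assumes the raw query is split on whitespace only, before any other tokenization; `STOP_TERMS` and `strip_standalone_terms` are hypothetical names, and the fallback to the original query when nothing is left is an added assumption.

```python
# Hypothetical set of terms to ignore when they appear as a whole word.
STOP_TERMS = {"aws"}


def strip_standalone_terms(query, stop_terms=STOP_TERMS):
    """Drop whitespace-separated terms that equal a stop term (case-insensitive).

    Terms that merely start with a stop term, such as "aws-cdk", are kept.
    """
    kept = [term for term in query.split() if term.lower() not in stop_terms]
    # Assumption: if every term was dropped, search on the original query
    # rather than on an empty string.
    return " ".join(kept) if kept else query.strip()


print(strip_standalone_terms("AWS codepipeline"))
print(strip_standalone_terms("aws-cdk"))
```

With this sketch, "AWS codepipeline" is reduced to "codepipeline", while "aws-cdk" passes through unchanged because it is a single whitespace-delimited term.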

@gabewomble
Contributor

gabewomble commented Jan 10, 2022

The issue is that this conflicts entirely with how the search engine works. All search terms are "tokenized", meaning that they are split into a list of segments. For example, if I search for "@aws-cdk/cloudfront static_site", the following tokens will be searched on: aws, cdk, cloudfront, static, site. Results are returned by relevance based on field weights, meaning that something like @aws-cdk/cloudfront would be returned before an @aws-cdk/foo package that only mentions cloudfront in its description.
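
The tokenization and field weighting described above can be illustrated with a rough sketch. This is not the search engine's real implementation: the splitting rule (any run of non-alphanumeric characters), the `FIELD_WEIGHTS` values, the sample package descriptions, and all function names are assumptions chosen only to reproduce the example in the comment.

```python
import re

# Hypothetical weights: a match in the name counts more than one in the description.
FIELD_WEIGHTS = {"name": 5, "description": 1}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric segments."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, package):
    """Sum the field weight for every query token found in each field."""
    query_tokens = tokenize(query)
    total = 0
    for field, weight in FIELD_WEIGHTS.items():
        field_tokens = set(tokenize(package.get(field, "")))
        total += weight * sum(1 for token in query_tokens if token in field_tokens)
    return total


def search(query, packages):
    """Return the names of matching packages, highest score first."""
    scored = [(score(query, pkg), pkg["name"]) for pkg in packages]
    hits = [item for item in scored if item[0] > 0]
    return [name for _, name in sorted(hits, key=lambda item: item[0], reverse=True)]


# Made-up sample data for illustration only.
PACKAGES = [
    {"name": "@aws-cdk/foo", "description": "Helpers that include cloudfront support"},
    {"name": "@aws-cdk/cloudfront", "description": "CloudFront constructs"},
]

print(tokenize("@aws-cdk/cloudfront static_site"))
print(search("@aws-cdk/cloudfront static_site", PACKAGES))
```

Because "aws" is just another token after splitting, removing it at this stage would also remove it from queries like "aws-cdk", which is the conflict described above.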

I will give this suggestion a try, but I believe it will still cause problematic edge cases like before. I would also argue that we are getting acceptable and relevant results with the current behavior. Libraries that match the fields most strongly appear first, while looser matches have lower relevance scores. With other search engines it also feels like the first 10-20% of results are strongly relevant, and beyond that point results are only tangentially related.

@github-actions
Contributor

This issue is now marked as stale because it hasn't seen activity for a while. Add a comment or it will be closed soon. If you wish to exclude this issue from being marked as stale, add the "backlog" label.

github-actions bot added the stale label on Mar 12, 2022

2 participants