Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add SentenceTransformersDiversityRanker #7095

Merged
merged 11 commits into from
Mar 11, 2024

Conversation

awinml
Copy link
Contributor

@awinml awinml commented Feb 26, 2024

Related Issues

Proposed Changes:

Adds SentenceTransformersDiversityRanker.

The Diversity Ranker orders documents in such a way as to maximize the overall diversity of the given documents. The ranker leverages sentence-transformer models to calculate semantic embeddings for each document and the query.

The ranker first calculates embeddings for each document and the query. It starts by selecting the document that is semantically closest to the query. Then, for each remaining document, it selects the one that, on average, is least similar to the already selected documents. This process continues until all documents are selected, resulting in a list where each subsequent document contributes the most to the overall diversity of the selected set.

How did you test it?

Tests have been added in test_sentence_transformers_diversity.py.

Checklist

@awinml awinml requested review from a team as code owners February 26, 2024 11:57
@awinml awinml requested review from dfokina and ZanSara and removed request for a team February 26, 2024 11:57
@github-actions github-actions bot added topic:tests 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Feb 26, 2024
@coveralls
Copy link
Collaborator

coveralls commented Feb 26, 2024

Pull Request Test Coverage Report for Build 8232016781

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 90.176%

Totals Coverage Status
Change from base Build 8190601051: 0.1%
Covered Lines: 5370
Relevant Lines: 5955

💛 - Coveralls

Copy link
Contributor

@sjrl sjrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this, I did a quick look over and had two comments that I wanted to bring up.

haystack/components/rankers/diversity.py Outdated Show resolved Hide resolved
haystack/components/rankers/diversity.py Outdated Show resolved Hide resolved
@ZanSara ZanSara requested review from a team and vblagoje and removed request for ZanSara and a team February 26, 2024 14:54
@awinml awinml requested a review from sjrl February 27, 2024 15:00
@vblagoje
Copy link
Member

@sjrl can you please take over, I'm overloaded with some other unrelated items 🙏

@vblagoje vblagoje removed their request for review February 27, 2024 15:48
@sjrl
Copy link
Contributor

sjrl commented Feb 27, 2024

Yup I can review this more later this week.

@awinml awinml requested a review from sjrl March 7, 2024 09:33
@awinml awinml changed the title feat: Add DiversityRanker feat: Add SentenceTransformersDiversityRanker Mar 11, 2024
@awinml
Copy link
Contributor Author

awinml commented Mar 11, 2024

@sjrl All the tests, except those running the component on real data, have been converted to unit tests. Mocks have been used for the sentence-transformer model. An additional test for warm_up() has also been added. These improvements now bring the test coverage for this component to 100%.

@awinml awinml requested a review from sjrl March 11, 2024 11:45
Copy link
Contributor

@sjrl sjrl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @awinml this looks great!

@sjrl sjrl merged commit 38b3472 into deepset-ai:main Mar 11, 2024
23 checks passed
@sjrl sjrl linked an issue Mar 15, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DiversityRanker DiversityRanker
4 participants