Update sentence-transformers requirement from ~=3.0.1 to ~=3.1.0 #21

Closed

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Sep 16, 2024

Updates the requirements on sentence-transformers to permit the latest version.
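
For context, `~=` is the "compatible release" operator from Python's version specifier rules: `~=3.1.0` is shorthand for `>=3.1.0, ==3.1.*`. The sketch below is a simplified illustration of that rule for plain numeric versions only. The helper names `parse` and `is_compatible` are made up for this example and are not part of pip or Dependabot.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn a plain dotted release such as '3.1.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate: str, spec: str) -> bool:
    """Check `candidate` against the compatible-release clause `~=spec`.

    Simplified: only handles purely numeric releases (no pre-releases,
    post-releases, or local versions).
    """
    wanted = parse(spec)
    found = parse(candidate)
    # Pad the candidate with zeros so '3.1' compares like '3.1.0'.
    found = found + (0,) * (len(wanted) - len(found))
    # All components except the last one of the spec must match exactly.
    prefix = wanted[:-1]
    return found >= wanted and found[: len(prefix)] == prefix


if __name__ == "__main__":
    for version in ("3.0.1", "3.1.0", "3.1.1", "3.2.0"):
        print(
            version,
            "old spec:", is_compatible(version, "3.0.1"),
            "new spec:", is_compatible(version, "3.1.0"),
        )
```

Under this rule the old requirement `~=3.0.1` excludes 3.1.0, which is why the requirement itself has to change to pick up the new release.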

Release notes

Sourced from sentence-transformers's releases.

v3.1.0 - Hard Negatives Mining utility; new loss function for symmetric tasks; streaming datasets; custom modules

This release introduces a hard negatives mining utility to get better models out of your data, a new strong loss function for symmetric tasks, training with streaming datasets to avoid having to store datasets fully on disk, custom modules to allow for more creativity from model authors, and many bug fixes, small additions and documentation improvements.

Install this version with

# Full installation:
pip install sentence-transformers[train]==3.1.0

# Inference only:
pip install sentence-transformers==3.1.0

[!WARNING] Due to incompatibilities with Windows, we have set numpy<2 in the Sentence Transformers requirements. If you're not on Windows, you can still install numpy>=2 and everything should work as expected.

Hard Negatives Mining utility (#2768, #2848)

Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. For example:

  • Anchor: "are red pandas actually pandas?"
  • Positive: "Red pandas, like giant pandas, are bamboo eaters native to Asia's high forests. Despite these similarities and their shared name, the two species are not closely related. Red pandas are much smaller than giant pandas and are the only living member of their taxonomic family."
  • Hard negative: "The giant panda (Ailuropoda melanoleuca; Chinese: 大熊猫; pinyin: dàxióngmāo), also known as the panda bear or simply the panda, is a bear native to south central China."

These negatives are more difficult for a model to distinguish from the correct answer, leading to a stronger training signal and a stronger overall model when used with one of the Loss Functions that accepts (anchor, positive, negative) triplets such as the one above.
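
To see why a hard negative gives a stronger training signal, consider a generic triplet-style hinge loss on cosine similarity. This is a toy sketch: the 2-D vectors are invented stand-ins for embeddings, and the loss is a textbook formulation, not necessarily the exact form used by any particular sentence-transformers loss class.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: zero once the positive beats the negative by `margin`."""
    gap = cosine(anchor, negative) - cosine(anchor, positive) + margin
    return max(0.0, gap)


# Made-up embeddings, chosen only to illustrate the effect.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
easy_negative = [0.0, 1.0]   # unrelated text: far from the anchor
hard_negative = [0.8, 0.3]   # related but wrong text: close to the anchor

easy_loss = triplet_loss(anchor, positive, easy_negative)
hard_loss = triplet_loss(anchor, positive, hard_negative)

if __name__ == "__main__":
    print("easy negative loss:", easy_loss)
    print("hard negative loss:", hard_loss)
```

The easy negative is already separated from the positive by more than the margin, so it contributes zero loss and no gradient. The hard negative still violates the margin, so it is the one that actually moves the model.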

This release introduces a utility function called mine_hard_negatives that allows you to mine for these hard negatives given an (anchor, positive) dataset (and optionally a corpus of negative candidate texts).

It boasts the following features to give you fine-grained control over the similarity of the mined negatives relative to the anchor:

  • CrossEncoder rescoring for higher quality negative selection.
  • Skip the top $n$ negative candidates as these might be true positives.
  • Consider only the top $n$ negative candidates.
  • Skip negative candidates that are within some margin of the true similarity between anchor and positive.
  • Skip negative candidates whose similarity is larger than some max_score.
  • Two sampling strategies: pick the top negative candidates that satisfy the requirements, or pick them randomly.
  • FAISS index for searching for negative candidates.
  • Option to return data as triplets only, or as 2 + num_negatives-tuples.

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
print(dataset)
"""

... (truncated)
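
Since the release-note example is cut off, here is a library-free sketch of how the candidate filters in the feature list above could combine for a single anchor. It is not the implementation of mine_hard_negatives. The function `select_negatives`, its parameter names other than max_score and num_negatives, and the exact margin rule are assumptions made for illustration, and the similarity scores are invented.

```python
import random


def select_negatives(
    candidates,            # list of (text, similarity_to_anchor) pairs
    positive_score,        # similarity between the anchor and its positive
    *,
    range_min=0,           # assumed name: skip this many top-ranked candidates
    range_max=None,        # assumed name: consider only candidates above this rank
    margin=None,           # assumed rule: drop scores above positive_score - margin
    max_score=None,        # drop candidates scoring above this value
    num_negatives=3,
    sampling_strategy="top",
    seed=0,
):
    """Toy filter pipeline for one anchor; returns (text, score) pairs."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    ranked = ranked[range_min:range_max]
    kept = []
    for text, score in ranked:
        if max_score is not None and score > max_score:
            continue  # too similar: may be an unlabeled true positive
        if margin is not None and score > positive_score - margin:
            continue  # too close to the anchor-positive similarity
        kept.append((text, score))
    if sampling_strategy == "random":
        rng = random.Random(seed)
        return rng.sample(kept, min(num_negatives, len(kept)))
    return kept[:num_negatives]


def as_rows(anchor, positive, negatives, as_triplets=True):
    """Emit one triplet per negative, or a single (2 + n)-tuple."""
    if as_triplets:
        return [(anchor, positive, neg) for neg in negatives]
    return [(anchor, positive, *negatives)]


# Invented scores for six candidate passages.
candidates = [("a", 0.95), ("b", 0.85), ("c", 0.7), ("d", 0.6), ("e", 0.4), ("f", 0.2)]
picked = select_negatives(
    candidates, positive_score=0.9,
    range_min=1, range_max=5, margin=0.1, max_score=0.8, num_negatives=2,
)
texts = [text for text, _ in picked]
triplets = as_rows("anchor", "positive", texts, as_triplets=True)
n_tuple = as_rows("anchor", "positive", texts, as_triplets=False)
```

In this toy run, candidate "a" is skipped by rank, "b" is dropped for exceeding max_score, and the two best remaining candidates are kept. In the real utility the scores would come from the embedding model (optionally rescored by a CrossEncoder) rather than being supplied by hand.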

Commits
  • 845dd54 Release v3.1.0
  • a3f2236 [feat] Update mine_hard_negatives to using a full corpus and multiple posit...
  • 8af7c5d [feat] Add column order warnings to the data collator (#2928)
  • bc9a666 [docs] Move losses up in the package reference; they're more important (#2929)
  • 597d5ed [fix] Add dtype cast for modules other than Transformer (#2889)
  • 6257cb0 [feat] Allow loading custom modules; encode kwargs passthrough to modules (...
  • ede5804 [security] Load weights only with torch.load & pytorch_model.bin (#2927)
  • 9e3da5b [deprecation] Push deprecation cycle for use_auth_token to v4 (#2926)
  • 1cd91ab [feat] Add cache_dir & config_args support to CrossEncoder (#2784)
  • 02fb5f8 [fix] Change eval dataloader to use eval_batch_size (#2847)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [sentence-transformers](https://github.com/UKPLab/sentence-transformers) to permit the latest version.
- [Release notes](https://github.com/UKPLab/sentence-transformers/releases)
- [Commits](UKPLab/sentence-transformers@v3.0.1...v3.1.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot added the dependencies (Pull requests that update a dependency file) and python (Pull requests that update Python code) labels Sep 16, 2024
Contributor Author

dependabot bot commented on behalf of github Sep 23, 2024

Superseded by #22.

@dependabot dependabot bot closed this Sep 23, 2024
@dependabot dependabot bot deleted the dependabot/pip/sentence-transformers-approx-eq-3.1.0 branch September 23, 2024 19:14
Labels
dependencies (Pull requests that update a dependency file), python (Pull requests that update Python code)