-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries #13845
Conversation
Thanks @psalagnac for identifying this in the https://issues.apache.org/jira/browse/SOLR-17471 ticket! |
* @param queryLen the query length | ||
* @param k hyperparam for the primitive weighting function | ||
*/ | ||
public Axiomatic(boolean discountOverlaps, float s, int queryLen, float k) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically additional variant constructors could also be added for all the classes that derive from Axiomatic
but there are relatively many derivations with multiple constructors. The class is marked @lucene.experimental
and so perhaps it's okay to add additional variants if/when a need for them is identified?
* @param independenceMeasure measure of divergence from independence | ||
* @param discountOverlaps true if overlap tokens should not impact document length for scoring. | ||
*/ | ||
public DFISimilarity(Independence independenceMeasure, boolean discountOverlaps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -98,7 +98,7 @@ public DFRSimilarity( | |||
} | |||
|
|||
/** | |||
* Creates DFRSimilarity from the three components. | |||
* Creates DFRSimilarity from the four components. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't a DFR component. it is just an expert boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when we use "component" in these docs, it has special meaning to the user. Let's make sure to not anything about that in these constructors (DFR is not the only one like this)
DFR models are obtained by instantiating the three components of the framework: selecting a basic randomness model, applying the first normalisation, and normalising the term frequencies
https://en.wikipedia.org/wiki/Divergence-from-randomness_model
Distribution distribution, | ||
Lambda lambda, | ||
Normalization normalization, | ||
boolean discountOverlaps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as DFR, this isn't a component
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done: reworded to ... three components and with the specified discountOverlaps value.
} | ||
|
||
/** Instantiates the similarity with the provided parameters. */ | ||
public LMDirichletSimilarity(boolean discountOverlaps, float mu) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need one ctor here with this parameter, just add it to the "expert" ctor only.
@@ -62,8 +74,15 @@ public LMDirichletSimilarity(CollectionModel collectionModel) { | |||
} | |||
|
|||
/** Instantiates the similarity with the default μ value of 2000. */ | |||
public LMDirichletSimilarity(boolean discountOverlaps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add to expert CTOR only. no telescoping for this parameter, it is very expert.
} | ||
|
||
/** Instantiates with the specified parameters. */ | ||
public LMJelinekMercerSimilarity(boolean discountOverlaps, float lambda) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use other ctor, avoid telescoping this boolean
w.r.t. the CI failure -- https://github.com/apache/lucene/actions/runs/11130252581/job/30929198241?pr=13845 -- that locally seems to also happen on |
merged in latest |
The changes in this PR are to |
} | ||
|
||
/** Creates a new instance with the default collection language model. */ | ||
public LMSimilarity(boolean discountOverlaps) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just needs to be added to above "expert" ctor only.
I added comments, this boolean parameter is esoteric and just needs to be available on "expert ctor": not added to every possible ctor for these classes. We just need to make it possible. |
Thanks for the review! Made updates accordingly. |
…tructor variants, CHANGES.txt entries (apache#13845) (cherry picked from commit dab7311) (cherry picked from commit cbd8b52) Resolved Conflicts: lucene/CHANGES.txt
…tructor variants, CHANGES.txt entries (apache#13845) (cherry picked from commit dab7311) Resolved Conflicts: lucene/CHANGES.txt
#13757 follow-up