PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries (#13845) #13891

Open
wants to merge 1 commit into
base: branch_9_12
8 changes: 8 additions & 0 deletions lucene/CHANGES.txt
@@ -9,6 +9,11 @@ Bug Fixes
---------------------
(No changes)

API Changes
---------------------

* GITHUB#13845: Add missing with-discountOverlaps Similarity constructor variants. (Pierre Salagnac, Christine Poerschke, Robert Muir)

======================== Lucene 9.12.0 =======================

Security Fixes
@@ -47,6 +52,9 @@ API Changes
the entire segment should be scored. Subclasses that override the method should instead override its replacement.
(Luca Cavanna)

* GITHUB#13757: For similarities, provide default computeNorm implementation and remove remaining discountOverlaps setters.
(Christine Poerschke, Adrien Grand, Robert Muir)
Contributor

I am afraid it is now too late to modify the changelog for 9.12, as it's already been released?

Contributor Author

Yes, https://lucene.apache.org/core/9_12_0/changes/Changes.html cannot be modified, but my understanding is that the missing entry added here would then be included in https://lucene.apache.org/core/9_12_1/changes/Changes.html in future, since the 9.12.1 file also includes the 9.12.0 and earlier content. Whether or not that's worthwhile is subjective, I guess.


New Features
---------------------

@@ -44,13 +44,26 @@ public abstract class Axiomatic extends SimilarityBase {
protected final int queryLen;

/**
- * Constructor setting all Axiomatic hyperparameters
+ * Constructor setting all Axiomatic hyperparameters and using default discountOverlaps value.
*
* @param s hyperparam for the growth function
* @param queryLen the query length
* @param k hyperparam for the primitive weighting function
*/
public Axiomatic(float s, int queryLen, float k) {
this(true, s, queryLen, k);
}

/**
* Constructor setting all Axiomatic hyperparameters
*
* @param discountOverlaps true if overlap tokens should not impact document length for scoring.
* @param s hyperparam for the growth function
* @param queryLen the query length
* @param k hyperparam for the primitive weighting function
*/
public Axiomatic(boolean discountOverlaps, float s, int queryLen, float k) {
super(discountOverlaps);
if (Float.isFinite(s) == false || Float.isNaN(s) || s < 0 || s > 1) {
throw new IllegalArgumentException("illegal s value: " + s + ", must be between 0 and 1");
}
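
The two Axiomatic constructors above follow a delegation pattern: the shorter one forwards to the longer one with `true`, so the validation of `s` lives in a single place. A minimal sketch of that pattern is given here, assuming hypothetical stand-in classes (`BaseSketch`, `AxiomaticSketch`) rather than the real Lucene types; the sketch class is concrete only so that it can be instantiated, and the `getDiscountOverlaps` accessor is an assumption made for the demo.

```java
// Sketch only: BaseSketch and AxiomaticSketch are hypothetical stand-ins,
// not the Lucene SimilarityBase and Axiomatic classes.
class AxiomaticSketchDemo {
  public static void main(String[] args) {
    AxiomaticSketch defaults = new AxiomaticSketch(0.25f, 1, 0.35f);
    AxiomaticSketch explicit = new AxiomaticSketch(false, 0.25f, 1, 0.35f);
    System.out.println("default discountOverlaps: " + defaults.getDiscountOverlaps());
    System.out.println("explicit discountOverlaps: " + explicit.getDiscountOverlaps());
    try {
      new AxiomaticSketch(1.5f, 1, 0.35f); // s lies outside [0, 1]
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}

abstract class BaseSketch {
  private final boolean discountOverlaps;

  protected BaseSketch(boolean discountOverlaps) {
    this.discountOverlaps = discountOverlaps;
  }

  // Accessor assumed for the demo so the stored flag can be inspected.
  public boolean getDiscountOverlaps() {
    return discountOverlaps;
  }
}

class AxiomaticSketch extends BaseSketch {
  final float s;
  final int queryLen;
  final float k;

  // Shorter variant: delegates with the default discountOverlaps value.
  AxiomaticSketch(float s, int queryLen, float k) {
    this(true, s, queryLen, k);
  }

  // Full variant: the only place where the base class is initialised and s is checked.
  AxiomaticSketch(boolean discountOverlaps, float s, int queryLen, float k) {
    super(discountOverlaps);
    if (Float.isFinite(s) == false || s < 0 || s > 1) {
      throw new IllegalArgumentException("illegal s value: " + s + ", must be between 0 and 1");
    }
    this.s = s;
    this.queryLen = queryLen;
    this.k = k;
  }
}
```

Because the three-argument constructor only forwards, an invalid `s` is rejected whichever variant the caller picks.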
@@ -46,11 +46,23 @@ public class DFISimilarity extends SimilarityBase {
private final Independence independence;

/**
- * Create DFI with the specified divergence from independence measure
+ * Create DFI with the specified divergence from independence measure and using default
+ * discountOverlaps value
*
* @param independenceMeasure measure of divergence from independence
*/
public DFISimilarity(Independence independenceMeasure) {
this(independenceMeasure, true);
}

/**
* Create DFI with the specified parameters
*
* @param independenceMeasure measure of divergence from independence
* @param discountOverlaps true if overlap tokens should not impact document length for scoring.
*/
public DFISimilarity(Independence independenceMeasure, boolean discountOverlaps) {
super(discountOverlaps);
this.independence = independenceMeasure;
}
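
The `discountOverlaps` Javadoc above says that overlap tokens should not impact document length for scoring. A rough illustration of what that means is sketched here, assuming that an overlap token is one with a position increment of 0 (for example a synonym stacked on the previous token). The `effectiveLength` helper is hypothetical and is not part of the PR.

```java
// Sketch only: effectiveLength is a hypothetical helper, not a Lucene method.
class OverlapLengthDemo {

  // positionIncrements holds one entry per token; an increment of 0 marks a
  // token that sits at the same position as the previous one (an overlap).
  static int effectiveLength(int[] positionIncrements, boolean discountOverlaps) {
    int numTokens = positionIncrements.length;
    int numOverlap = 0;
    for (int increment : positionIncrements) {
      if (increment == 0) {
        numOverlap++;
      }
    }
    // With discounting, overlap tokens do not count towards the length.
    return discountOverlaps ? numTokens - numOverlap : numTokens;
  }

  public static void main(String[] args) {
    // Three tokens, the second stacked on the first (e.g. an injected synonym).
    int[] increments = {1, 0, 1};
    System.out.println("discounted:   " + effectiveLength(increments, true));
    System.out.println("undiscounted: " + effectiveLength(increments, false));
  }
}
```

With the default `true`, injecting synonyms at index time does not make a document look longer to the similarity.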

@@ -83,7 +83,7 @@ public class DFRSimilarity extends SimilarityBase {
protected final Normalization normalization;

/**
- * Creates DFRSimilarity from the three components.
+ * Creates DFRSimilarity from the three components and using default discountOverlaps value.
*
* <p>Note that <code>null</code> values are not allowed: if you want no normalization, instead
* pass {@link NoNormalization}.
@@ -98,7 +98,7 @@ public DFRSimilarity(
}

/**
- * Creates DFRSimilarity from the three components.
+ * Creates DFRSimilarity from the three components and with the specified discountOverlaps value.
*
* <p>Note that <code>null</code> values are not allowed: if you want no normalization, instead
* pass {@link NoNormalization}.
@@ -76,7 +76,7 @@ public class IBSimilarity extends SimilarityBase {
protected final Normalization normalization;

/**
- * Creates IBSimilarity from the three components.
+ * Creates IBSimilarity from the three components and using default discountOverlaps value.
*
* <p>Note that <code>null</code> values are not allowed: if you want no normalization, instead
* pass {@link NoNormalization}.
@@ -86,6 +86,26 @@ public class IBSimilarity extends SimilarityBase {
* @param normalization term frequency normalization
*/
public IBSimilarity(Distribution distribution, Lambda lambda, Normalization normalization) {
this(distribution, lambda, normalization, true);
}

/**
* Creates IBSimilarity from the three components and with the specified discountOverlaps value.
*
* <p>Note that <code>null</code> values are not allowed: if you want no normalization, instead
* pass {@link NoNormalization}.
*
* @param distribution probabilistic distribution modeling term occurrence
* @param lambda distribution's &lambda;<sub>w</sub> parameter
* @param normalization term frequency normalization
* @param discountOverlaps true if overlap tokens should not impact document length for scoring.
*/
public IBSimilarity(
Distribution distribution,
Lambda lambda,
Normalization normalization,
boolean discountOverlaps) {
super(discountOverlaps);
this.distribution = distribution;
this.lambda = lambda;
this.normalization = normalization;
@@ -37,6 +37,13 @@ public class IndriDirichletSimilarity extends LMSimilarity {
/** The &mu; parameter. */
private final float mu;

/** Instantiates the similarity with the provided parameters. */
public IndriDirichletSimilarity(
CollectionModel collectionModel, boolean discountOverlaps, float mu) {
super(collectionModel, discountOverlaps);
this.mu = mu;
}

/** Instantiates the similarity with the provided &mu; parameter. */
public IndriDirichletSimilarity(CollectionModel collectionModel, float mu) {
super(collectionModel);
@@ -39,7 +39,13 @@ public class LMDirichletSimilarity extends LMSimilarity {

/** Instantiates the similarity with the provided &mu; parameter. */
public LMDirichletSimilarity(CollectionModel collectionModel, float mu) {
- super(collectionModel);
+ this(collectionModel, true, mu);
}

/** Instantiates the similarity with the provided parameters. */
public LMDirichletSimilarity(
CollectionModel collectionModel, boolean discountOverlaps, float mu) {
super(collectionModel, discountOverlaps);
if (Float.isFinite(mu) == false || mu < 0) {
throw new IllegalArgumentException(
"illegal mu value: " + mu + ", must be a non-negative finite value");
@@ -43,7 +43,13 @@ public class LMJelinekMercerSimilarity extends LMSimilarity {

/** Instantiates with the specified collectionModel and &lambda; parameter. */
public LMJelinekMercerSimilarity(CollectionModel collectionModel, float lambda) {
- super(collectionModel);
+ this(collectionModel, true, lambda);
}

/** Instantiates with the specified collectionModel and parameters. */
public LMJelinekMercerSimilarity(
CollectionModel collectionModel, boolean discountOverlaps, float lambda) {
super(collectionModel, discountOverlaps);
if (Float.isNaN(lambda) || lambda <= 0 || lambda > 1) {
throw new IllegalArgumentException("lambda must be in the range (0 .. 1]");
}
@@ -43,6 +43,12 @@ public abstract class LMSimilarity extends SimilarityBase {

/** Creates a new instance with the specified collection language model. */
public LMSimilarity(CollectionModel collectionModel) {
this(collectionModel, true);
}

/** Creates a new instance with the specified collection language model and discountOverlaps. */
public LMSimilarity(CollectionModel collectionModel, boolean discountOverlaps) {
super(discountOverlaps);
this.collectionModel = collectionModel;
}

Expand Down
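
Taken together, the LMSimilarity and LMDirichletSimilarity hunks pass `discountOverlaps` up through two levels of `super(...)` calls. A minimal sketch of that chain is given here, with hypothetical stand-in types (`SimBaseSketch`, `LmSketch`, `DirichletSketch`, `CollectionModelSketch`) in place of the Lucene classes. The `mu` check mirrors the one shown in the diff; the accessor and the model interface's `getName` method are assumptions made for the demo.

```java
// Sketch only: every type here is a hypothetical stand-in, not a Lucene class.
class LmChainDemo {
  public static void main(String[] args) {
    CollectionModelSketch model = () -> "sketch-model";
    DirichletSketch defaults = new DirichletSketch(model, 2000f);
    DirichletSketch explicit = new DirichletSketch(model, false, 2000f);
    System.out.println("default:  " + defaults.getDiscountOverlaps());
    System.out.println("explicit: " + explicit.getDiscountOverlaps());
  }
}

// Assumed placeholder for a collection language model.
interface CollectionModelSketch {
  String getName();
}

abstract class SimBaseSketch {
  private final boolean discountOverlaps;

  protected SimBaseSketch(boolean discountOverlaps) {
    this.discountOverlaps = discountOverlaps;
  }

  // Accessor assumed for the demo so the stored flag can be inspected.
  public boolean getDiscountOverlaps() {
    return discountOverlaps;
  }
}

abstract class LmSketch extends SimBaseSketch {
  protected final CollectionModelSketch collectionModel;

  // Default variant delegates with discountOverlaps = true.
  protected LmSketch(CollectionModelSketch collectionModel) {
    this(collectionModel, true);
  }

  protected LmSketch(CollectionModelSketch collectionModel, boolean discountOverlaps) {
    super(discountOverlaps);
    this.collectionModel = collectionModel;
  }
}

class DirichletSketch extends LmSketch {
  private final float mu;

  DirichletSketch(CollectionModelSketch collectionModel, float mu) {
    this(collectionModel, true, mu);
  }

  DirichletSketch(CollectionModelSketch collectionModel, boolean discountOverlaps, float mu) {
    super(collectionModel, discountOverlaps);
    if (Float.isFinite(mu) == false || mu < 0) {
      throw new IllegalArgumentException(
          "illegal mu value: " + mu + ", must be a non-negative finite value");
    }
    this.mu = mu;
  }

  float getMu() {
    return mu;
  }
}
```

Each level offers both variants, so a caller who does not care about overlaps keeps the old call shape while the flag still reaches the base class.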