Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries #13845

Merged
merged 5 commits into from
Oct 4, 2024

Conversation

cpoerschke
Copy link
Contributor

#13757 follow-up

@cpoerschke
Copy link
Contributor Author

Thanks @psalagnac for identifying this in the https://issues.apache.org/jira/browse/SOLR-17471 ticket!

* @param queryLen the query length
* @param k hyperparam for the primitive weighting function
*/
public Axiomatic(boolean discountOverlaps, float s, int queryLen, float k) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically additional variant constructors could also be added for all the classes that derive from Axiomatic but there are relatively many derivations with multiple constructors. The class is marked @lucene.experimental and so perhaps it's okay to add additional variants if/when a need for them is identified?

* @param independenceMeasure measure of divergence from independence
* @param discountOverlaps true if overlap tokens should not impact document length for scoring.
*/
public DFISimilarity(Independence independenceMeasure, boolean discountOverlaps) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -98,7 +98,7 @@ public DFRSimilarity(
}

/**
* Creates DFRSimilarity from the three components.
* Creates DFRSimilarity from the four components.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't a DFR component. it is just an expert boolean.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we use "component" in these docs, it has special meaning to the user. Let's make sure to not anything about that in these constructors (DFR is not the only one like this)

DFR models are obtained by instantiating the three components of the framework: selecting a basic randomness model, applying the first normalisation, and normalising the term frequencies

https://en.wikipedia.org/wiki/Divergence-from-randomness_model

Distribution distribution,
Lambda lambda,
Normalization normalization,
boolean discountOverlaps) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as DFR, this isn't a component

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done: reworded to ... three components and with the specified discountOverlaps value.

}

/** Instantiates the similarity with the provided parameters. */
public LMDirichletSimilarity(boolean discountOverlaps, float mu) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we only need one ctor here with this parameter, just add it to the "expert" ctor only.

@@ -62,8 +74,15 @@ public LMDirichletSimilarity(CollectionModel collectionModel) {
}

/** Instantiates the similarity with the default μ value of 2000. */
public LMDirichletSimilarity(boolean discountOverlaps) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add to expert CTOR only. no telescoping for this parameter, it is very expert.

}

/** Instantiates with the specified parameters. */
public LMJelinekMercerSimilarity(boolean discountOverlaps, float lambda) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use other ctor, avoid telescoping this boolean

@cpoerschke
Copy link
Contributor Author

w.r.t. the CI failure -- https://github.com/apache/lucene/actions/runs/11130252581/job/30929198241?pr=13845 -- that locally seems to also happen on origin/main branch itself.

@cpoerschke
Copy link
Contributor Author

w.r.t. the CI failure -- https://github.com/apache/lucene/actions/runs/11130252581/job/30929198241?pr=13845 -- that locally seems to also happen on origin/main branch itself.

merged in latest origin/main and the CI passes now then.

@cpoerschke cpoerschke changed the title PR 13757 follow-up: add missing with-discountOverlaps constructor variants PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants Oct 2, 2024
@cpoerschke cpoerschke changed the title PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants PR 13757 follow-up: add missing with-discountOverlaps Similarity constructor variants, CHANGES.txt entries Oct 2, 2024
@cpoerschke cpoerschke added this to the 10.0.0 milestone Oct 2, 2024
@ChrisHegarty
Copy link
Contributor

The changes in this PR are to lucene.experimental classes. As such, these changes are suitable for a minor release. I would suggest to retarget to 10x, since 10.0.0 is well past feature freeze so only really accepting critical bug fixes at this point.

@cpoerschke cpoerschke removed this from the 10.0.0 milestone Oct 2, 2024
}

/** Creates a new instance with the default collection language model. */
public LMSimilarity(boolean discountOverlaps) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just needs to be added to above "expert" ctor only.

@rmuir
Copy link
Member

rmuir commented Oct 3, 2024

I added comments, this boolean parameter is esoteric and just needs to be available on "expert ctor": not added to every possible ctor for these classes. We just need to make it possible.

@cpoerschke
Copy link
Contributor Author

I added comments, this boolean parameter is esoteric and just needs to be available on "expert ctor": not added to every possible ctor for these classes. We just need to make it possible.

Thanks for the review! Made updates accordingly.

@cpoerschke cpoerschke requested a review from rmuir October 3, 2024 09:13
@cpoerschke cpoerschke merged commit dab7311 into apache:main Oct 4, 2024
3 checks passed
@cpoerschke cpoerschke deleted the pr-13757-follow-up branch October 4, 2024 16:08
asfgit pushed a commit that referenced this pull request Oct 4, 2024
…tructor variants, CHANGES.txt entries (#13845)

(cherry picked from commit dab7311)
cpoerschke added a commit to cpoerschke/lucene that referenced this pull request Oct 4, 2024
…tructor variants, CHANGES.txt entries (apache#13845)

(cherry picked from commit dab7311)
(cherry picked from commit cbd8b52)

Resolved Conflicts:
	lucene/CHANGES.txt
javanna pushed a commit that referenced this pull request Oct 9, 2024
…tructor variants, CHANGES.txt entries (#13845) (#13858)

(cherry picked from commit dab7311)
(cherry picked from commit cbd8b52)

Resolved Conflicts:
	lucene/CHANGES.txt
javanna added a commit that referenced this pull request Oct 9, 2024
javanna added a commit that referenced this pull request Oct 9, 2024
cpoerschke added a commit to cpoerschke/lucene that referenced this pull request Oct 11, 2024
…tructor variants, CHANGES.txt entries (apache#13845)

(cherry picked from commit dab7311)

Resolved Conflicts:
	lucene/CHANGES.txt
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants