-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Reciprocal Rank Fusion (RRF) normalization technique in hybrid query #874
Conversation
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Isaac Johnson <[email protected]>
Signed-off-by: Isaac Johnson <[email protected]>
we should be merging to feature branch https://github.com/opensearch-project/neural-search/tree/feature/rrf-score-normalization, not main. |
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
// Not currently using weights for RRF, no need to modify or verify these params | ||
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) { | ||
; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class not completed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we planned to have very simple implementation for this one, I'll be finishing this PR and address all misses if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I finished the class, please take a look @yuye-aws
Signed-off-by: Martin Gaievski <[email protected]>
@martin-gaievski Can you fix the DCO failure? |
public static final String TECHNIQUE_NAME = "rrf"; | ||
|
||
// Not currently using weights for RRF, no need to modify or verify these params | ||
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK that weights are not supported in the first release. This class does nothing but adding all the scores together. I'm afraid it's too over designed to introduce a new class for such a single sum operation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're reusing NormalizationProcessorWorkflow that is quite a big class, and it accepts both normalization and combination techniques classes as input arguments. Plus it's a single responsibility principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. For this PR, both params and combinationUtil are unused. You'd better delete them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will do in next PR, please remind me if I forget
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/normalization/ScoreNormalizationUtil.java
Show resolved
Hide resolved
@vibrantvarun and I still have some high level questions. If possible, can we three have a video meeting? We can address and review tests later. |
it's coming from one of the commits in feature branch, I cannot fix it in this PR |
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
TRIAGING.md
Outdated
The maintainers of the k-NN/neural-search Repo's seek to promote an inclusive and engaged community of contributors. In | ||
order to facilitate this, bi-weekly triage meetings are open-to-all and attendance is encouraged for anyone who hopes to | ||
contribute, discuss an issue, or learn more about the project. To learn more about contributing to the | ||
The maintainers of the k-NN/neural-search Repo's seek to promote an inclusive and engaged community of contributors. In |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
file file should remain unchanged in scope of this PR. Please revert it back to original content
@@ -15,7 +15,7 @@ jobs: | |||
matrix: | |||
java: [ 21 ] | |||
os: [ubuntu-latest,windows-latest] | |||
bwc_version : ["2.9.0","2.10.0","2.11.0","2.12.0","2.13.0","2.14.0","2.15.0","2.16.0-SNAPSHOT"] | |||
bwc_version : ["2.9.0","2.10.0","2.11.0","2.12.0","2.13.0","2.14.0","2.15.0","2.16.0","2.17.0-SNAPSHOT"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a temporary change while 2.17 release is in progress? I don't think we need it for this PR. Same for the other change in this file
|
||
// Not currently using weights for RRF, no need to modify or verify these params | ||
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) { | ||
; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is not needed
// Not currently using weights for RRF, no need to modify or verify these params | ||
public RRFScoreCombinationTechnique(final Map<String, Object> params, final ScoreCombinationUtil combinationUtil) { | ||
; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we planned to have very simple implementation for this one, I'll be finishing this PR and address all misses if
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/normalization/ScoreNormalizationUtil.java
Show resolved
Hide resolved
*/ | ||
@Log4j2 | ||
@AllArgsConstructor | ||
public class RRFProcessor implements SearchPhaseResultsProcessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both of you asking different questions: @vibrantvarun is referring to a single processor for RRF and score normalization, and @yuye-aws mentioning Alternative 2, which is about adding a new processor for RRF, but exposing both normalization and combination technique as params to end-user.
I can answer both in a similar fashion:
Fundamentally score normalization and rank based combination are different, so combining them in existing normalization processor isn't intuitive. Besides that it will require additional validation logic and at the code level will ruin existing abstractions, mainly because for normalization processor today we allow pairing of any normalization technique with any combination techniques. With addition of RRF we have to break this.
RRF is leaning towards the combination method as per offline discussion with our PM, exposing normalization function doesn't make sense/not adding value.
src/main/java/org/opensearch/neuralsearch/processor/normalization/ScoreNormalizationUtil.java
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
* Collection of utility methods for score combination technique classes | ||
*/ | ||
@Log4j2 | ||
class ScoreNormalizationUtil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we shift the code in this class to HybridQueryUtil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it belongs there, my view is - anything related to the query itself should go to that class, like parsing score collection into multiple sub query results.
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
Signed-off-by: Martin Gaievski <[email protected]>
|
||
/** | ||
* DTO object to hold data required for score normalization passed to execute() function | ||
* in NormalizationProcessorWorkflow. Field rankConstant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* in NormalizationProcessorWorkflow. Field rankConstant | |
* in NormalizationProcessorWorkflow. |
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/combination/ScoreCombinationUtil.java
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RRFProcessorFactory.java
Outdated
Show resolved
Hide resolved
src/main/java/org/opensearch/neuralsearch/processor/factory/RRFProcessorFactory.java
Outdated
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Show resolved
Hide resolved
Signed-off-by: Martin Gaievski <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one minor comment.
LGTM.
src/main/java/org/opensearch/neuralsearch/processor/NormalizationExecuteDTO.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Varun Jain <[email protected]> Signed-off-by: Martin Gaievski <[email protected]>
...main/java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechnique.java
Outdated
Show resolved
Hide resolved
*/ | ||
@Log4j2 | ||
@AllArgsConstructor | ||
public class RRFProcessor implements SearchPhaseResultsProcessor { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there somewhere to validate that RRFNormalizationTechnique
is used together with RRFScoreCombinationTechnique
? The execute
method in NormalizationProcessorWorkflow
class doing normalization and them combination.
...ava/org/opensearch/neuralsearch/processor/combination/RRFScoreCombinationTechniqueTests.java
Show resolved
Hide resolved
private float RRF(List<Float> scores, List<Double> weights) { | ||
float sumScores = 0.0f; | ||
for (float score : scores) { | ||
sumScores += score; | ||
} | ||
return sumScores; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you adding these method in this testing? I think you can simply with a few examples like 1 plus 1 is 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's added to be compatible with https://github.com/opensearch-project/neural-search/blob/main/src/test/java/org/opensearch/neuralsearch/processor/combination/BaseScoreCombinationTechniqueTests.java and be able to use all test cases it provides. We need to ensure in better possible test coverage if it's a low hanging fruit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not get your point. The private randomScore
outputs non-deterministic results.
assertEquals( | ||
RescoreContext.getDefault().getOversampleFactor(), | ||
neuralQueryBuilder.rescoreContext().getOversampleFactor(), | ||
DELTA_FOR_FLOATS_ASSERTION | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you are already using big decimal, please remove the delta here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what do you mean, assert requires third parameter in case we're comparing floats, and both arguments are float
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean you are using big decimal in the test rrfNorm
method. You can be more strict, and the delta can be set to 0.
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Outdated
Show resolved
Hide resolved
...java/org/opensearch/neuralsearch/processor/normalization/RRFNormalizationTechniqueTests.java
Show resolved
Hide resolved
We do not retrieve normalization technique from user input, it's hardcoded and passed to processor class by the factory, check out code snippet I want to keep NormalizationProcessorWorkflow generic, maybe later refactor it to more abstract class not specific to normalization. |
Signed-off-by: Martin Gaievski <[email protected]>
I've addressed all comments, and most of them were minor in recent reviews. I'll be merging this one to feature branch and we'll start one more related to RRF soon, with focus on testing |
245cd14
into
opensearch-project:feature/rrf-score-normalization-v2
Nice work @martin-gaievski |
Description
Adding ability to process and combine scores from multiple subqueries in neural search using the reciprocal rank fusion (RRF) technique. Built with a new processor and processor factory class apart from NormalizationProcessor. Changes to API included in RFC. Does not currently support weights when combining processed subquery scores, based on lack of examples in existing literature.
Example of usage for RRF processor:
create index
create pipeline with rrf processor and all defaults
ingest 4 documents
run search request
you'll get following response
if you change rank to something smaller, like '1' your scores all will be scalled up
update rank contant
and search response is
for comparison this is the response for same query if we use normalization processor with default techniques.
Important difference is that delta between document scores with RRF is much smaller, this is because it's based on document rank that are typically close in value comparing to scores where delta can be huge.
Related Issues
Resolves #[Issue number to be closed when this PR is merged]
#865
#659
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.