-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Makes sure KNNVectorValues aren't recreated unnecessarily when #2133
Makes sure KNNVectorValues aren't recreated unnecessarily when #2133
Conversation
@shatejas can you update the changelog |
@shatejas is there a way we can validate how many times we are calling those MergedFloatValues in the unit testing? |
61d3f71
to
0a280c1
Compare
@@ -34,7 +34,8 @@ public byte[] getVector() throws IOException { | |||
@Override | |||
public byte[] conditionalCloneVector() throws IOException { | |||
byte[] vector = getVector(); | |||
if (vectorValuesIterator.getDocIdSetIterator() instanceof ByteVectorValues) { | |||
if (vectorValuesIterator instanceof KNNVectorValuesIterator.MergeByteVectorValuesIterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we could add a wrapper method to unwrap so that we dont always have to check both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its trivial currently so will prefer keeping it as it is, if it gets complicated we can either make a predicate or stub it in a method
return getVectorValues(vectorDataType, new KNNVectorValuesIterator.FieldWriterIteratorValues<>(docIdWithFieldSet, vectors)); | ||
} | ||
|
||
public static <T> KNNVectorValues<T> getVectorValues( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: javadocs for all public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will address javadocs with unit test
|
||
public final class KNNMergeVectorValues { | ||
|
||
public static List<KNNVectorValuesSub<FloatVectorValues>> mergeFloatVectorValues(FieldInfo fieldInfo, MergeState mergeState) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
javadoc?
|
||
public static List<KNNVectorValuesSub<FloatVectorValues>> mergeFloatVectorValues(FieldInfo fieldInfo, MergeState mergeState) | ||
throws IOException { | ||
assert fieldInfo != null && fieldInfo.hasVectorValues(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure on this assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece was taken from lucene with minor edits so kept it as it is, any use case that need to be handled considering this is not used for BinaryDocValues?
|
||
private static int computeLiveDocs(final DocIdSetIterator values, Bits liveDocs) throws IOException { | ||
int count = 0; | ||
if (liveDocs != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why cant we call length on live docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
length()
counts 0s so cannot use that. cardinality()
can only be obtained for FixedBitSet and that is handled before calling this
0a280c1
to
cdcb1eb
Compare
Separated out liveDocs optimization to #2135 |
quantization isn't needed Signed-off-by: Tejas Shah <[email protected]>
cdcb1eb
to
2d88cce
Compare
…zation isn't needed (#2133) Signed-off-by: Tejas Shah <[email protected]> (cherry picked from commit e33afa5)
The backport to
To backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.17 2.17
# Navigate to the new working tree
cd .worktrees/backport-2.17
# Create a new branch
git switch --create backport/backport-2133-to-2.17
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 e33afa5de5f8658ad7fbe71125707436e81cc5b8
# Push it to GitHub
git push --set-upstream origin backport/backport-2133-to-2.17
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.17 Then, create a pull request where the |
…zation isn't needed (opensearch-project#2133) Signed-off-by: Tejas Shah <[email protected]> (cherry picked from commit e33afa5)
…zation isn't needed (opensearch-project#2133) Signed-off-by: Tejas Shah <[email protected]> (cherry picked from commit e33afa5)
…zation isn't needed (opensearch-project#2133) Signed-off-by: Tejas Shah <[email protected]> (cherry picked from commit e33afa5)
…zation isn't needed (#2133) (#2140) Signed-off-by: Tejas Shah <[email protected]> (cherry picked from commit e33afa5)
…n quantization isn't needed (opensearch-project#2133) (opensearch-project#2140)" This reverts commit ca6b03f.
…n quantization isn't needed (opensearch-project#2133) (opensearch-project#2140)" This reverts commit ca6b03f. Signed-off-by: Naveen Tatikonda <[email protected]>
…n quantization isn't needed (#2133) (#2140)" (#2161) This reverts commit ca6b03f. Signed-off-by: Naveen Tatikonda <[email protected]>
…en quantization isn't needed (opensearch-project#2133) (opensearch-project#2140)" (opensearch-project#2161) This reverts commit b0d82b7. Signed-off-by: Navneet Verma <[email protected]>
quantization isn't needed
Description
An attempt to address the regression seen in force-merge time for cohere-10m dataset.
KNNVectorValues creation is optimized to skip for non-quantization cases
The perf for force merge showed improvements in 1m dataset
Related Issues
#2134
Check List
--signoff
.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.