-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Documentation: Correct Segment Merging Description and Parameters in Qdrant #1131
base: master
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for condescending-goldwasser-91acf0 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this!
While this is not entirely true, you're definitely right that max_segment_number
is not a thing anymore.
Please see my comments below:
# This parameter defines the maximum size of a segment. Increasing this value can help in reducing the number of segments. | ||
max_segment_size: <desired_size> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want show what we have in our configuration file here, along with the default_segment_number
:
# This parameter defines the maximum size of a segment. Increasing this value can help in reducing the number of segments. | |
max_segment_size: <desired_size> | |
# Target amount of segments optimizer will try to keep. | |
# Real amount of segments may vary depending on multiple parameters: | |
# - Amount of stored points | |
# - Current write RPS | |
# | |
# It is recommended to select default number of segments as a factor of the number of search threads, | |
# so that each segment would be handled evenly by one of the threads. | |
# If `default_segment_number = 0`, will be automatically selected by the number of available CPUs | |
default_segment_number: 0 | |
# Do not create segments larger this size (in KiloBytes). | |
# Large segments might require disproportionately long indexation times, | |
# therefore it makes sense to limit the size of segments. | |
# | |
# If indexation speed have more priority for your - make this parameter lower. | |
# If search speed is more important - make this parameter higher. | |
# Note: 1Kb = 1 vector of size 256 | |
# If not set, will be automatically selected considering the number of available CPUs. | |
max_segment_size_kb: null |
@@ -50,7 +50,7 @@ Such segments, for example, are created as copy-on-write segments during optimiz | |||
It is also essential to have at least one small segment that Qdrant will use to store frequently updated data. | |||
On the other hand, too many small segments lead to suboptimal search performance. | |||
|
|||
There is the Merge Optimizer, which combines the smallest segments into one large segment. It is used if too many segments are created. | |||
Qdrant uses a parameter called max_segment_size to control the size of segments. Increasing this value allows the creation of larger segments, reducing the number of segments and potentially improving search performance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not entirely true.
The merge optimizer primarily looks at default_segment_number
but also takes max_segment_size
into account.
In other words, we:
- merging segments when we have more segments than the specified default
- but only if it doesn't exceed the maximum specified side
The logic can be found here: https://github.com/qdrant/qdrant/blob/42e08ea6506d3d1fdd2ddd9be4ae23faba623be8/lib/collection/src/collection_manager/optimizers/merge_optimizer.rs#L88-L143
Could you update this section accordingly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will update it as soon as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much! 😄
I did notice we still had some mentions in the Qdrant repository. I've fixed this yesterday: qdrant/qdrant#5058
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed the Merge Optimizer doc at this link hasn't been updated yet. Could you let me know when the change will be applied?
The Qdrant documentation incorrectly describes a parameter named max_segment_number under the "Merge Optimizer" section, suggesting that this parameter controls the merging of small segments when their number exceeds a certain threshold. However, this parameter does not exist in the Qdrant codebase. Instead, the correct approach to managing segment size and merging is through the max_segment_size parameter, which determines the maximum size of a segment.
This discrepancy may have caused confusion among users trying to optimize their Qdrant instances, leading to ineffective configurations and potentially suboptimal performance.