Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Documentation: Correct Segment Merging Description and Parameters in Qdrant #1131

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

iamdorsa
Copy link

The Qdrant documentation incorrectly describes a parameter named max_segment_number under the "Merge Optimizer" section, suggesting that this parameter controls the merging of small segments when their number exceeds a certain threshold. However, this parameter does not exist in the Qdrant codebase. Instead, the correct approach to managing segment size and merging is through the max_segment_size parameter, which determines the maximum size of a segment.

This discrepancy may have caused confusion among users trying to optimize their Qdrant instances, leading to ineffective configurations and potentially suboptimal performance.

Copy link

netlify bot commented Aug 27, 2024

Deploy Preview for condescending-goldwasser-91acf0 ready!

Name Link
🔨 Latest commit 1ab170f
🔍 Latest deploy log https://app.netlify.com/sites/condescending-goldwasser-91acf0/deploys/66ce245890e7130008810373
😎 Deploy Preview https://deploy-preview-1131--condescending-goldwasser-91acf0.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@iamdorsa iamdorsa changed the title Master Fix Documentation: Correct Segment Merging Description and Parameters in Qdrant Aug 27, 2024
Copy link
Member

@timvisee timvisee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this!

While this is not entirely true, you're definitely right that max_segment_number is not a thing anymore.

Please see my comments below:

Comment on lines +62 to +63
# This parameter defines the maximum size of a segment. Increasing this value can help in reducing the number of segments.
max_segment_size: <desired_size>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want show what we have in our configuration file here, along with the default_segment_number:

Suggested change
# This parameter defines the maximum size of a segment. Increasing this value can help in reducing the number of segments.
max_segment_size: <desired_size>
# Target amount of segments optimizer will try to keep.
# Real amount of segments may vary depending on multiple parameters:
# - Amount of stored points
# - Current write RPS
#
# It is recommended to select default number of segments as a factor of the number of search threads,
# so that each segment would be handled evenly by one of the threads.
# If `default_segment_number = 0`, will be automatically selected by the number of available CPUs
default_segment_number: 0
# Do not create segments larger this size (in KiloBytes).
# Large segments might require disproportionately long indexation times,
# therefore it makes sense to limit the size of segments.
#
# If indexation speed have more priority for your - make this parameter lower.
# If search speed is more important - make this parameter higher.
# Note: 1Kb = 1 vector of size 256
# If not set, will be automatically selected considering the number of available CPUs.
max_segment_size_kb: null

@@ -50,7 +50,7 @@ Such segments, for example, are created as copy-on-write segments during optimiz
It is also essential to have at least one small segment that Qdrant will use to store frequently updated data.
On the other hand, too many small segments lead to suboptimal search performance.

There is the Merge Optimizer, which combines the smallest segments into one large segment. It is used if too many segments are created.
Qdrant uses a parameter called max_segment_size to control the size of segments. Increasing this value allows the creation of larger segments, reducing the number of segments and potentially improving search performance.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not entirely true.

The merge optimizer primarily looks at default_segment_number but also takes max_segment_size into account.

In other words, we:

  • merging segments when we have more segments than the specified default
  • but only if it doesn't exceed the maximum specified side

The logic can be found here: https://github.com/qdrant/qdrant/blob/42e08ea6506d3d1fdd2ddd9be4ae23faba623be8/lib/collection/src/collection_manager/optimizers/merge_optimizer.rs#L88-L143

Could you update this section accordingly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I will update it as soon as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much! 😄

I did notice we still had some mentions in the Qdrant repository. I've fixed this yesterday: qdrant/qdrant#5058

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the Merge Optimizer doc at this link hasn't been updated yet. Could you let me know when the change will be applied?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants