Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PERF]: better locking of uncommitted tracking maps (decrease compaction time by 3x) #2736

Merged
merged 5 commits into from
Aug 29, 2024

Conversation

codetheweb
Copy link
Contributor

@codetheweb codetheweb commented Aug 29, 2024

Description of changes

Instead of acquiring 2 locks for every token, we now acquire the locks at the document level. This improves compaction time by up to 3x.

I ran an experiment where I:

  1. Added 15,000 records
  2. Compacted the records
  3. Added 9,600 records
  4. Compacted the records

The partition size was 10,000.

Results (also includes changes from #2729 up to 14a744b):

Before:

  • First compaction: 110s
  • Second compaction: 54s

After:

  • First compaction: 33s
  • Second compaction: 40s

Here's a trace file for the two runs: locking-changes.trace.zip.

(Run 1 is after these changes, run 2 is before these changes.)

Test plan

How are these changes tested?

covered by existing tests

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

n/a

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@@ -92,60 +93,71 @@ impl<'me> FullTextIndexWriter<'me> {

async fn populate_frequencies_and_posting_lists_from_previous_version(
&self,
token: &str,
tokens: &Vec<Token>,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is slightly weird, but the tantivy token type is already leaking out of our abstraction and this avoids having to remap the tokens to strs
not strongly opinionated here, ok going the remapping route

@codetheweb codetheweb marked this pull request as ready for review August 29, 2024 00:07
tracing::error!(
"Error populating frequencies and posting lists from previous version"
);
return Err(FullTextIndexError::InvariantViolation);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add an extra Vec to track seen tokens in the above loop so we can retain this error if it's important

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine, not worth the extra overhead

Copy link
Contributor

@sanketkedia sanketkedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking care of the comments!

@codetheweb codetheweb merged commit b491c2f into main Aug 29, 2024
67 checks passed
@codetheweb codetheweb deleted the feat-better-uncommitted-map-locking branch August 29, 2024 19:45
spikechroma pushed a commit that referenced this pull request Sep 12, 2024
spikechroma pushed a commit that referenced this pull request Sep 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants