Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fixed size python string cache #59

Merged
merged 19 commits into from
Mar 25, 2024
Merged

fixed size python string cache #59

merged 19 commits into from
Mar 25, 2024

Conversation

samuelcolvin
Copy link
Member

@samuelcolvin samuelcolvin commented Jan 20, 2024

Use a fully associative cache for strings, rather than a "simple" hashmap, when the number of strings exceeds the hashmap size (currently 500k), this gives a 48% performance improvements according to the 1m_strings python benchmark.

It's currently suffering from this was fixed by @davidhewitt.

thread 'test_python_cache_usage_all' has overflowed its stack
fatal runtime error: stack overflow

in python tests, this can be reproduced locally, and goes away if you remove the GILOnceCell, but that makes things slower (or less secure if you used constant seeds for the hasher).

Copy link

codecov bot commented Jan 20, 2024

Codecov Report

Merging #59 (e284786) into main (56db4fc) will decrease coverage by 0.08%.
The diff coverage is 94.56%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #59      +/-   ##
==========================================
- Coverage   96.35%   96.27%   -0.08%     
==========================================
  Files           8        9       +1     
  Lines        1234     1289      +55     
==========================================
+ Hits         1189     1241      +52     
- Misses         45       48       +3     
Files Coverage Δ
src/python.rs 96.52% <100.00%> (+0.59%) ⬆️
src/py_string_cache.rs 93.97% <93.97%> (ø)

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56db4fc...e284786. Read the comment docs.

Copy link

codspeed-hq bot commented Jan 20, 2024

CodSpeed Performance Report

Merging #59 will not alter performance

Comparing fixed-size-cache (e284786) with main (56db4fc)

Summary

✅ 56 untouched benchmarks

@samuelcolvin samuelcolvin force-pushed the fixed-size-cache branch 2 times, most recently from 3c76b61 to 06480b2 Compare January 21, 2024 15:28
@davidhewitt
Copy link
Collaborator

Pushed a tiny commit to use RandomState instead of BuildHasherDefault, this seems to be slightly faster and I think more secure anyway.

Copy link
Collaborator

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks great except for one concern about stack usage. It's a shame I can't see a way to allocate a boxed array on stable.

impl Default for PyStringCache {
fn default() -> Self {
Self {
entries: Box::new([ARRAY_REPEAT_VALUE; CAPACITY]),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, have a worry here that this line can easily trigger stack overflows. Think we want to avoid accidentally causing import pydantic to cause a stack overflow on some platform with less stack space.

Initial solution sounds like we need to reduce capacity and see how that goes.

@samuelcolvin samuelcolvin merged commit 979eab5 into main Mar 25, 2024
10 checks passed
@samuelcolvin samuelcolvin deleted the fixed-size-cache branch March 25, 2024 23:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants