Summary
The parser benchmark has become flaky after merging #9466. A possible reason for the flakiness could be some non-determinism in jemalloc that makes it execute different branches while collecting the tokens into a `Vec`.

The idea of this PR is to approximate the size of the tokens `Vec` based on the source code length. The approximation used here is based on an analysis of the CPython code base and our ecosystem projects (including airflow, black, django, transformers, twine, warehouse, zulip). I added a console output that prints the source code length and the resulting tokens vec size for each file and piped it into a CSV. I then used a small Python script to plot the tokens/source-length distribution and picked 0.15 as a lower bound for the size of the tokens vec.

[Plots: tokens vs. source length for the Ecosystem and CPython data sets]
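A minimal sketch of the approach described above: reserving the token buffer's capacity up front from the 0.15 tokens-per-source-byte lower bound. The names (`Token`, `allocate_tokens_vec`, `approximate_tokens_len`) are illustrative stand-ins, not the actual ruff API.

```rust
/// Stand-in for the parser's real token type.
#[derive(Debug)]
struct Token;

/// Approximate the number of tokens from the source length. The 0.15
/// ratio is the lower bound observed across CPython and the ecosystem
/// projects analyzed in this PR.
fn approximate_tokens_len(source: &str) -> usize {
    (source.len() as f64 * 0.15).ceil() as usize
}

/// Pre-allocate the tokens vec so that collecting tokens avoids most
/// reallocations (and the allocator non-determinism they can introduce).
fn allocate_tokens_vec(source: &str) -> Vec<Token> {
    Vec::with_capacity(approximate_tokens_len(source))
}

fn main() {
    let source = "x = 1\nprint(x)\n";
    let tokens = allocate_tokens_vec(source);
    println!(
        "reserved capacity for {} source bytes: {}",
        source.len(),
        tokens.capacity()
    );
}
```

Since this is only a lower bound, the vec may still grow for token-dense files; the goal is to skip the early doubling steps, not to predict the exact count.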
Test Plan
I expected some real-world perf improvements from this work, but neither our `hyperfine` benchmarks nor the parser's microbenchmarks show any real improvement 🤷