Shrink the size of wast text tokens (#1103)

This commit is another improvement towards addressing #1095. The goal
here is to shrink the size of `Token` and reduce the allocated memory
that it retains. Currently the entire input string is tokenized and
stored as a list of tokens for `Parser` to process, so the size of a
token has a large effect on the size of this vector for large inputs.

Even before this commit tokens had been slightly optimized for size
where some variants were heap-allocated with a `Box`. In profiling with
DHAT, however, it appears that a large portion of peak memory was these
boxes, namely for integer/float tokens which appear quite a lot in many
inputs.

The changes in this commit were to:

* Shrink the size of `Token` to two words. This is done by removing all
  pointers from `Token` and instead only storing a `TokenKind` which is
  packed into 32 bits or fewer. Span information is still stored in a
  `Token`, however.

* With no payload left in `Token`, tokens which previously carried one,
  such as integers, strings, and floats, are now re-parsed. They're
  effectively parsed once while lexing, then again when the token is
  interpreted later on. Some of this is fundamental: parsing currently
  happens in a type-specific context, but that context isn't known
  during lexing (e.g. if something is parsed as `u8` then `256`
  shouldn't be accepted as input).
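
The two changes above can be sketched roughly as follows. This is a
minimal illustration, not the actual `wast` code: the `TokenKind`
variants, the field names, and the `lex`, `src`, and `integer` helpers
are all hypothetical, and the toy lexer ignores signs, hex literals,
strings, and comments. It only shows a pointer-free token that fits in
two words on a 64-bit target and defers integer parsing until the
target type is known.

```rust
use std::str::FromStr;

// Hypothetical token kinds; the real `TokenKind` has many more variants.
#[derive(Clone, Copy, Debug, PartialEq)]
enum TokenKind {
    Keyword,
    Integer,
    Whitespace,
}

// No pointers and no payload: only a kind plus span information.
// On a 64-bit target this occupies 16 bytes (two words).
#[derive(Clone, Copy, Debug)]
struct Token {
    kind: TokenKind,
    len: u32,
    offset: usize,
}

impl Token {
    // Recover the token's text from the original input via its span.
    fn src<'a>(&self, input: &'a str) -> &'a str {
        &input[self.offset..self.offset + self.len as usize]
    }

    // Re-parse an integer token once the caller knows the target type.
    // Returns `None` for non-integer tokens and out-of-range values.
    fn integer<T: FromStr>(&self, input: &str) -> Option<T> {
        if self.kind != TokenKind::Integer {
            return None;
        }
        self.src(input).parse().ok()
    }
}

// Toy lexer: splits on ASCII whitespace and classifies runs of ASCII
// digits as integers. Everything else is treated as a keyword.
fn lex(input: &str) -> Vec<Token> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let start = pos;
        let kind = if bytes[pos].is_ascii_whitespace() {
            while pos < bytes.len() && bytes[pos].is_ascii_whitespace() {
                pos += 1;
            }
            TokenKind::Whitespace
        } else {
            while pos < bytes.len() && !bytes[pos].is_ascii_whitespace() {
                pos += 1;
            }
            if bytes[start..pos].iter().all(|b| b.is_ascii_digit()) {
                TokenKind::Integer
            } else {
                TokenKind::Keyword
            }
        };
        tokens.push(Token { kind, len: (pos - start) as u32, offset: start });
    }
    tokens
}
```

With this sketch, lexing `i32.const 256` yields a keyword, a whitespace
token, and an integer token. Calling `integer::<u8>` on the last token
rejects it, while `integer::<u32>` accepts it, which is the
type-specific behavior that cannot be decided during lexing.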

The hypothesis behind this is that tokens are far more often keywords,
whitespace, and comments than integers, strings, and floats. This means
that if integer, string, and float tokens require some extra work, that
cost should hopefully "come out in the wash", and this representation
would additionally allow for other speedups.

Locally, this commit reduces the peak memory usage of the example in
#1095 from 5G to 4G, and the parsing time drops from 8.9s to 7.6s.
alexcrichton authored Jul 7, 2023
1 parent 9b00ec1 commit 14b4818
Showing 5 changed files with 481 additions and 341 deletions.