Shrink the size of wast text tokens (#1103)
This commit is another improvement towards addressing #1095; the goal here is to shrink the size of `Token` and reduce the allocated memory that it retains. Currently the entire input string is tokenized and stored as a list of tokens for `Parser` to process, which means that the size of a token has a large effect on the size of this vector for large inputs. Even before this commit tokens had been slightly optimized for size, with some variants heap-allocated behind a `Box`. In profiling with DHAT, however, it appears that a large portion of peak memory was these boxes, namely for integer/float tokens, which appear quite a lot in many inputs. The changes in this commit are:

* Shrink the size of `Token` to two words. This is done by removing all pointers from `Token` and instead storing only a `TokenKind`, which is packed into 32 bits or less. Span information is still stored in a `Token`, however.

* With no more payloads, tokens which previously carried one, such as integers, strings, and floats, are now re-parsed. They're sort of parsed once while lexing, then again when the token is interpreted later on. Some of this is fundamental, since parsing currently happens in a type-specific context that isn't known during lexing (e.g. if something is parsed as a `u8` then it shouldn't accept `256` as input).

The hypothesis behind this is that tokens are far more often keywords, whitespace, and comments than integers, strings, and floats. If those rarer tokens require some extra work, that should hopefully "come out in the wash", and this representation otherwise allows for other speedups.

Locally the example in #1095 has its peak memory usage reduced from 5G to 4G by this commit, and additionally the parsing time drops from 8.9s to 7.6s.