More small parse performance improvements #229

Merged

Conversation

lue-bird
Contributor

@lue-bird lue-bird commented Jul 19, 2024

Running the benchmarks with eol2 --optimize-speed in Chromium:

  • before: 81955.5 ms to do 100 runs => 819.555 ms/run
  • after: 42913.9 ms to do 100 runs => 429.139 ms/run

(the improvements are pretty consistent now)

Done by

  • moving let declarations to module level when doing so requires no other changes
  • changing succeed () |> map (\() -> x) to lazy (\() -> succeed x) (see the sketch after this list)
  • changing succeed a |. x to map (\() -> a) x
  • small optimizations for e.g. how string literal parts are folded
  • ignore reserved words for type names
  • changing from a Dict of operations to a case expression (and actually pre-computing the available operations for all precedences)
  • shortcutting lower/uppercase character checks
  • faster layout primitives
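For illustration, the two succeed rewrites above look roughly like this in isolation (42 and the "!" symbol are just placeholders; Core aliases elm/parser's Parser, as in the other snippets in this thread):

import Parser as Core exposing ((|.))

-- succeed () |> map (\() -> x)  becomes  lazy (\() -> succeed x)

viaSucceedMap : Core.Parser Int
viaSucceedMap =
    Core.succeed ()
        |> Core.map (\() -> 42)

viaLazy : Core.Parser Int
viaLazy =
    Core.lazy (\() -> Core.succeed 42)

-- succeed a |. x  becomes  map (\() -> a) x  (when x is a Parser ())

viaIgnore : Core.Parser Int
viaIgnore =
    Core.succeed 42
        |. Core.symbol "!"

viaMapOverX : Core.Parser Int
viaMapOverX =
    Core.map (\() -> 42) (Core.symbol "!")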

Not all changes are necessarily performance-related. Some change ignored unit arguments to \() ->. One fixes import alias parsing.

@lue-bird lue-bird marked this pull request as draft July 19, 2024 17:16
)
, Core.lazy
    (\() ->
        Core.succeed
Collaborator

@jfmengels jfmengels Jul 19, 2024

Hmm, I wonder if this is faster. As far as I know Parser.lazy is only there to avoid cyclic references between parsers, not to improve performance. Maybe it does as well but I'm not sure 🤔

Have you run benchmarks on this change alone, or only on the whole PR?

Contributor Author

map needs to check underneath whether the parse succeeded or not, so this kind of makes sense to me

Collaborator

Oh right, that's a good point. But to do that it also re-defines the parser every time it gets evaluated, whereas with map the parser is already defined, and (from prior optimizations) redefining was quite bad for performance.

I really don't know which is faster though. I'm okay with taking this change and then benchmarking it on its own to find out which is faster, because that would be a cool thing to learn.

@lue-bird lue-bird marked this pull request as ready for review July 20, 2024 03:17
src/Elm/Parser/Base.elm (outdated review thread, resolved)

operation100 : Node Expression -> Parser State (Node Expression)
operation100 leftExpression =
    operation 100 leftExpression
Contributor

How much faster is this? Because it seems a little excessive :)

Contributor Author

@lue-bird lue-bird Jul 20, 2024

Indeed, benchmarking right now. There is likely more low-hanging fruit (e.g. the List.map inside the oneOf there could be moved into the filterMap one level higher, or laziness could be moved to only the parts that need it instead of the whole parser).

Contributor Author

@lue-bird lue-bird Jul 20, 2024

Benchmarks show predefining these doesn't make a difference, you're right.
A bit strange though, because that means using a Dict or case is unnecessary...
Maybe there is a way to go through the filterMap early? There should be. Benchmarking...

Yeah, doing the filterMap early is a bit faster, consistently about 5% (I wonder why it's only a bit).

Collaborator

I think the way it's done right now, we're not actually precomputing anything, but instead recomputing (filtering the precedences) every time. I'm slightly in doubt as to whether precomputation worked before this PR either, though at some point I believed I had made it work.

If we got rid of the expression argument in operator (probably not possible?), or if we did the precedence filtering and then returned a lambda that took the expression argument, then it would work for sure.
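Roughly what I mean, as a made-up, self-contained sketch (the operation table and all names here are stand-ins, not the actual operator parsers):

import Parser as Core

-- made-up stand-in for the operation table
operations : List ( Int, String )
operations =
    [ ( 1, "|>" ), ( 5, "++" ), ( 7, "*" ) ]

-- the filtering runs again on every call, because the expression argument comes first
operationRecomputed : Int -> String -> Core.Parser String
operationRecomputed precedence leftExpression =
    operations
        |> List.filterMap
            (\( p, op ) ->
                if p > precedence then
                    Just (Core.succeed (leftExpression ++ " " ++ op))

                else
                    Nothing
            )
        |> Core.oneOf

-- the filtering runs once; the returned lambda closes over the filtered list,
-- so a module-level `operationPrecomputed 0` really is precomputed
operationPrecomputed : Int -> (String -> Core.Parser String)
operationPrecomputed precedence =
    let
        filtered : List String
        filtered =
            operations
                |> List.filterMap
                    (\( p, op ) ->
                        if p > precedence then
                            Just op

                        else
                            Nothing
                    )
    in
    \leftExpression ->
        Core.oneOf
            (List.map (\op -> Core.succeed (leftExpression ++ " " ++ op)) filtered)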

Contributor Author

@lue-bird lue-bird Jul 21, 2024

Are you looking at the latest state, where we have module-level filtered lists?
And I don't think it worked in the Dict version, because the filterMap is inside the constructed lambda for each entry. But I haven't tested it, so who knows.

Collaborator

Sorry, what do you mean by "module-level filtered lists"?
And yeah, I don't think it worked with Dict either, although at the time I did think so. I'll look into it again at some point.

@lue-bird
Contributor Author

lue-bird commented Jul 20, 2024

We could do some more random things. For example for

Core.oneOf
    [ Tokens.multiLineStringLiteral
    , Tokens.stringLiteral
    ]

we could share the first " at the cost of readability but giving us a tiny performance improvement.
(edit: the performance improvement is negligible, so I'm not doing this at the cost of readability. Others, feel free to apply this anyway)
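Roughly, with simplified stand-ins for the two literal parsers (not the real Tokens implementations), sharing the first quote could look like this:

import Parser exposing ((|.), Parser)

stringLiteralShared : Parser String
stringLiteralShared =
    Parser.symbol "\""
        |> Parser.andThen
            (\() ->
                Parser.oneOf
                    [ -- the shared " is already consumed, so only "" remains of the """
                      Parser.symbol "\"\""
                        |> Parser.andThen (\() -> multiLineRest)
                    , singleLineRest
                    ]
            )

-- simplified: the real multi-line literal parser also handles escapes
multiLineRest : Parser String
multiLineRest =
    Parser.getChompedString (Parser.chompUntil "\"\"\"")
        |. Parser.token "\"\"\""

-- simplified: the real single-line literal parser also handles escapes
singleLineRest : Parser String
singleLineRest =
    Parser.getChompedString (Parser.chompUntil "\"")
        |. Parser.token "\""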

Same in

Combine.oneOf
    [ functionWithSignature pointer
    , functionWithoutSignature pointer
    ]

where technically both share the declared name at the start, so if the first one fails, the name shouldn't need to be parsed again (or both should fail directly if the name is invalid).
Would changes like that be acceptable?

Another thing: keyword (which looks ahead for whitespace) is used for tokens that always have layout after them (e.g. as in an import). Should we switch to token / symbol there? Actually so far I've only seen keyword being used with correct layout after.
Edit: benchmarked; the extra lookahead doesn't seem to matter at all (about 0.5%), so this would only be a stylistic change. If we switch, symbol actually performs ever so slightly better than token, so we should use that.

Another thing: I have a commit ready for replacing record type alias constructor calls with fields. Should I add this here or are there other plans (like: "keep them because records might get replaced by variants in v8")

Another thing: Core.variable is faster than symbol |. chompWhile
[benchmark screenshot]
The pain is immeasurable and my day is ruined (edit: day not ruined quite yet, seemed to be a fluke as they are roughly equal now, edit edit: used the wrong sample data, day is ruined again)


Since most of my suggestions come from small benchmarks plus guesswork on tiny things, do you use any tools to look at performance at the bigger picture? The in-browser profiler only shows thousands of oneOfs and anonymous functions all the way down, which is pretty much useless.

Another btw: eol2 doesn't seem to optimize List.filterMap but maybe I'm doing something wrong.

@jfmengels
Collaborator

List.filterMap is optimized in a non-merged PR (mdgriffith/elm-optimize-level-2#75). But it does get optimized in elm-review (which is not using eol2, just doing a few similar patches).

Random optimization idea as well (I'll reply to your other questions tomorrow morning): we do a LOT of string concatenations when parsing string literals. But most of the time we can get the contents through a simple getChompedString. The only time we can't do that is when there's escaping (which is not the common case), and even then we could probably use getChompedString for the rest of the string, leading to far fewer concatenations.
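As a rough sketch of that shape (a generic string-body parser, not our actual implementation): unescaped runs come out of one getChompedString slice, so concatenation happens once per run or escape instead of once per character.

import Parser exposing ((|.), (|=), Parser)

stringBody : Parser String
stringBody =
    Parser.loop "" stringBodyHelp

stringBodyHelp : String -> Parser (Parser.Step String String)
stringBodyHelp collected =
    Parser.oneOf
        [ -- escape sequence: handle one character specially
          Parser.succeed (\escaped -> Parser.Loop (collected ++ escaped))
            |. Parser.token "\\"
            |= escapedChar
        , -- closing quote: done
          Parser.token "\""
            |> Parser.map (\() -> Parser.Done collected)
        , -- everything up to the next quote or backslash, taken as one slice
          Parser.getChompedString (Parser.chompWhile (\c -> c /= '"' && c /= '\\'))
            |> Parser.andThen
                (\chunk ->
                    if chunk == "" then
                        Parser.problem "unterminated string literal"

                    else
                        Parser.succeed (Parser.Loop (collected ++ chunk))
                )
        ]

escapedChar : Parser String
escapedChar =
    Parser.oneOf
        [ Parser.map (\() -> "\n") (Parser.token "n")
        , Parser.map (\() -> "\"") (Parser.token "\"")
        , Parser.map (\() -> "\\") (Parser.token "\\")
        ]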

@lue-bird
Contributor Author

lue-bird commented Jul 20, 2024

I don't think this will improve anything because literals that don't contain escaped characters are already parsed fast by chompWhile. So yeah in practice we actually don't do "a LOT of string concatenations when parsing string literals".

Keep the random ideas flowing.

@jfmengels
Collaborator

Yeah, we already do that, I only remembered the part where we add one character. Too bad 😅
(And I had to braindump that before going to bed, didn't have time to re-check)

we could share the first " at the cost of readability but giving us a tiny performance improvement.

I think that's a good idea. I'm honestly confused as to how this even works, because my mental model says that the first " character should cause multiLineStringLiteral to commit and therefore cause the other alternative to not be evaluated.

For the other location (functionWithSignature, ...), I'm less sure. I think it would be good to have, but the way we often do things is by passing the argument (through andThen or map) to a new function, which redefines the parser. At this point, I believe that having parsers defined as constants leads to faster code, and therefore redefining the parser inside of a function leads to slower code. The pointer argument is something that I'd like to remove already, and I wonder whether sharing the parser for the type name is going to make that even harder.

Re:keyword

I don't yet understand the differences between symbol/token/keyword. I'm fine with changing to the fastest one. And maybe having some guidance written down somewhere to explain which one to use in general (or in which case).

I have a commit ready for replacing record type alias constructor calls with fields. Should I add this here or are there other plans (like: "keep them because records might get replaced by variants in v8")

In terms of breaking changes, I think it's fine. Do you know if it's faster? It removes an intermediary function that does the same thing, but maybe that function would get optimized by the JIT anyway. Those are the questions I pondered without knowing the answer yet.

do you use any tools to look at performance at the bigger picture?

I don't have any useful tool for this unfortunately, just individual smaller benchmarks like the ones you made (but which I haven't used recently; do you have one somewhere that I could use as a starting point?) and @jiegillet's 100-run benchmark.

Core.variable is faster than symbol |. chompWhile

I'm confused about the final results for this: which is ultimately the fastest? And is this a pattern we use often? I only see it in many1Spaces (which is run very often, yes). And 🫂 for your ruined day.

@lue-bird
Contributor Author

lue-bird commented Jul 21, 2024

Core.variable is faster than symbol |. chompWhile: I'm confused as to the final results for this, what is finally the fastest?

Core.variable is consistently ~15% faster than symbol |. chompWhile. I imagine it could even be faster than plain chompWhile (via oneOf [ variable, succeed "" ]), but that I haven't verified. (edit: not conclusive, but the oneOf version appears to be slightly slower)
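For reference, the two shapes being compared, written for a one-or-more-spaces parser (simplified, not the exact many1Spaces; note that variable also returns the chomped string, which symbol |. chompWhile doesn't):

import Parser as Core exposing ((|.))
import Set

-- the `symbol |. chompWhile` shape
oneOrMoreSpaces : Core.Parser ()
oneOrMoreSpaces =
    Core.symbol " "
        |. Core.chompWhile (\c -> c == ' ')

-- the `variable` shape
oneOrMoreSpacesVariable : Core.Parser String
oneOrMoreSpacesVariable =
    Core.variable
        { start = \c -> c == ' '
        , inner = \c -> c == ' '
        , reserved = Set.empty
        }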

do you have one somewhere that I could use as a starting point? and jiegillet's 100 run benchmark.

I pushed my benchmark code to this branch: https://github.com/lue-bird/elm-syntax/tree/specific-and-full-benchmarks/benchmark

I don't yet understand the differences between symbol/token/keyword

The docs explain it well: symbol = token, while keyword only parses successfully if the character after it is whitespace. That's really useful to e.g. not think that

importStatement = ...

should be committed as an import. (keyword would be equivalent to something like (symbol |> backtrackable) |. (singleWhitespace |. commit ()) but is faster)
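A small self-contained illustration of that behaviour (isImport is just a made-up helper for this example):

import Parser exposing ((|.), Parser)

isImport : String -> Bool
isImport source =
    case Parser.run (Parser.keyword "import" |. Parser.spaces) source of
        Ok () ->
            True

        Err _ ->
            False

-- isImport "import Foo"          == True
-- isImport "importStatement = 1" == False  (token "import" would match here and commit)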

I'm honestly confused as to how this even works, because my mental model says that the first " character should cause multimultiLineStringLiteral to commit and therefore cause to not be evaluated.

That's the cool thing about symbol "\"\"\"". It only commits if the whole thing parses successfully.


Another good improvement comes from checking for Latin first letters first:
[benchmark screenshot]
I'm a tiny bit sad that the Unicode checks seem to be somewhat slow even though they contain shortcuts themselves.
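Roughly what that shortcut looks like (Char.isAlpha below just stands in for the slower Unicode-aware fallback; it's not the actual function used in the parser):

-- cheap ASCII range checks first; || short-circuits, so the fallback
-- only runs for characters outside a-z / A-Z
isLetterLatinFirst : Char -> Bool
isLetterLatinFirst c =
    (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || Char.isAlpha c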

Another random idea:
We can ""simplify"" all succeed-ignore/keep pipelines like so

Core.succeed identity
    |. Core.symbol lp
    |= p state
    |. Core.symbol rp
-->
Core.map (\() -> identity)
    (Core.symbol lp)
    |= p state
    |. Core.symbol rp

(same with Combine.succeed / Combine.ignore / Combine.keep)
This is uglier. At what level of desperation should we consider this, or is it too awful?

A fun experience: I had a shock moment after changing a oneOf order where something fishy was happening (consistently)
[benchmark screenshot]
Changing one token to keyword and rebuilding gave the usual times, so all is good after all. AFAIK it was some variable starting with case or let making elm-syntax fail.

A failed experiment: I created a module-level alias for Combine.maybeIgnore Layout.layout and replaced all instances (some in andThen). Result: inconclusive, strangely.

@jfmengels jfmengels force-pushed the more-small-parse-performance-improvements branch from 5b6a835 to b70e7a0 on July 22, 2024 20:11
@jfmengels
Collaborator

This has been amazing @lue-bird, I can't believe you've pretty much made the parser twice as fast 🤯

I don't see any reason not to merge this as it is. Feel free to continue the performance improvements in new PRs, if you still have the performance itch 😄

(Note: I've cleaned up the branch and force-pushed it to merge it nicely.)

@jfmengels jfmengels merged commit b70e7a0 into stil4m:master Jul 22, 2024
2 of 3 checks passed