More small parse performance improvements #229

Merged

Conversation

lue-bird
Contributor

@lue-bird lue-bird commented Jul 19, 2024

Running the benchmarks with eol2 --optimize-speed in Chromium:

  • before: 81955.5 ms to do 100 runs => 819.555 ms/run
  • after: 42913.9 ms to do 100 runs => 429.139 ms/run

(the improvements are pretty consistent now)

Done by

  • moving let declarations to module level when doing so requires no other changes
  • changing succeed () |> map (\() -> x) to lazy (\() -> succeed x) (see the sketch after this list)
  • changing succeed a |. x to map (\() -> a) x
  • small optimizations for e.g. how string literal parts are folded
  • ignore reserved words for type names
  • changing from a Dict of operations to a case expression (and actually pre-computing the available operations for all precedences)
  • shortcutting lower/uppercase character checks
  • faster layout primitives
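For illustration, the two succeed rewrites above look roughly like this in isolation (42 and the "!" symbol are just placeholders; Core aliases elm/parser's Parser, as in the other snippets in this thread):

import Parser as Core exposing ((|.))

-- succeed () |> map (\() -> x)  becomes  lazy (\() -> succeed x)

viaSucceedMap : Core.Parser Int
viaSucceedMap =
    Core.succeed ()
        |> Core.map (\() -> 42)

viaLazy : Core.Parser Int
viaLazy =
    Core.lazy (\() -> Core.succeed 42)

-- succeed a |. x  becomes  map (\() -> a) x  (when x is a Parser ())

viaIgnore : Core.Parser Int
viaIgnore =
    Core.succeed 42
        |. Core.symbol "!"

viaMapOverX : Core.Parser Int
viaMapOverX =
    Core.map (\() -> 42) (Core.symbol "!")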

Not all changes are necessarily performance-related. Some change ignored unit arguments to \() ->. One fixes import alias parsing.

@lue-bird lue-bird marked this pull request as draft July 19, 2024 17:16
)
, Core.lazy
    (\() ->
        Core.succeed
Collaborator

@jfmengels jfmengels Jul 19, 2024

Hmm, I wonder if this is faster. As far as I know Parser.lazy is only there to avoid cyclic references between parsers, not to improve performance. Maybe it does as well but I'm not sure 🤔

Have you run benchmarks on this change alone, or only on the whole PR?

Contributor Author

map needs to check underneath whether the parse succeeded or not, so this kind of makes sense to me

Collaborator

Oh right, that's a good point. But to do that it also re-defines the parser every time it gets evaluated, whereas with map the parser is already defined, and (from prior optimizations) redefining was quite bad for performance.

I really don't know which is faster though. I'm okay with taking this change and then benchmarking it on its own to find out which is faster, because that would be a cool thing to learn.

@lue-bird lue-bird marked this pull request as ready for review July 20, 2024 03:17
src/Elm/Parser/Base.elm (outdated review thread, resolved)

operation100 : Node Expression -> Parser State (Node Expression)
operation100 leftExpression =
    operation 100 leftExpression
Contributor

How much faster is this? Because it seems a little excessive :)

Contributor Author

@lue-bird lue-bird Jul 20, 2024

Indeed, benchmarking right now. There is likely more low-hanging fruit (e.g. the List.map inside the oneOf there could be moved into the filterMap one level higher, or laziness could be moved to only the parts that need it instead of the whole parser).

Contributor Author

@lue-bird lue-bird Jul 20, 2024

Benchmarks show predefining these doesn't make a difference, you're right.
A bit strange though, because that means using a Dict or case is unnecessary...
Maybe there is a way to go through the filterMap early? There should be. Benchmarking...

Yeah, doing the filterMap early is a bit faster, consistently about 5% (I wonder why it's only a bit).

Collaborator

I think the way it's done right now, we're not actually precomputing anything, but instead recomputing (filtering the precedences) every time. I'm slightly in doubt as to whether precomputation worked before this PR either, though at some point I believed I had made it work.

If we got rid of the expression argument in operator (probably not possible?), or if we did the precedence filtering and then returned a lambda that took the expression argument, then it would work for sure.
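Roughly what I mean, as a made-up, self-contained sketch (the operation table and all names here are stand-ins, not the actual operator parsers):

import Parser as Core

-- made-up stand-in for the operation table
operations : List ( Int, String )
operations =
    [ ( 1, "|>" ), ( 5, "++" ), ( 7, "*" ) ]

-- the filtering runs again on every call, because the expression argument comes first
operationRecomputed : Int -> String -> Core.Parser String
operationRecomputed precedence leftExpression =
    operations
        |> List.filterMap
            (\( p, op ) ->
                if p > precedence then
                    Just (Core.succeed (leftExpression ++ " " ++ op))

                else
                    Nothing
            )
        |> Core.oneOf

-- the filtering runs once; the returned lambda closes over the filtered list,
-- so a module-level `operationPrecomputed 0` really is precomputed
operationPrecomputed : Int -> (String -> Core.Parser String)
operationPrecomputed precedence =
    let
        filtered : List String
        filtered =
            operations
                |> List.filterMap
                    (\( p, op ) ->
                        if p > precedence then
                            Just op

                        else
                            Nothing
                    )
    in
    \leftExpression ->
        Core.oneOf
            (List.map (\op -> Core.succeed (leftExpression ++ " " ++ op)) filtered)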

Contributor Author

@lue-bird lue-bird Jul 21, 2024

Are you looking at the latest state, where we have module-level filtered lists?
And I don't think it worked in the Dict version, because the filterMap is inside the constructed lambda for each entry. But I haven't tested it, so who knows.

Collaborator

Sorry, what do you mean by "module-level filtered lists"?
And yeah, I don't think it worked with Dict either, although at the time I did think so. I'll look into it again at some point.

@lue-bird
Contributor Author

lue-bird commented Jul 20, 2024

We could do some more random things. For example for

Core.oneOf
    [ Tokens.multiLineStringLiteral
    , Tokens.stringLiteral
    ]

we could share the first " at the cost of readability but giving us a tiny performance improvement.
(edit: the performance improvement is negligible, so I'm not doing this at the cost of readability. Others, feel free to apply this anyway)
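Roughly, with simplified stand-ins for the two literal parsers (not the real Tokens implementations), sharing the first quote could look like this:

import Parser exposing ((|.), Parser)

stringLiteralShared : Parser String
stringLiteralShared =
    Parser.symbol "\""
        |> Parser.andThen
            (\() ->
                Parser.oneOf
                    [ -- the shared " is already consumed, so only "" remains of the """
                      Parser.symbol "\"\""
                        |> Parser.andThen (\() -> multiLineRest)
                    , singleLineRest
                    ]
            )

-- simplified: the real multi-line literal parser also handles escapes
multiLineRest : Parser String
multiLineRest =
    Parser.getChompedString (Parser.chompUntil "\"\"\"")
        |. Parser.token "\"\"\""

-- simplified: the real single-line literal parser also handles escapes
singleLineRest : Parser String
singleLineRest =
    Parser.getChompedString (Parser.chompUntil "\"")
        |. Parser.token "\""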

Same in

Combine.oneOf
    [ functionWithSignature pointer
    , functionWithoutSignature pointer
    ]

where technically both share the declared name at the start, so if the first one fails, the name shouldn't need to be parsed again (or both should fail directly if the name is invalid).
Would changes like that be acceptable?

Another thing: keyword (which looks ahead for whitespace) is used for tokens that always have layout after them (e.g. as in an import). Should we switch to token / symbol there? Actually so far I've only seen keyword being used with correct layout after.
Edit: benchmarked; the extra lookahead doesn't seem to matter at all (about 0.5%), so this would only be a stylistic change. If we switch, symbol actually performs ever so slightly better than token, so we should use that.

Another thing: I have a commit ready for replacing record type alias constructor calls with fields. Should I add this here or are there other plans (like: "keep them because records might get replaced by variants in v8")

Another thing: Core.variable is faster than symbol |. chompWhile
[benchmark screenshot]
The pain is immeasurable and my day is ruined (edit: day not ruined quite yet, seemed to be a fluke as they are roughly equal now, edit edit: used the wrong sample data, day is ruined again)


Since most of my suggestions come from small benchmarks plus guesswork on tiny things, do you use any tools to look at performance at the bigger picture? The in-browser profiler only shows thousands of oneOfs and anonymous functions all the way down, which is pretty much useless.

Another btw: eol2 doesn't seem to optimize List.filterMap but maybe I'm doing something wrong.

@jfmengels
Collaborator

List.filterMap is optimized in a non-merged PR (mdgriffith/elm-optimize-level-2#75). But it does get optimized in elm-review (which is not using eol2, just doing a few similar patches).

Random optimization idea as well (I'll reply to your other questions tomorrow morning): we do a LOT of string concatenations when parsing string literals. But most of the time we can get the contents through a simple getChompedString. The only time we can't do that is when there's escaping (which is not the common case), and even then we could probably use getChompedString for the rest of the string, leading to far fewer concatenations.
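As a rough sketch of that shape (a generic string-body parser, not our actual implementation): unescaped runs come out of one getChompedString slice, so concatenation happens once per run or escape instead of once per character.

import Parser exposing ((|.), (|=), Parser)

stringBody : Parser String
stringBody =
    Parser.loop "" stringBodyHelp

stringBodyHelp : String -> Parser (Parser.Step String String)
stringBodyHelp collected =
    Parser.oneOf
        [ -- escape sequence: handle one character specially
          Parser.succeed (\escaped -> Parser.Loop (collected ++ escaped))
            |. Parser.token "\\"
            |= escapedChar
        , -- closing quote: done
          Parser.token "\""
            |> Parser.map (\() -> Parser.Done collected)
        , -- everything up to the next quote or backslash, taken as one slice
          Parser.getChompedString (Parser.chompWhile (\c -> c /= '"' && c /= '\\'))
            |> Parser.andThen
                (\chunk ->
                    if chunk == "" then
                        Parser.problem "unterminated string literal"

                    else
                        Parser.succeed (Parser.Loop (collected ++ chunk))
                )
        ]

escapedChar : Parser String
escapedChar =
    Parser.oneOf
        [ Parser.map (\() -> "\n") (Parser.token "n")
        , Parser.map (\() -> "\"") (Parser.token "\"")
        , Parser.map (\() -> "\\") (Parser.token "\\")
        ]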

@lue-bird
Contributor Author

lue-bird commented Jul 20, 2024

I don't think this will improve anything because literals that don't contain escaped characters are already parsed fast by chompWhile. So yeah in practice we actually don't do "a LOT of string concatenations when parsing string literals".

Keep the random ideas flowing.

@jfmengels
Collaborator

Yeah, we already do that, I only remembered the part where we add one character. Too bad 😅
(And I had to braindump that before going to bed, didn't have time to re-check)

we could share the first " at the cost of readability but giving us a tiny performance improvement.

I think that's a good idea. I'm honestly confused as to how this even works, because my mental model says that the first " character should cause multiLineStringLiteral to commit and therefore cause the other alternative to not be evaluated.

For the other location (functionWithSignature, ...), I'm less sure. I think it would be good to have, but the way we often do things is by passing the argument (through andThen or map) to a new function, which redefines the parser. At this point, I believe that having parsers defined as constants leads to faster code, and therefore redefining the parser inside of a function leads to slower code. The pointer argument is something that I'd like to remove already, and I wonder whether sharing the parser for the type name is going to make that even harder.

Re:keyword

I don't yet understand the differences between symbol/token/keyword. I'm fine with changing to the fastest one. And maybe having some guidance written down somewhere to explain which one to use in general (or in which case).

I have a commit ready for replacing record type alias constructor calls with fields. Should I add this here or are there other plans (like: "keep them because records might get replaced by variants in v8")

In terms of breaking changes, I think it's fine. Do you know if it's faster? It removes an intermediary function that does the same thing, but maybe that function would get optimized by the JIT anyway. Those are the questions I pondered without knowing the answer yet.

do you use any tools to look at performance at the bigger picture?

I don't have any useful tool for this unfortunately, just individual smaller benchmarks like the ones you made (but which I haven't used recently; do you have one somewhere that I could use as a starting point?) and @jiegillet's 100-run benchmark.

Core.variable is faster than symbol |. chompWhile

I'm confused about the final results for this: which is ultimately the fastest? And is this a pattern we use often? I only see it in many1Spaces (which is run very often, yes). And 🫂 for your ruined day.

@lue-bird
Contributor Author

lue-bird commented Jul 21, 2024

Core.variable is faster than symbol |. chompWhile: I'm confused as to the final results for this, what is finally the fastest?

Core.variable is consistently ~15% faster than symbol |. chompWhile. I imagine it could even be faster than plain chompWhile (via oneOf [ variable, succeed "" ]), but that I haven't verified. (edit: not conclusive, but the oneOf version appears to be slightly slower)
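For reference, the two shapes being compared, written for a one-or-more-spaces parser (simplified, not the exact many1Spaces; note that variable also returns the chomped string, which symbol |. chompWhile doesn't):

import Parser as Core exposing ((|.))
import Set

-- the `symbol |. chompWhile` shape
oneOrMoreSpaces : Core.Parser ()
oneOrMoreSpaces =
    Core.symbol " "
        |. Core.chompWhile (\c -> c == ' ')

-- the `variable` shape
oneOrMoreSpacesVariable : Core.Parser String
oneOrMoreSpacesVariable =
    Core.variable
        { start = \c -> c == ' '
        , inner = \c -> c == ' '
        , reserved = Set.empty
        }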

do you have one somewhere that I could use as a starting point? and jiegillet's 100 run benchmark.

I pushed my benchmark code to this branch: https://github.com/lue-bird/elm-syntax/tree/specific-and-full-benchmarks/benchmark

I don't yet understand the differences between symbol/token/keyword

The docs explain it well: symbol = token, while keyword only parses successfully if the character after it is whitespace. That's really useful to e.g. not think that

importStatement = ...

should be committed as an import. (keyword would be equivalent to something like (symbol |> backtrackable) |. (singleWhitespace |. commit ()) but is faster)
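A small self-contained illustration of that behaviour (isImport is just a made-up helper for this example):

import Parser exposing ((|.), Parser)

isImport : String -> Bool
isImport source =
    case Parser.run (Parser.keyword "import" |. Parser.spaces) source of
        Ok () ->
            True

        Err _ ->
            False

-- isImport "import Foo"          == True
-- isImport "importStatement = 1" == False  (token "import" would match here and commit)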

I'm honestly confused as to how this even works, because my mental model says that the first " character should cause multimultiLineStringLiteral to commit and therefore cause to not be evaluated.

That's the cool thing about symbol "\"\"\"". It only commits if the whole thing parses successfully.


Another good improvement comes from checking for Latin first letters first:
[benchmark screenshot]
I'm a tiny bit sad that the Unicode checks seem to be somewhat slow even though they contain shortcuts themselves.
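Roughly what that shortcut looks like (Char.isAlpha below just stands in for the slower Unicode-aware fallback; it's not the actual function used in the parser):

-- cheap ASCII range checks first; || short-circuits, so the fallback
-- only runs for characters outside a-z / A-Z
isLetterLatinFirst : Char -> Bool
isLetterLatinFirst c =
    (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z')
        || Char.isAlpha c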

Another random idea:
We can ""simplify"" all succeed-ignore/keep pipelines like so

Core.succeed identity
    |. Core.symbol lp
    |= p state
    |. Core.symbol rp
-->
Core.map (\() -> identity)
    (Core.symbol lp)
    |= p state
    |. Core.symbol rp

(same with Combine.succeed / Combine.ignore / Combine.keep)
This is uglier. At what level of desperation should we consider this, or is it too awful?

A fun experience: I had a shock moment after changing a oneOf order where something fishy was happening (consistently)
[benchmark screenshot]
Changing one token to keyword and rebuilding gave the usual times, so all is good after all. AFAIK it was some variable starting with case or let making elm-syntax fail.

A failed experiment: I created a module-level alias for Combine.maybeIgnore Layout.layout and replaced all instances (some in andThen). Result: inconclusive, strangely.

@jfmengels jfmengels force-pushed the more-small-parse-performance-improvements branch from 5b6a835 to b70e7a0 on July 22, 2024 20:11
@jfmengels
Collaborator

This has been amazing @lue-bird, I can't believe you've pretty much made the parser twice as fast 🤯

I don't see any reason not to merge this as it is. Feel free to continue the performance improvements in new PRs, if you still have the performance itch 😄

(Note: I've cleaned up the branch and force-pushed it to merge it nicely.)

@jfmengels jfmengels merged commit b70e7a0 into stil4m:master Jul 22, 2024
2 of 3 checks passed