Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handwritten parser #6180

Merged
merged 235 commits into from
Oct 7, 2024
Merged

feat: handwritten parser #6180

merged 235 commits into from
Oct 7, 2024

Conversation

asterite
Copy link
Collaborator

@asterite asterite commented Sep 30, 2024

Description

Resolves #853

Problem

There are some issues with the current parser:

  • it leads to "stack overflow" with small programs that should probably compile (like a program with a chanin of 17 if-else-if)
  • it leads to some of us not being able to run the noric_frontend tests because of linker errors, and sometimes making changes to the parser will lead to linker errors that we have to workaround
  • it's (very) slow

Summary

This PR implements a hand-written parser. It parses any program with just one look-ahead token. It has very good error-recovery.

I tested the parser's performance by copying the contents of noir-contracts/contracts/avm_test_contract/src/main.nr in Aztec-Packages 100 times to a single file. That ends up with a file of about 57K lines. The times:

  • chumsky parser: 1.52 seconds
  • handwritten parser: 52.97 milliseconds

Some other benefits:

  • The linker errors are gone!
  • Compiling noirc_frontend is slightly faster
  • Macro code also becomes faster (quote { ... }.as_expr(), etc, invoke the parser). For example test_programs/noir_test_success/comptime_expr/src/main.nr takes around one second with chumsky and 140ms with the handwritten parser (to do nargo compile)
  • Even though the parser is handwritten, I think the parsing code is relatively simple. It's just "check if we get this token, then do this" or sometimes "check if we get this token followed by this one (or not followed by this one). Also also the impls and traits that we needed for chumsky (and the lifetimes, and passing parsers around, and cloning them, and calling boxed(), etc.) are gone, which I believe make the code much simpler. That said, chumsky has great helpers to be able to parse things separated by, say, a comma, and this PR at least has that too (parse_many).
  • Compiling an empty program is faster (goes from 650ms to 140ms)
  • Compiing any program is much faster
  • Tests run faster (it would become feasible to run tests locally before pushing to CI to avoid CI cycles):
    • Running noirc_frontend tests:
      • before: 1:03 minute
      • after: 6 seconds
    • Running lsp tests:
      • before: 55 seconds
      • after: 6 seconds
    • Running nargo_cli tests:
      • before: 2:47 minutes
      • after: 38 seconds
  • CI runs faster (for example each of the four partitions take 1 minute instead of 4 minutes
  • Building the compiler is faster:
    • before: 1:29 minutes
    • after: 1:19 minutes (so building noirc_frontend is 10 seconds faster because that's the only thing changed in this PR)
  • Better parsing recovery and more fine-grained control over the errors we report

I tested this parser by running ./boostrap.sh on the Aztec-Packages contracts and they compile file (of course they didn't compile right away, I had to fix some bugs to get there).

Additional Context

Documentation

Check one:

  • No documentation needed.
  • Documentation included in this PR.
  • [For Experimental Features] Documentation to be submitted in a separate PR.

PR Checklist

  • I have tested the changes locally.
  • I have formatted the changes with Prettier and/or cargo fmt on default settings.

@asterite
Copy link
Collaborator Author

asterite commented Oct 7, 2024

Approved! 🤩

Thank you!

I'll merge this... 🤞

Worse case scenario we can revert it, or fix things if we find them 😄

@asterite asterite added this pull request to the merge queue Oct 7, 2024
Merged via the queue into master with commit c4273a0 Oct 7, 2024
49 of 50 checks passed
@asterite asterite deleted the ab/parser branch October 7, 2024 18:17
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Oct 7, 2024
feat(test): Fuzz test stdlib hash functions (noir-lang/noir#6233)
fix: handle nested arrays in calldata (noir-lang/noir#6232)
feat: add more `Type` and `UnresolvedType` methods (noir-lang/noir#5994)
fix: Panic on composite types within databus (noir-lang/noir#6225)
feat(perf): Follow array sets backwards in array set from get optimization (noir-lang/noir#6208)
fix: check for Schnorr null signature (noir-lang/noir#6226)
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Oct 7, 2024
feat(test): Fuzz test stdlib hash functions (noir-lang/noir#6233)
fix: handle nested arrays in calldata (noir-lang/noir#6232)
feat: add more `Type` and `UnresolvedType` methods (noir-lang/noir#5994)
fix: Panic on composite types within databus (noir-lang/noir#6225)
feat(perf): Follow array sets backwards in array set from get optimization (noir-lang/noir#6208)
fix: check for Schnorr null signature (noir-lang/noir#6226)
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Oct 8, 2024
feat(test): Fuzz test stdlib hash functions (noir-lang/noir#6233)
fix: handle nested arrays in calldata (noir-lang/noir#6232)
feat: add more `Type` and `UnresolvedType` methods (noir-lang/noir#5994)
fix: Panic on composite types within databus (noir-lang/noir#6225)
feat(perf): Follow array sets backwards in array set from get optimization (noir-lang/noir#6208)
fix: check for Schnorr null signature (noir-lang/noir#6226)
AztecBot added a commit to AztecProtocol/aztec-packages that referenced this pull request Oct 8, 2024
feat(test): Fuzz test stdlib hash functions (noir-lang/noir#6233)
fix: handle nested arrays in calldata (noir-lang/noir#6232)
feat: add more `Type` and `UnresolvedType` methods (noir-lang/noir#5994)
fix: Panic on composite types within databus (noir-lang/noir#6225)
feat(perf): Follow array sets backwards in array set from get optimization (noir-lang/noir#6208)
fix: check for Schnorr null signature (noir-lang/noir#6226)
sirasistant added a commit to AztecProtocol/aztec-packages that referenced this pull request Oct 8, 2024
Automated pull of development from the
[noir](https://github.com/noir-lang/noir) programming language, a
dependency of Aztec.
BEGIN_COMMIT_OVERRIDE
feat: handwritten parser (noir-lang/noir#6180)
feat(test): Fuzz test stdlib hash functions
(noir-lang/noir#6233)
fix: handle nested arrays in calldata
(noir-lang/noir#6232)
feat: add more `Type` and `UnresolvedType` methods
(noir-lang/noir#5994)
fix: Panic on composite types within databus
(noir-lang/noir#6225)
feat(perf): Follow array sets backwards in array set from get
optimization (noir-lang/noir#6208)
fix: check for Schnorr null signature
(noir-lang/noir#6226)
END_COMMIT_OVERRIDE

---------

Co-authored-by: sirasistant <[email protected]>
@Aristotelis2002
Copy link

Hello, we have noticed a significant speedup in running the complete test suite from the root of the project (with cargo test -q) after this change. This is great!

We were just wondering: could it be related to this PR? It's so fast that we are actually suspicious whether the tests are running or not 😅 ... but they seem to be.

@asterite
Copy link
Collaborator Author

asterite commented Oct 8, 2024

Hi! Indeed, it's related to this PR. I think cargo test on the root runs some of the tests, for example the ones on the nargo_cli project, and the times I got for this were:

  • before this PR: 2:47 minutes
  • after this PR: 38 seconds

github-merge-queue bot pushed a commit that referenced this pull request Oct 8, 2024
# Description

## Problem

We currently parse `unconstrained pub fn ...` but ideally we'd want it
to be `pub unconstrained fn ...`

See this comment too:
#6180 (comment)

## Summary

This PR implements the above, but actually all of these:
- We now allow `unconstrained pub fn ...` and `pub unconstrained fn`
(the former for backwards compatibility)
- The formatter will change `unconstrained pub fn` to `pub unconstrained
fn` (so after some time we can remove the backwards-compatibility case)
- The formatter will now format a function "header" (I didn't know what
to call it): that's the fn outer comments, attributes, and modifiers.
For example before this PR if you wrote `pub fn foo() {}` it was left
untouched, but now it's changed to `pub fn foo() {}`.

## Additional Context

None.

## Documentation

Check one:
- [x] No documentation needed.
- [ ] Documentation included in this PR.
- [ ] **[For Experimental Features]** Documentation to be submitted in a
separate PR.

# PR Checklist

- [x] I have tested the changes locally.
- [x] I have formatted the changes with [Prettier](https://prettier.io/)
and/or `cargo fmt` on default settings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-external-checks Trigger CI job to run tests on external repos
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expression inside lt triggers Stack Overflow Error Misdiagnosed parse error Remove chumsky fork
6 participants