ISLE: lexer simplifications #9108
Instead of creating a temporary `Vec<u8>`, use a slice of the original underlying `buf`, and only allocate a temporary `String` if it contains an `_`. Copyright (c) 2024, Arm Limited. Signed-off-by: Karl Meakin <[email protected]>
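A minimal sketch of the "borrow unless underscores must be stripped" idea, using `std::borrow::Cow`. The helper name `digits_without_underscores` is hypothetical, and it assumes the literal has already been sliced out of the source text; it is not the lexer's actual code.

```rust
use std::borrow::Cow;

/// Hypothetical helper: returns the digits of an integer literal, borrowing
/// from the source text unless underscores have to be removed.
fn digits_without_underscores(lit: &str) -> Cow<'_, str> {
    if lit.contains('_') {
        // Rare case: allocate a new `String` with the separators dropped.
        Cow::Owned(lit.chars().filter(|&c| c != '_').collect())
    } else {
        // Common case: a zero-copy slice of the original buffer.
        Cow::Borrowed(lit)
    }
}
```

The resulting `Cow<str>` derefs to `&str`, so it can be passed straight to a parser such as `u128::from_str_radix` in either case.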
`Vec` can be compared against arrays, since both deref to slices.
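A small illustration of the comparison: the standard library implements `PartialEq` between `Vec<T>` and arrays `[U; N]` (and `&[U; N]`), so no `.as_slice()` or `vec![...]` is needed on either side. The function `non_whitespace_bytes` is a hypothetical stand-in for lexer output.

```rust
/// Hypothetical stand-in for lexer output: the non-whitespace bytes of `src`.
fn non_whitespace_bytes(src: &str) -> Vec<u8> {
    src.bytes().filter(|b| !b.is_ascii_whitespace()).collect()
}
```

With this, a check like `non_whitespace_bytes("( )") == [b'(', b')']` compiles and compares element-wise.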
Centralize all file related arenas in `Files` struct.
Line and column numbers are already tracked in `Files`, so there is no need to track them in `Pos` as well. This lets us simplify the implementation of `Lexer::advance_pos` a bit.
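A sketch of how line and column can be recovered on demand once `Pos` holds only a byte offset. The field names (`texts`, `line_starts`), the `add` method and the byte-based column are assumptions for illustration, not the real `Files` layout.

```rust
/// Hypothetical position: just a file index plus a byte offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pos {
    file: usize,
    offset: usize,
}

/// Hypothetical file table holding each file's text and line-start offsets.
struct Files {
    texts: Vec<String>,
    line_starts: Vec<Vec<usize>>, // byte offset at which each line begins
}

impl Files {
    /// Registers a file and precomputes where each line starts.
    fn add(&mut self, text: String) -> usize {
        let mut starts = vec![0];
        starts.extend(text.match_indices('\n').map(|(i, _)| i + 1));
        self.texts.push(text);
        self.line_starts.push(starts);
        self.texts.len() - 1
    }

    /// Returns the 1-based (line, column) of `pos`; columns count bytes.
    fn line_col(&self, pos: Pos) -> (usize, usize) {
        let starts = &self.line_starts[pos.file];
        // Index of the last line start that is <= the offset.
        let line = starts.partition_point(|&s| s <= pos.offset) - 1;
        (line + 1, pos.offset - starts[line] + 1)
    }
}
```

Under this design, advancing the lexer only bumps `offset`; the line lookup is a binary search paid only when an error or line number is actually reported.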
`Files` was being threaded through a lot of passes where it wasn't needed. It is only needed for reporting errors in `compile.rs` and for reporting line numbers when printing in `codegen.rs`.
Store the text being lexed as `&str`, rather than `&[u8]`, so that substrings don't need to be rechecked for UTF-8 validity when lexing identifiers or integers.
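A sketch of why `&str` helps: slicing a `&str` at character boundaries yields another `&str`, whereas a `&[u8]` slice would have to go through `std::str::from_utf8` before it can be used as text. The function `lex_ident` and its delimiter set are hypothetical.

```rust
/// Hypothetical lexer fragment: scans an identifier starting at byte `start`
/// and returns it together with the offset just past it.
fn lex_ident(src: &str, start: usize) -> (&str, usize) {
    let rest = &src[start..];
    let len = rest
        .find(|c: char| c.is_whitespace() || c == '(' || c == ')')
        .unwrap_or(rest.len());
    // `&rest[..len]` is already a `&str`; no UTF-8 re-validation is needed.
    (&rest[..len], start + len)
}
```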
This issue or pull request has been labeled: "cranelift", "isle"
Hi, thanks for these contributions! If I can nerd-snipe you into some unrelated lexer work, we have a …
This is fantastic, thanks!
Apologies for the slow review turnaround; I've been having ISP issues.
One comment below that I'd like to get your take on, but it shouldn't block landing this. We can address it in a follow-up if necessary.
```rust
// Returns the byte at the current offset, or `None` at end of input.
fn peek_byte(&self) -> Option<u8> {
    self.src.as_bytes().get(self.pos.offset).copied()
}
```
Why not `fn peek_char(&self) -> Option<char>`? I guess a byte is mildly more performant in theory, if LLVM doesn't clean things up, but that doesn't seem very compelling compared to the simplification of "we are dealing with strings, so we should also deal with chars".
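A minimal sketch of the suggested `peek_char` next to `peek_byte`. The `Lexer` struct here is hypothetical and simplified (a plain `offset` field instead of `pos.offset`); it only illustrates the two approaches.

```rust
/// Hypothetical, simplified lexer state mirroring the snippet under review.
struct Lexer<'a> {
    src: &'a str,
    offset: usize, // byte offset into `src`
}

impl<'a> Lexer<'a> {
    /// Byte-oriented peek, as in the PR.
    fn peek_byte(&self) -> Option<u8> {
        self.src.as_bytes().get(self.offset).copied()
    }

    /// Char-oriented alternative: decode the next `char` at the offset.
    /// `str::get` returns `None` if the offset is past the end or not on a char boundary.
    fn peek_char(&self) -> Option<char> {
        self.src.get(self.offset..)?.chars().next()
    }

    /// Steps over the next char by its UTF-8 length, keeping `offset` on a boundary.
    fn advance(&mut self) {
        if let Some(c) = self.peek_char() {
            self.offset += c.len_utf8();
        }
    }
}
```

With `peek_char`, multi-byte characters such as `é` are seen as one unit, while `peek_byte` would only report the first byte of the encoding (`0xC3`).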
Instead of trying to parse an integer as an `i128`, and then as a `u128` if that fails, parse it only as a `u128` and then check for `i128::MIN`.
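A sketch of the single-parse strategy. It assumes the sign has been split off by the caller and that non-negative literals up to `u128::MAX` are stored in an `i128` by bit reinterpretation (`u128 as i128`); both are assumptions for illustration, and `parse_int` is a hypothetical name.

```rust
/// Hypothetical sketch: parse an optionally negated literal into an `i128`
/// by parsing the magnitude once as a `u128`.
fn parse_int(negative: bool, digits: &str, radix: u32) -> Option<i128> {
    let magnitude = u128::from_str_radix(digits, radix).ok()?;
    if !negative {
        // Assumption: values above `i128::MAX` keep their bit pattern.
        return Some(magnitude as i128);
    }
    if magnitude == i128::MIN.unsigned_abs() {
        // 2^127 has no positive `i128` counterpart, so special-case it.
        Some(i128::MIN)
    } else if magnitude <= i128::MAX as u128 {
        // Safe to convert first, then negate.
        Some(-(magnitude as i128))
    } else {
        // Magnitude too large for any negative `i128`.
        None
    }
}
```

Only one call to `from_str_radix` is made per literal; the `i128::MIN` check replaces the second parse attempt.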
- Instead of creating a temporary `Vec<u8>`, use a slice of the original buffer, and only reallocate if there are underscores that need to be removed.
- Don't track line/column in `Pos`, so there is no need to update line/column in `advance_pos()`.
- Store the text being lexed as `&str` rather than `&[u8]` to avoid checking substrings for UTF-8 validity.