
Add Tokens newtype wrapper, TokenKind iterator #11361

Merged
merged 2 commits into from
May 14, 2024

Conversation

Member

@dhruvmanila dhruvmanila commented May 10, 2024

Summary

Alternative to #11237

This PR adds a new Tokens struct, a newtype wrapper around the vector of lexer output. This allows us to add a kinds method which returns an iterator over the corresponding TokenKinds. The iterator is implemented as a separate TokenKindIter struct so that the type can be named and additional methods like peek can be provided directly on the iterator.

This lets the linter access the stream of TokenKind instead of Tok.
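The shape described above can be sketched as follows. This is a hedged, simplified illustration rather than ruff's actual implementation: the real lexer output also carries source ranges and error information, and everything beyond the names Tokens, kinds, TokenKindIter, and peek is a stand-in.

```rust
/// Simplified stand-in for ruff's `TokenKind` (the real enum has many variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Lpar,
    Rpar,
}

/// Simplified lexer output; the real output also carries a `TextRange`.
#[derive(Debug, Clone, Copy)]
struct Token {
    kind: TokenKind,
}

/// Newtype wrapper around the vector of lexer output.
struct Tokens(Vec<Token>);

impl Tokens {
    /// Iterator over the corresponding `TokenKind`s.
    fn kinds(&self) -> TokenKindIter<'_> {
        TokenKindIter { inner: self.0.iter() }
    }
}

/// A named iterator type, so callers can store it and so extra methods
/// like `peek` can live directly on the iterator.
struct TokenKindIter<'a> {
    inner: std::slice::Iter<'a, Token>,
}

impl<'a> TokenKindIter<'a> {
    /// Look at the next kind without consuming it.
    fn peek(&self) -> Option<TokenKind> {
        self.inner.clone().next().map(|token| token.kind)
    }
}

impl<'a> Iterator for TokenKindIter<'a> {
    type Item = TokenKind;

    fn next(&mut self) -> Option<TokenKind> {
        self.inner.next().map(|token| token.kind)
    }
}

fn main() {
    let tokens = Tokens(vec![
        Token { kind: TokenKind::Name },
        Token { kind: TokenKind::Lpar },
        Token { kind: TokenKind::Rpar },
    ]);
    let mut kinds = tokens.kinds();
    assert_eq!(kinds.peek(), Some(TokenKind::Name));
    assert_eq!(kinds.next(), Some(TokenKind::Name));
    assert_eq!(
        kinds.collect::<Vec<_>>(),
        vec![TokenKind::Lpar, TokenKind::Rpar]
    );
    println!("ok");
}
```

Implementing `peek` by cloning the cheap slice iterator avoids wrapping everything in `std::iter::Peekable`, which would hide the concrete type.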

Edit: I've made the necessary downstream changes and plan to merge the entire stack at once.

@dhruvmanila dhruvmanila added the parser Related to the parser label May 10, 2024
Contributor

github-actions bot commented May 10, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

lnbits/lnbits (error)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

Failed to pull: From https://github.com/lnbits/lnbits
 + 365f9a39...e9e69d9d main       -> origin/main  (forced update)
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint: 
hint:   git config pull.rebase false  # merge
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint: 
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@dhruvmanila dhruvmanila changed the base branch from main to dhruv/bench-linter May 14, 2024 03:12
@dhruvmanila dhruvmanila changed the title Add Tokens newtype wrapper Add Tokens newtype wrapper, TokenKind iterator May 14, 2024
@dhruvmanila dhruvmanila added the internal An internal refactor or improvement label May 14, 2024
@dhruvmanila dhruvmanila marked this pull request as ready for review May 14, 2024 03:13
Member Author

dhruvmanila commented May 14, 2024

I might need to keep a Vec<TokenKind> around because certain rules use lex_starts_at, which returns the tokens within a given range. An iterator would make that lookup O(n), while a vector allows a binary search. I could possibly use a BTreeMap from token start offset to (TokenKind, TextRange) so the range method could be used directly.

Edit: I'm using binary search to get the start and end index.
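The binary-search lookup mentioned in the edit can be sketched like this. It is an illustrative simplification, assuming non-overlapping tokens sorted by offset; the types, the return type, and the exact boundary semantics are stand-ins rather than ruff's code.

```rust
/// Simplified stand-in for ruff's `TokenKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Def,
    Name,
    Lpar,
    Rpar,
    Colon,
    Newline,
}

/// Simplified token with a half-open `start..end` range.
#[derive(Debug, Clone, Copy)]
struct Token {
    kind: TokenKind,
    start: u32,
    end: u32,
}

/// Kinds of tokens that lie fully within `start..end`.
///
/// Tokens are non-overlapping and sorted by offset, so `partition_point`
/// performs a binary search (O(log n)) for both boundary indices,
/// instead of the O(n) scan an iterator would require.
fn kinds_within_range(tokens: &[Token], start: u32, end: u32) -> Vec<TokenKind> {
    let lo = tokens.partition_point(|token| token.start < start);
    let hi = tokens.partition_point(|token| token.end <= end);
    // `hi.max(lo)` guards against an empty/misaligned range.
    tokens[lo..hi.max(lo)].iter().map(|token| token.kind).collect()
}

fn main() {
    // Roughly `def foo():` with `foo` starting at offset 4.
    let tokens = [
        Token { kind: TokenKind::Def, start: 0, end: 3 },
        Token { kind: TokenKind::Name, start: 4, end: 7 },
        Token { kind: TokenKind::Lpar, start: 7, end: 8 },
        Token { kind: TokenKind::Rpar, start: 8, end: 9 },
        Token { kind: TokenKind::Colon, start: 9, end: 10 },
        Token { kind: TokenKind::Newline, start: 10, end: 11 },
    ];
    // The range `4..10` yields Name, Lpar, Rpar, Colon.
    assert_eq!(
        kinds_within_range(&tokens, 4, 10),
        vec![TokenKind::Name, TokenKind::Lpar, TokenKind::Rpar, TokenKind::Colon]
    );
    // A range that doesn't align with token boundaries yields nothing.
    assert!(kinds_within_range(&tokens, 5, 6).is_empty());
    println!("ok");
}
```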

/// Then, the range `4..10` returns an iterator which yields the `Name`, `Lpar`, `Rpar`, and
/// `Colon` tokens. If the given position doesn't match any of the tokens, an empty
/// iterator is returned.
pub fn kinds_within_range<T: Ranged>(&self, ranged: T) -> TokenKindIter {
Member Author

So, this isn't currently used anywhere. It would essentially replace the usages of lex_starts_at, but it turns out all usages of that function happen in the AST checker, where we don't have access to the token stream.

One solution here would be to store a TokenKinds struct containing Vec<(TokenKind, TextRange)> on the Checker. That way, the rules which use lex_starts_at or lex can still get the tokens.

Another would be to combine this change with the parser work. I'm leaning more towards this.
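The first option could look roughly like the following. This is a hypothetical sketch, not actual ruff code: TokenKinds, in_range, and the Checker field are illustrative names, TextRange is simplified to a tuple of offsets, and a linear scan stands in for whatever lookup the real structure would use.

```rust
/// Simplified stand-in for ruff's `TokenKind`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TokenKind {
    Name,
    Colon,
}

/// Simplified stand-in for ruff's `TextRange` (start, end offsets).
type TextRange = (u32, u32);

/// Token kinds plus their ranges, kept alongside the AST so that
/// AST-based rules no longer need to re-lex the source.
struct TokenKinds(Vec<(TokenKind, TextRange)>);

impl TokenKinds {
    /// Kinds of tokens that lie entirely within `range`.
    fn in_range(&self, range: TextRange) -> impl Iterator<Item = TokenKind> + '_ {
        self.0
            .iter()
            .filter(move |(_, (start, end))| *start >= range.0 && *end <= range.1)
            .map(|(kind, _)| *kind)
    }
}

/// Hypothetical checker holding the token kinds for rules to query.
struct Checker {
    token_kinds: TokenKinds,
}

fn main() {
    let checker = Checker {
        token_kinds: TokenKinds(vec![
            (TokenKind::Name, (0, 4)),
            (TokenKind::Colon, (4, 5)),
        ]),
    };
    let kinds: Vec<_> = checker.token_kinds.in_range((0, 4)).collect();
    assert_eq!(kinds, vec![TokenKind::Name]);
    println!("ok");
}
```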

Base automatically changed from dhruv/bench-linter to main May 14, 2024 14:28
@dhruvmanila dhruvmanila enabled auto-merge (squash) May 14, 2024 16:38
@dhruvmanila dhruvmanila merged commit 025768d into main May 14, 2024
19 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/tokens branch May 14, 2024 16:45