Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libsyntax: Simplify libsyntax lexer #66214

Closed
wants to merge 3 commits into from

Conversation

popzxc
Copy link
Contributor

@popzxc popzxc commented Nov 8, 2019

tl;dr

This PR introduces 2 new modules in libsyntax lexer: literal_validation and verification. Those modules encapsulate most of the tokens validity checks and error reporting.

This PR is completely cosmetic and does not introduce any new approaches in the lexer logic.

Motivation:

Most of checks performed in the lexer are trivial and mostly consist of checking some condition and reporting an error. Thus, IMHO it makes sense to extract them in the separate module, so the logic of the main module will become more declarative and parsing will not me mixed up with error reporting.

r? @petrochenkov
cc @matklad

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 8, 2019
@popzxc popzxc changed the title Simplify libsyntax lexer libsyntax: Simplify libsyntax lexer Nov 8, 2019
@petrochenkov
Copy link
Contributor

What's the difference between "verification" and "validation" here?
In both these files we both 1) produce and 2) report errors.
There's also lexer/unescape_error_reporting.rs doing very similar things.

I don't immediately see why this is a simplification and wouldn't want to make changes to this (already recently rewritten) code without having some clear goal.

If some code takes an already prepared error type (like LexerError or something) and presents it to the user (via struct_span_err and similar) then such code can generally be placed into a file diagnostics.rs (there are several already existing examples of such files).
(That's what unescape_error_reporting.rs currently does.)

If some code simultaneously produces and reports a (usually non-fatal) error and skips the error type, then it can be kept in some validation.rs file (like ast_validation.rs in rustc_passes). In this case it would be token_validation.rs.
I think it would be clearer to merge verification.rs and literal_validation.rs into one such file.

@petrochenkov
Copy link
Contributor

Could you also remove the rustfmt changes?
People are currently working on automating them and enforcing formatting on CI, so we'll eventually have them, but it's hard to review code when such changes are made in the middle of other changes (which often happens if the formatting is done before the review is complete).

@petrochenkov petrochenkov added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 9, 2019
@bors
Copy link
Contributor

bors commented Nov 10, 2019

☔ The latest upstream changes (presumably #65324) made this pull request unmergeable. Please resolve the merge conflicts.

@popzxc
Copy link
Contributor Author

popzxc commented Nov 11, 2019

Regarding the motivation/purpose of the PR: well, as I said above, most of the checks are pretty trivial and the following code:

self.verify_float_exponent_not_empty(start, empty_exponent);
self.verify_float_base(start, suffix_start, base);

gives you exactly the same information as the

if empty_exponent {
    let mut err = self.struct_span_fatal(
        start, self.pos,
        "expected at least one digit in exponent"
    );
    err.emit();
}

match base {
    Base::Hexadecimal => {
        self.err_span_(start, suffix_start,
                       "hexadecimal float literal is not supported")
    }
    Base::Octal => {
        self.err_span_(start, suffix_start,
                       "octal float literal is not supported")
    }
    Base::Binary => {
        self.err_span_(start, suffix_start,
                       "binary float literal is not supported")
    }
    _ => ()
}

Since none of the touched methods are public, IMO it makes sense to encapsulate those checks just for reader to be able to focus on the logic of parsing instead of logic of the error reporting.

Regarding the literal_validation / verification thing: my original idea was to have verification worry only about the token representation (meta-info about the token, like terminated), and the literal_validation to take responsibility of the token contents. But I agree that such a differentiation is too phantom.

Regarding the whole PR: it seems that I won't be able to work on this at least for this whole week, so I close it and open a new one when I'll have the time to implement it properly.

@popzxc popzxc closed this Nov 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants