Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panicks at match arm that starts with non-ascii char #4868

Closed
toslunar opened this issue Jun 13, 2021 · 4 comments · Fixed by #4880
Closed

Panicks at match arm that starts with non-ascii char #4868

toslunar opened this issue Jun 13, 2021 · 4 comments · Fixed by #4880
Assignees
Labels
a-matches match arms, patterns, blocks, etc bug Panic, non-idempotency, invalid code, etc.

Comments

@toslunar
Copy link

enum NonAscii {
    Abcd,
    Éfgh,
}

use NonAscii::*;

fn f(x: NonAscii) -> bool {
    match x {
        Éfgh => true,
        _ => false,
    }
}

fn main() {
    dbg!(f(Abcd));
}

Rustfmt on Playground (1.4.37-nightly (2021-06-12 24bdc6d)) fails to format the above code:

thread 'main' panicked at 'byte index 109 is not a char boundary; it is inside 'É' (bytes 108..110) of `enum NonAscii {
    Abcd,
    Éfgh,
}

use NonAscii::*;

fn f(x: NonAscii) -> bool {
    match x {
        Éfgh => true,
        _ => false,
    }
}

fn main() {
    dbg!(f(Abcd));
}
`', src/tools/rustfmt/src/visitor.rs:47:15
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@NichtsHsu
Copy link

@crlf0710

@calebcartwright calebcartwright added a-matches match arms, patterns, blocks, etc bug Panic, non-idempotency, invalid code, etc. labels Jun 21, 2021
@calebcartwright
Copy link
Member

Haven't had a chance to check but assume that there's no issues with the pattern span, and that the most likely source is where we're trying to figure out if the pattern has a leading pipe and the corresponding byte position:

rustfmt/src/matches.rs

Lines 162 to 174 in 6495024

/// Collect a byte position of the beginning `|` for each arm, if available.
fn collect_beginning_verts(
context: &RewriteContext<'_>,
arms: &[ast::Arm],
) -> Vec<Option<BytePos>> {
arms.iter()
.map(|a| {
context
.snippet_provider
.opt_span_before(mk_sp_lo_plus_one(a.pat.span.lo()), "|")
})
.collect()
}

@crlf0710
Copy link
Member

Basically the mk_sp_lo_plus_one function is incorrect, since utf-8 is a multibyte encoding format.

@calebcartwright
Copy link
Member

Basically the mk_sp_lo_plus_one function is incorrect, since utf-8 is a multibyte encoding format.

I'm not sure about that. That function is correctly generating the expected/desired span, but we should avoid using that span-baed approach here to see if the pattern starts with a pipe given this case. If generating span lo/hi's with bytepos offsets is incorrect then we have much bigger problems 😄

I realize this is likely an annoying issue for folks so will take care of fixing this one myself

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-matches match arms, patterns, blocks, etc bug Panic, non-idempotency, invalid code, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants