Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many is_ascii_ methods are slower than they have to be #65127

Closed
wooster0 opened this issue Oct 5, 2019 · 7 comments · Fixed by #67585
Closed

Many is_ascii_ methods are slower than they have to be #65127

wooster0 opened this issue Oct 5, 2019 · 7 comments · Fixed by #67585
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@wooster0
Copy link
Contributor

wooster0 commented Oct 5, 2019

I noticed this is_ascii check:

self.is_ascii() && (*self as u8).is_ascii_uppercase()
and wondered why it's there.
Then someone told me it's there because of the as u8. If you pass a non-ASCII char, the as u8 would corrupt that char. That's why the is_ascii check is there before it. But then I wondered: why isn't this just cast to a bigger data type that includes all Unicode code points? This method (and others) are slower than they have to be.
Also see this discussion in Discord with more valuable information: https://discordapp.com/channels/442252698964721669/443150878111694848/629982972052897792.

@Dylan-DPC-zz Dylan-DPC-zz transferred this issue from rust-lang/rfcs Oct 5, 2019
@SimonSapin
Copy link
Contributor

(I can’t seem to find the relevant part in the middle of this discord stream…)

I’m not sure what you mean by “corrupt”. some_char as u8 is well defined: it takes the numerical value of the code point (same as some_char as u32), then truncates to keep the lower 8 bits. 'A' as u8 equals 0x41, but so does '🁁' as u8. We want to reject the latter since U+1F041 🁁 is not an ASCII upper-case letter. If char::is_ascii_uppercase is implemented in terms of u8::is_ascii_uppercase, the ASCII range check is necessary. (It could be a u8 range check to, for example with u8::try_from(u32).)

But maybe char::is_ascii_uppercase doesn’t need to be implemented in terms of u8::is_ascii_uppercase. Maybe it could duplicate the logic instead.

Once upon a time u8::is_ascii_uppercase was implemented with a lookup table of 256 entries. A flat lookup table for all of Unicode would be way too big. But these days it’s based on matching a u8 range pattern. Doing the same with a char range pattern could also work.

That said, would it actually be faster? Maybe the optimizer already does its job well. If you want to work on this, consider adding some benchmark to show that code duplication is worth it.

@wooster0
Copy link
Contributor Author

wooster0 commented Oct 5, 2019

Oh it seems the Discord link is a bit buggy. Try searching for Why is there this self.is_ascii() check? in the search bar on the Discord server. Then you should get to my message with the discussion below it. One guy also posted this comparison of the instructions generated: https://godbolt.org/z/RBADFT.

@BurntSushi
Copy link
Member

I think the minimal standard here should be a benchmark before we change the implementation. Can you provide that?

@wooster0
Copy link
Contributor Author

wooster0 commented Oct 5, 2019

I'm not that experienced with Rust yet, sorry. I just wanted to point out this possible performance regression.
Shouldn't the instruction amount difference as shown on Godbolt suffice to know that it's slower?

cmp     edi, 128
jae     .LBB0_1
add     dil, -65
mov     al, 1
cmp     dil, 25
ja      .LBB0_1
ret

vs.

add     edi, -65
cmp     edi, 26
setb    al
ret

@anirudhb
Copy link
Contributor

anirudhb commented Oct 6, 2019

Looking at the assembly, it seems that the problem is the is_ascii check.

These three functions produce identical assembly:

pub fn foo(c: char) -> bool {
    match c as u8 {
        65..=90 => true,
        _ => false
    }
}

pub fn bar(c: char) -> bool {
    match c {
        'A'..='Z' => true,
        _ => false,
    }
}

pub fn baz(c: char) -> bool {
    (c as u8).is_ascii_uppercase()
}

So, it seems that the is_ascii check is unnecessary.

I could make a PR for this.

@anirudhb
Copy link
Contributor

anirudhb commented Oct 6, 2019

Here are some benchmarks:

#![feature(test)]

extern crate test;
use test::Bencher;

#[inline(always)]
fn custom_ascii_uppercase_without_ascii_check(c: char) -> bool {
    match c {
        'A'..='Z' => true,
        _ => false,
    }
}

#[inline(always)]
fn custom_ascii_uppercase_with_ascii_check_builtin_is_ascii(c: char) -> bool {
    c.is_ascii() && match c {
        'A'..='Z' => true,
        _ => false,
    }
}

#[inline(always)]
fn custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u8(c: char) -> bool {
    c as u8 <= 0x7fu8 && match c {
        'A'..='Z' => true,
        _ => false,
    }
}

#[inline(always)]
fn custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u32(c: char) -> bool {
    c as u32 <= 0x7fu32 && match c {
        'A'..='Z' => true,
        _ => false,
    }
}

#[bench]
fn bench_custom_ascii_uppercase_without_ascii_check(b: &mut Bencher) {
    let x = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    b.iter(|| {
        for c in x.chars() {
            test::black_box(custom_ascii_uppercase_without_ascii_check(c));
        }
    });
}

#[bench]
fn bench_custom_ascii_uppercase_with_ascii_check_builtin_is_ascii(b: &mut Bencher) {
    let x = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    b.iter(|| {
        for c in x.chars() {
            test::black_box(custom_ascii_uppercase_with_ascii_check_builtin_is_ascii(c));
        }
    });
}

#[bench]
fn bench_custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u8(b: &mut Bencher) {
    let x = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    b.iter(|| {
        for c in x.chars() {
            test::black_box(custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u8(c));
        }
    });
}

#[bench]
fn bench_custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u32(b: &mut Bencher) {
    let x = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    b.iter(|| {
        for c in x.chars() {
            test::black_box(custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u32(c));
        }
    });
}

#[bench]
fn bench_builtin_ascii_uppercase(b: &mut Bencher) {
    let x = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    b.iter(|| {
        for c in x.chars() {
            test::black_box(c.is_ascii_uppercase());
        }
    });
}
$ cargo bench
    Finished release [optimized] target(s) in 0.00s
     Running target/release/deps/_14178-f02baafefc742e50

running 5 tests
test bench_builtin_ascii_uppercase                                                ... bench:          79 ns/iter (+/- 11)
test bench_custom_ascii_uppercase_with_ascii_check_builtin_is_ascii               ... bench:          76 ns/iter (+/- 13)
test bench_custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u32 ... bench:          78 ns/iter (+/- 8)
test bench_custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u8  ... bench:          80 ns/iter (+/- 24)
test bench_custom_ascii_uppercase_without_ascii_check                             ... bench:          58 ns/iter (+/- 9)

test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out

So it seems that removing the is_ascii check has a measurable performance difference.

@SimonSapin
Copy link
Contributor

custom_ascii_uppercase_with_ascii_check_custom_is_ascii_convert_to_u8 is incorrect. The conversion should happen not for the ASCII check but for the match, and the pattern use byte literals like b'A'. (Though I expect this won’t affect results much.)

@Alexendoo Alexendoo added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Oct 9, 2019
@bors bors closed this as completed in 79ebf53 Feb 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants