Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: add likely and unlikely intrinsics #36181

Merged
merged 1 commit into from
Sep 13, 2016
Merged

Conversation

seanmonstar
Copy link
Contributor

@seanmonstar seanmonstar commented Aug 31, 2016

I'm no good at reading assembly, but I have tried a stage1 compiler with this patch, and it does cause different asm output. Additionally, testing this compiler on my httparse crate with some likely usage added in to the branches does affect benchmarks. However, I'm sure a codegen test should be included, if anyone knows what it should look like.

There isn't an entry in librustc_trans/context.rs in this diff, because it already exists (llvm.expect.i1 is used for array indices).


Even though this does affect httparse benchmarks, it doesn't seem to affect it the same way GCC's __builtin_expect affects picohttpparser. I was confused that the deviation on the benchmarks grew hugely when testing this, especially since I'm absolutely certain that the branchs where I added likely were always true. I chalk that up to GCC and LLVM handle branch prediction differently.

cc #26179

@rust-highfive
Copy link
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@@ -194,6 +194,14 @@ extern "rust-intrinsic" {
/// own, or if it does not enable any significant optimizations.
pub fn assume(b: bool);

#[cfg(not(stage0))]
/// dox
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the docs need filling in.

@Aatch
Copy link
Contributor

Aatch commented Sep 1, 2016

The main thing to check is that this:

pub fn test(a: u32, b: u32) -> Option<u32> {
    if likely(a == b) {
        None
    } else {
        Some(a + b)
    }
}

Produces something similar to this (without optimisations on):

define i64 @test(i32, i32) unnamed_addr #0 {
entry-block:
  %sret_slot = alloca %"2.std::option::Option<u32>"
  %return = alloca %"2.std::option::Option<u32>"
  br label %start

start:                                            ; preds = %entry-block
  %2 = icmp eq i32 %0, %1
  %3 = call zeroext i1 @llvm.expect.i1(i1 zeroext %2, true)
  br label %bb1

bb1:                                              ; preds = %start
  br i1 %3, label %bb2, label %bb3

bb2:                                              ; preds = %bb1
  %4 = getelementptr inbounds %"2.std::option::Option<u32>", %"2.std::option::Option<u32>"* %return, i32 0, i32 0
  store i32 0, i32* %4
  br label %bb5

bb3:                                              ; preds = %bb1
  %5 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %0, i32 %1)
  %6 = extractvalue { i32, i1 } %5, 0
  %7 = extractvalue { i32, i1 } %5, 1
  %8 = call i1 @llvm.expect.i1(i1 %7, i1 false)
  br i1 %8, label %panic, label %bb4

bb4:                                              ; preds = %bb3
  %9 = getelementptr inbounds %"2.std::option::Option<u32>", %"2.std::option::Option<u32>"* %return, i32 0, i32 0
  store i32 1, i32* %9
  %10 = bitcast %"2.std::option::Option<u32>"* %return to { i32, i32 }*
  %11 = getelementptr inbounds { i32, i32 }, { i32, i32 }* %10, i32 0, i32 1
  store i32 %6, i32* %11
  br label %bb5

bb5:                                              ; preds = %bb2, %bb4
  %12 = bitcast %"2.std::option::Option<u32>"* %return to i64*
  %13 = load i64, i64* %12, align 4
  ret i64 %13

panic:                                            ; preds = %bb3
  call void @_ZN4core9panicking5panic17h44f94ad2f4e3e170E({ %str_slice, %str_slice, i32 }* @panic_loc7544)
  unreachable
}

The point of interest being this:

  %2 = icmp eq i32 %0, %1
  %3 = call zeroext i1 @llvm.expect.i1(i1 zeroext %2, true)
  br label %bb1

As far as I know, this pattern is pretty much what LLVM is looking for when lowering the expect intrinsic, and won't detect much else. The main thing (and what we didn't have before) is the immediate use of the result of the intrinsic in the branch condition.

Note that the IR here was generated with a close-to-master build of the compiler, and tweaked slightly by hand so I'm confident that this patch is producing the correct IR.

@Aatch
Copy link
Contributor

Aatch commented Sep 1, 2016

For the documentation, it should definitely be noted that any use other than if likely(cond) { ... } will likely not have an effect.

@nagisa
Copy link
Member

nagisa commented Sep 1, 2016

Previous PR: #26429

@nagisa
Copy link
Member

nagisa commented Sep 1, 2016

Closes #26179

Should we be closing tracking issues before we expose the stable interfaces to the intrinsics?

@nagisa
Copy link
Member

nagisa commented Sep 1, 2016

I can’t seem to coerce clang > 3.6 into generating this intrinsic OR branch weight metadata. I have a feeling that this, combined with your observed increase in variance (I observed a similar one a while ago), may be a good indicator that we ought not to implement any such intrinsic at all, or at least find a number of cases which show a very clear benefit.

@nagisa
Copy link
Member

nagisa commented Sep 1, 2016

#llvm told me expect intrinsic is more harmful than it is helpful and that the branch weight metadata should be emitted instead. Might be worth to go back and reconsider the RFC, since the branch weight metadata is more powerful and expressive and would be better expressed as attributes in source too.

why is clang > 3.6 not emiting expect intrinsic for __builtin_expect?
nagisa: it turns out that the expect intrinsic overall harmed the optimizer by inhibiting certain optimizations
so we now directly lower __builtin_expect to branch probabilities in the frontend when generating IR

@seanmonstar
Copy link
Contributor Author

@nagisa I had similar feelings. I am also left wondering if the current llvm.expect.i1 usage internally (bounds checks, overflow checks, ...) might actually likewise cause slowness instead of optimizations.

@seanmonstar
Copy link
Contributor Author

Pushed documentation and a codegen test.

@Aatch
Copy link
Contributor

Aatch commented Sep 2, 2016

@nagisa that was something I thought about. I think we should merge this PR for now, then look at improving the implementation. The advantage of using a function (i.e. intrinsic) is that they just automatically handle stuff like short-circuiting, it's also not clear where the best place for attributes would be in an if-else(-if) expression.

@nagisa
Copy link
Member

nagisa commented Sep 2, 2016

I think we should merge this PR for now, then look at improving the implementation.

My concern here that there might be a nicer way to take advantage of branch metadata and expose more power. For example this compiles (with some features enabled):

match Ok::<_, ()>(0) {
    Ok(0) => { 
        #![most_likely] 
        ::std::process::exit(0) 
    },
    Ok(_) => { 
        #![not_that_likely] 
        ::std::process::exit(1) 
    },
    Err(_) => { #![unlikely] ::std::process::exit(42) }
}

I feel like something similar could be made possible by allowing attributes for branch bodies in if-else.

I also think likely/unlikely aren’t very easy to work with and would be quite hard to implement them to lower to branch metadata.

I am also left wondering if the current llvm.expect.i1 usage internally (bounds checks, overflow checks, ...) might actually likewise cause slowness instead of optimizations.

Its certainly a possibility, especially for non-x86 targets.

@nikomatsakis
Copy link
Contributor

Hmm. I don't really know what to do here. =)

@rust-lang/compiler -- thoughts? The summary is that @seanmonstar did awesome work implementing likely/unlikely intrinsics, but now there is some second guessing as to whether this is the right approach. Should we land this work? Back off and redesign a bit more?

Right now I'm leaning towards: land the PR, carry on this discussion in the tracking issue -- but this kind of feedback is definitely a negative strike for stabilization.

@nikomatsakis
Copy link
Contributor

The implementation at least looks good to me. Simple.

#[no_mangle]
pub fn check_likely(x: i32, y: i32) -> Option<i32> {
unsafe {
// CHECK: call i1 @llvm.expect.i1(i1 %{{.*}}, i1 true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check some surroundings around this intrinsic too? I.e. that icmp or something goes before the intrinsic and that branch goes right after the intrinsic?

It would look something along the lines of

CHECK: [[cond:%[0-9]+]] = icmp ...
CHECK-NEXT: call i1 @llvm.expect.i1(i1 [[cond]], i1 true)
CHECK-NEXT: br [[cond]] ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there's a couple of casts and things in between. This is the relevant part of the likely.ll file:

start:
  %2 = icmp eq i32 %0, %1
  %3 = call i1 @llvm.expect.i1(i1 %2, i1 true)
  %4 = zext i1 %3 to i8
  store i8 %4, i8* %tmp_ret
  %5 = load i8, i8* %tmp_ret, !range !0
  %6 = trunc i8 %5 to i1
  br label %bb1

bb1:
  br i1 %6, label %bb2, label %bb3

@nagisa
Copy link
Member

nagisa commented Sep 3, 2016

Okay, I guess I’m fine with landing this in its current state as an unstable intrinsic, but I’ll have my reservations when we get to stabilisation effort.

@seanmonstar
Copy link
Contributor Author

I'm not sure why travis is failing. Looking through the raw log, I don't find an instance of a test failing... It just claims to "exit with 2"...

@durka
Copy link
Contributor

durka commented Sep 3, 2016

@seanmonstar it's just #36138.

@seanmonstar
Copy link
Contributor Author

Good to go?

@nikomatsakis
Copy link
Contributor

@bors r+

@bors
Copy link
Contributor

bors commented Sep 13, 2016

📌 Commit b778f7f has been approved by nikomatsakis

@nikomatsakis
Copy link
Contributor

@nagisa

I’ll have my reservations when we get to stabilisation effort.

agreed

@bors
Copy link
Contributor

bors commented Sep 13, 2016

⌛ Testing commit b778f7f with merge e4ee8e8...

@bors
Copy link
Contributor

bors commented Sep 13, 2016

💔 Test failed - auto-win-gnu-32-opt

@alexcrichton
Copy link
Member

@bors: retry

On Tue, Sep 13, 2016 at 8:24 AM, bors [email protected] wrote:

💔 Test failed - auto-win-gnu-32-opt
https://buildbot.rust-lang.org/builders/auto-win-gnu-32-opt/builds/5621


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#36181 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAD95KGY5i525JGsV1eEQ81_CkzN8MlBks5qpsAZgaJpZM4JyKM4
.

@bors
Copy link
Contributor

bors commented Sep 13, 2016

⌛ Testing commit b778f7f with merge bd254b8...

@alexcrichton
Copy link
Member

@bors: retry force clean

@bors
Copy link
Contributor

bors commented Sep 13, 2016

⌛ Testing commit b778f7f with merge 2fd0608...

bors added a commit that referenced this pull request Sep 13, 2016
core: add likely and unlikely intrinsics

I'm no good at reading assembly, but I have tried a stage1 compiler with this patch, and it does cause different asm output. Additionally, testing this compiler on my httparse crate with some `likely` usage added in to the branches does affect benchmarks. However, I'm sure a codegen test should be included, if anyone knows what it should look like.

There isn't an entry in `librustc_trans/context.rs` in this diff, because it already exists (`llvm.expect.i1` is used for array indices).

----

Even though this does affect httparse benchmarks, it doesn't seem to affect it the same way GCC's `__builtin_expect` affects picohttpparser. I was confused that the deviation on the benchmarks grew hugely when testing this, especially since I'm absolutely certain that the branchs where I added `likely` were always `true`. I chalk that up to GCC and LLVM handle branch prediction differently.

cc #26179
@bors bors merged commit b778f7f into rust-lang:master Sep 13, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants