Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[codegen] unnecessary panicking branch in foo().await (vs equivalent FutureImpl.await) #71093

Open
japaric opened this issue Apr 13, 2020 · 3 comments
Labels
A-async-await Area: Async & Await A-codegen Area: Code generation A-coroutines Area: Coroutines A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@japaric
Copy link
Member

japaric commented Apr 13, 2020

I compiled this no_std code with (LTO / -Oz / -C panic=abort) optimizations (full repro instructions at the bottom):

#![no_std]
#![no_main]

#[no_mangle]
fn main() -> ! {
    let mut f = async {
        loop {
            // uncomment only ONE of these statements
            // Foo.await; // NO panicking branch
            foo().await; // HAS panicking branch (though it should be equivalent to `Foo.await`?)
            // bar().await; // NO panicking branch (because it's implicitly divergent?)
            // baz().await; // HAS panicking branch (that it inherit from `foo().await`?)
        }
    };

    let waker = waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        unsafe {
            let _ = Pin::new_unchecked(&mut f).poll(&mut cx);
        }
    }
}

struct Foo;

impl Future for Foo {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        asm::nop();
        Poll::Ready(())
    }
}

async fn foo() {
    asm::nop();
}

async fn bar() {
    asm::nop();
    loop {}
}

async fn baz() {
    foo().await;
    loop {}
}

I got machine code that includes a panicking branch:

00000400 <main>:
 400:   push    {r5, r6, r7, lr}
 402:   add     r7, sp, #8
 404:   movs    r0, #0
 406:   strh.w  r0, [r7, #-2]
 40a:   subs    r0, r7, #2
 40c:   bl      412 <app::main::{{closure}}>
 410:   udf     #254    ; 0xfe

00000412 <app::main::{{closure}}>:
 412:   push    {r7, lr}
 414:   mov     r7, sp
 416:   mov     r4, r0
 418:   ldrb    r0, [r0, #0]
 41a:   cbz     r0, 426 <app::main::{{closure}}+0x14>
 41c:   ldrb    r0, [r4, #1]
 41e:   cbz     r0, 42a <app::main::{{closure}}+0x18>
 420:   bl      434 <core::panicking::panic>
 424:   udf     #254    ; 0xfe
 426:   movs    r0, #0
 428:   strb    r0, [r4, #1]
 42a:   bl      48e <__nop>
 42e:   movs    r0, #1
 430:   strb    r0, [r4, #1]
 432:   b.n     426 <app::main::{{closure}}+0x14>

00000434 <core::panicking::panic>:
 434:   push    {r7, lr}
 436:   mov     r7, sp
 438:   bl      43e <core::panicking::panic_fmt>
 43c:   udf     #254    ; 0xfe

0000043e <core::panicking::panic_fmt>:
 43e:   push    {r7, lr}
 440:   mov     r7, sp
 442:   bl      48c <rust_begin_unwind>
 446:   udf     #254    ; 0xfe

I expected to see no panicking branches in the output. If I comment out foo().await and uncomment Foo.await (which should be semantically equivalent) then I get the expected output:

00000400 <main>:
 400:   push    {r7, lr}
 402:   mov     r7, sp
 404:   bl      40a <app::main::{{closure}}>
 408:   udf     #254    ; 0xfe

0000040a <app::main::{{closure}}>:
 40a:   push    {r7, lr}
 40c:   mov     r7, sp
 40e:   bl      458 <__nop>
 412:   b.n     40e <app::main::{{closure}}+0x4>

Interestingly, bar().await contains no panicking branch (because it's divergent?), but baz().await does (because it inherits it from foo().await?).

Meta

rustc --version --verbose:

rustc 1.44.0-nightly (94d346360 2020-04-09)
Steps to reproduce

$ git clone https://github.com/rust-embedded/cortex-m-quickstart

$ cd cortex-m-quickstart
$ git reset --hard 1a60c1d94489cec3008166a803bdcf8ac306b98f
$ $EDITOR Cargo.toml && cat Cargo.toml
[package]
edition = "2018"
name = "app"
version = "0.0.0"

[dependencies]
cortex-m = "0.6.0"
cortex-m-rt = "0.6.10"
cortex-m-semihosting = "0.3.3"
panic-halt = "0.2.0"

[profile.dev]
codegen-units = 1
debug = 1
debug-assertions = false
incremental = false
lto = "fat"
opt-level = 'z'
overflow-checks = false
$ $EDITOR src/main.rs && cat src/main.rs
#![no_std]
#![no_main]

use core::{
    future::Future,
    pin::Pin,
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
};

use cortex_m_rt::entry;
use cortex_m::asm;
use panic_halt as _;

#[no_mangle]
fn main() -> ! {
    let mut f = async {
        loop {
            // uncomment only ONE of these statements
            // Foo.await; // NO panicking branch
            foo().await; // HAS panicking branch
            // bar().await; // NO panicking branch
            // baz().await; // HAS panicking branch
        }
    };

    let waker = waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        unsafe {
            let _ = Pin::new_unchecked(&mut f).poll(&mut cx);
        }
    }
}

struct Foo;

impl Future for Foo {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        asm::nop();
        Poll::Ready(())
    }
}

async fn foo() {
    asm::nop();
}

async fn bar() {
    asm::nop();
    loop {}
}

async fn baz() {
    foo().await;
    loop {}
}

fn waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(&(), &VTABLE)
    }
    unsafe fn wake(_: *const ()) {}
    unsafe fn wake_by_ref(_: *const ()) {}
    unsafe fn drop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);

    unsafe { Waker::from_raw(clone(&())) }
}
$ # target = thumbv7m-none-eabi (see .cargo/config)
$ cargo build
$ arm-none-eabi-objdump -Cd target/thumbv7m-none-eabi/debug/app

@japaric japaric added the C-bug Category: This is a bug. label Apr 13, 2020
@csmoe csmoe added A-async-await Area: Async & Await A-codegen Area: Code generation labels Apr 13, 2020
@jonas-schievink jonas-schievink added A-coroutines Area: Coroutines C-enhancement Category: An issue proposing an enhancement or a PR with one. I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels Apr 13, 2020
@jonas-schievink

This comment has been minimized.

@jonas-schievink
Copy link
Contributor

This is the "async fn resumed after completion" panic in foo's generator. It's probably not possible to rid of that in general (whether it's actually unreachable depends on how the returned future is polled, and getting this wrong is unsound).

The reason bar does not contain this branch is indeed because it never returns, which this optimization takes advantage of:

fn can_return<'tcx>(tcx: TyCtxt<'tcx>, body: &Body<'tcx>) -> bool {
// Returning from a function with an uninhabited return type is undefined behavior.
if body.return_ty().conservative_is_privately_uninhabited(tcx) {
return false;
}
// If there's a return terminator the function may return.
for block in body.basic_blocks() {
if let TerminatorKind::Return = block.terminator().kind {
return true;
}
}
// Otherwise the function can't return.
false
}

The panic code seems needlessly inefficient though, especially since you're using panic-halt which should just be an infinite loop.

I'd also maybe expect LLVM to do a somewhat better job of seeing through these trivial state machines.

@tmandry tmandry added the AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. label Apr 21, 2020
@tmandry
Copy link
Member

tmandry commented Apr 21, 2020

To expand on what @jonas-schievink said, LLVM seems to be too averse to inlining in this case when -Oz.

This makes sense as a general rule, but in this case our closure is only ever getting called from one place. If app::main::{{closure}} were inlined into main, the panic branch would certainly have been optimized out.

Naively speaking, I suspect you could get pretty far with a simple blanket rule in LLVM to always inline when a function is only called from one place. It makes sense to me for -Oz. For other optimization modes, I can see an argument against which is that it could impact code cache performance. (Maybe the impact of this is bad enough to make it a bad idea for -Oz too, I'm not sure.)


More generally, I think LLVM is always going to be averse to inlining generator resume functions with their top-level switchInt branches. It's possible we could experiment with providing inlining hints to encourage this earlier in compilation, but that would likely require a good deal of experimentation.

We could also experiment with "combining" nested state machines ourselves, which would open up the door to new types of optimizations that LLVM can't do, but I'm getting a little ahead of myself :)

@tmandry tmandry added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 21, 2020
@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-async-await Area: Async & Await A-codegen Area: Code generation A-coroutines Area: Coroutines A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants