[codegen] unnecessary panicking branch in `foo().await` (vs equivalent `FutureImpl.await`) #71093

japaric · 2020-04-13T13:26:40Z

I compiled this no_std code with (LTO / -Oz / -C panic=abort) optimizations (full repro instructions at the bottom):

#![no_std]
#![no_main]

#[no_mangle]
fn main() -> ! {
    let mut f = async {
        loop {
            // uncomment only ONE of these statements
            // Foo.await; // NO panicking branch
            foo().await; // HAS panicking branch (though it should be equivalent to `Foo.await`?)
            // bar().await; // NO panicking branch (because it's implicitly divergent?)
            // baz().await; // HAS panicking branch (which it inherits from `foo().await`?)
        }
    };

    let waker = waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        unsafe {
            let _ = Pin::new_unchecked(&mut f).poll(&mut cx);
        }
    }
}

struct Foo;

impl Future for Foo {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        asm::nop();
        Poll::Ready(())
    }
}

async fn foo() {
    asm::nop();
}

async fn bar() {
    asm::nop();
    loop {}
}

async fn baz() {
    foo().await;
    loop {}
}

I got machine code that includes a panicking branch:

00000400 <main>:
 400:   push    {r5, r6, r7, lr}
 402:   add     r7, sp, #8
 404:   movs    r0, #0
 406:   strh.w  r0, [r7, #-2]
 40a:   subs    r0, r7, #2
 40c:   bl      412 <app::main::{{closure}}>
 410:   udf     #254    ; 0xfe

00000412 <app::main::{{closure}}>:
 412:   push    {r7, lr}
 414:   mov     r7, sp
 416:   mov     r4, r0
 418:   ldrb    r0, [r0, #0]
 41a:   cbz     r0, 426 <app::main::{{closure}}+0x14>
 41c:   ldrb    r0, [r4, #1]
 41e:   cbz     r0, 42a <app::main::{{closure}}+0x18>
 420:   bl      434 <core::panicking::panic>
 424:   udf     #254    ; 0xfe
 426:   movs    r0, #0
 428:   strb    r0, [r4, #1]
 42a:   bl      48e <__nop>
 42e:   movs    r0, #1
 430:   strb    r0, [r4, #1]
 432:   b.n     426 <app::main::{{closure}}+0x14>

00000434 <core::panicking::panic>:
 434:   push    {r7, lr}
 436:   mov     r7, sp
 438:   bl      43e <core::panicking::panic_fmt>
 43c:   udf     #254    ; 0xfe

0000043e <core::panicking::panic_fmt>:
 43e:   push    {r7, lr}
 440:   mov     r7, sp
 442:   bl      48c <rust_begin_unwind>
 446:   udf     #254    ; 0xfe

I expected to see no panicking branches in the output. If I comment out foo().await and uncomment Foo.await (which should be semantically equivalent) then I get the expected output:

00000400 <main>:
 400:   push    {r7, lr}
 402:   mov     r7, sp
 404:   bl      40a <app::main::{{closure}}>
 408:   udf     #254    ; 0xfe

0000040a <app::main::{{closure}}>:
 40a:   push    {r7, lr}
 40c:   mov     r7, sp
 40e:   bl      458 <__nop>
 412:   b.n     40e <app::main::{{closure}}+0x4>

Interestingly, bar().await contains no panicking branch (because it's divergent?), but baz().await does (because it inherits it from foo().await?).

Meta

rustc --version --verbose:

rustc 1.44.0-nightly (94d346360 2020-04-09)

Steps to reproduce

$ git clone https://github.com/rust-embedded/cortex-m-quickstart

$ cd cortex-m-quickstart
$ git reset --hard 1a60c1d94489cec3008166a803bdcf8ac306b98f
$ $EDITOR Cargo.toml && cat Cargo.toml

[package]
edition = "2018"
name = "app"
version = "0.0.0"

[dependencies]
cortex-m = "0.6.0"
cortex-m-rt = "0.6.10"
cortex-m-semihosting = "0.3.3"
panic-halt = "0.2.0"

[profile.dev]
codegen-units = 1
debug = 1
debug-assertions = false
incremental = false
lto = "fat"
opt-level = 'z'
overflow-checks = false

$ $EDITOR src/main.rs && cat src/main.rs

#![no_std]
#![no_main]

use core::{
    future::Future,
    pin::Pin,
    task::{Context, Poll, RawWaker, RawWakerVTable, Waker},
};

use cortex_m_rt::entry;
use cortex_m::asm;
use panic_halt as _;

#[no_mangle]
fn main() -> ! {
    let mut f = async {
        loop {
            // uncomment only ONE of these statements
            // Foo.await; // NO panicking branch
            foo().await; // HAS panicking branch
            // bar().await; // NO panicking branch
            // baz().await; // HAS panicking branch
        }
    };

    let waker = waker();
    let mut cx = Context::from_waker(&waker);
    loop {
        unsafe {
            let _ = Pin::new_unchecked(&mut f).poll(&mut cx);
        }
    }
}

struct Foo;

impl Future for Foo {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        asm::nop();
        Poll::Ready(())
    }
}

async fn foo() {
    asm::nop();
}

async fn bar() {
    asm::nop();
    loop {}
}

async fn baz() {
    foo().await;
    loop {}
}

fn waker() -> Waker {
    unsafe fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(&(), &VTABLE)
    }
    unsafe fn wake(_: *const ()) {}
    unsafe fn wake_by_ref(_: *const ()) {}
    unsafe fn drop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, wake, wake_by_ref, drop);

    unsafe { Waker::from_raw(clone(&())) }
}

$ # target = thumbv7m-none-eabi (see .cargo/config)
$ cargo build
$ arm-none-eabi-objdump -Cd target/thumbv7m-none-eabi/debug/app

jonas-schievink · 2020-04-13T14:15:37Z

This is the "async fn resumed after completion" panic in foo's generator. It's probably not possible to get rid of that in general (whether it's actually unreachable depends on how the returned future is polled, and getting this wrong is unsound).
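
For anyone who wants to see that panic on a host target, here is a minimal sketch. It assumes `std` and the default unwinding panic strategy, and it drops the `asm::nop()` calls; `NoopWake` and `second_poll_panics` are names made up for this illustration. It polls each future once more after it has returned `Poll::Ready`, which is exactly the case the extra branch guards against:

```rust
use std::future::Future;
use std::panic::{catch_unwind, AssertUnwindSafe};
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to drive futures that never suspend.
/// (Hypothetical helper, not part of the original repro.)
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Hand-written future from the issue, minus the `asm::nop()` call.
struct Foo;

impl Future for Foo {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        Poll::Ready(())
    }
}

/// `async fn` counterpart from the issue, minus the `asm::nop()` call.
async fn foo() {}

/// Polls `fut` twice and reports whether the second poll panicked.
fn second_poll_panics<F: Future<Output = ()>>(fut: F) -> bool {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);

    // First poll: both futures complete immediately.
    assert_eq!(fut.as_mut().poll(&mut cx), Poll::Ready(()));

    // Second poll: the `Future` contract allows this to panic.
    catch_unwind(AssertUnwindSafe(|| {
        let _ = fut.as_mut().poll(&mut cx);
    }))
    .is_err()
}
```

With this, `second_poll_panics(Foo)` is `false` because `Foo` is stateless and can be polled any number of times, while `second_poll_panics(foo())` is `true`. So the two are only equivalent as long as the caller never polls again after completion, which the compiler cannot assume when it generates `foo`'s state machine.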

The reason bar does not contain this branch is indeed because it never returns, which this optimization takes advantage of:

rust/src/librustc_mir/transform/generator.rs

Lines 979 to 994 in d28a464

    
fn can_return<'tcx>(tcx: TyCtxt<'tcx>, body: &Body<'tcx>) -> bool {
    // Returning from a function with an uninhabited return type is undefined behavior.
    if body.return_ty().conservative_is_privately_uninhabited(tcx) {
        return false;
    }

    // If there's a return terminator the function may return.
    for block in body.basic_blocks() {
        if let TerminatorKind::Return = block.terminator().kind {
            return true;
        }
    }

    // Otherwise the function can't return.
    false
}
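
To make the difference between foo and bar concrete, here is a rough hand-written model of the two state machines. This is only a sketch: `FooModel`, `BarModel`, `NoopWake` and `poll_once` are hypothetical names, and the real generator layout and state numbering differ. The point is that a body that can return needs a "returned" state and something to do when resumed in it, while a body that never returns has no such state to check:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Model of `async fn foo()`: the body can return, so there is a
/// `Returned` state that must be checked on every resume.
#[derive(Debug, PartialEq)]
enum FooModel {
    Unresumed,
    Returned,
}

impl Future for FooModel {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        match *self {
            FooModel::Unresumed => {
                // The body of `foo` would run here.
                self.set(FooModel::Returned);
                Poll::Ready(())
            }
            // This arm corresponds to the panicking branch in the disassembly.
            FooModel::Returned => panic!("`async fn` resumed after completion"),
        }
    }
}

/// Model of `async fn bar()`: the body ends in `loop {}`, so there is
/// no return terminator, no `Returned` state, and no panic arm.
struct BarModel;

impl Future for BarModel {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _: &mut Context<'_>) -> Poll<()> {
        // The body of `bar` would run here and never come back.
        loop {}
    }
}

/// No-op waker, used only to build a `Context` for the helper below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an `Unpin` future exactly once with a no-op waker.
fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

Polling a `FooModel::Unresumed` once returns `Poll::Ready(())` and leaves it in `FooModel::Returned`; polling it again hits the panic arm. `BarModel` is included only for comparison and should not be polled in a test, since its `poll` never returns.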

The panic code seems needlessly inefficient though, especially since you're using panic-halt which should just be an infinite loop.

I'd also maybe expect LLVM to do a somewhat better job of seeing through these trivial state machines.

tmandry · 2020-04-21T22:35:03Z

To expand on what @jonas-schievink said, LLVM seems to be too averse to inlining in this case when compiling with -Oz.

This makes sense as a general rule, but in this case our closure is only ever getting called from one place. If app::main::{{closure}} were inlined into main, the panic branch would certainly have been optimized out.

Naively speaking, I suspect you could get pretty far with a simple blanket rule in LLVM to always inline when a function is only called from one place. It makes sense to me for -Oz. For other optimization modes, I can see an argument against it, which is that it could impact code cache performance. (Maybe the impact of this is bad enough to make it a bad idea for -Oz too, I'm not sure.)

More generally, I think LLVM is always going to be averse to inlining generator resume functions with their top-level switchInt branches. It's possible we could experiment with providing inlining hints to encourage this earlier in compilation, but that would likely require a good deal of experimentation.

We could also experiment with "combining" nested state machines ourselves, which would open up the door to new types of optimizations that LLVM can't do, but I'm getting a little ahead of myself :)

japaric added the C-bug Category: This is a bug. label Apr 13, 2020

csmoe added A-async-await Area: Async & Await A-codegen Area: Code generation labels Apr 13, 2020

tmandry added the AsyncAwait-Triaged Async-await issues that have been triaged during a working group meeting. label Apr 21, 2020

tmandry added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Apr 21, 2020

tmandry mentioned this issue Apr 21, 2020

Tracking issue for generator code quality #71407

Open

1 task

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
