
Noticeable schedules running overhead #11932

Closed
chengts95 opened this issue Feb 18, 2024 · 15 comments · Fixed by #11959
Labels
C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Investigation This issue requires detective work to figure out what's going wrong

Comments

@chengts95

Bevy version 0.13

  • Original: 0.12.1 release from Cargo.
  • Current: 0.13.0 release from Cargo.

Relevant system information

Please include:

  • The Rust version you're using: 1.76
  • The operating system used: Windows 11, Ubuntu 22.04

What's performing poorly?

Schedules are running slower than before, which slowed my program down by 20%. This can impact the performance of real-time simulation applications, and the cost grows as you scale up the number of schedules.

It can be reproduced by running empty schedules:

use std::time::Instant;

use bevy_app::prelude::*;
use bevy_ecs::schedule;

fn main() {
    let mut app = bevy_app::App::default();
    app.add_schedule(schedule::Schedule::new(First));
    app.add_schedule(schedule::Schedule::new(PreUpdate));
    app.add_schedule(schedule::Schedule::new(Update));
    app.add_schedule(schedule::Schedule::new(PostUpdate));
    app.add_schedule(schedule::Schedule::new(Last));
    let now = Instant::now();
    for _ in 0..20_000_000 {
        app.update();
    }
    println!("{}", now.elapsed().as_micros() as f64 / 1e6);
}

Performance table for this code (time in seconds over 20,000,000 updates):

With LTO = yes

Test Case                                Time (s)    Slowdown
bevy_ecs 0.12.1                          13.694943   0%
bevy_ecs 0.13, default-features = false  14.759885   7.778%
bevy_ecs 0.13                            15.566676   13.7%

With LTO turned off

Test Case                                Time (s)    Slowdown
bevy_ecs 0.12.1                          15.359637   0%
bevy_ecs 0.13, default-features = false  16.61063    8%
bevy_ecs 0.13                            17.642636   14.9%

The slowdown is obvious. I didn't use a profiler because the overhead is not caused by user systems.

@chengts95 chengts95 added C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times P-Regression Functionality that used to work but no longer does. Add a test for this! S-Needs-Triage This issue needs to be labelled labels Feb 18, 2024
@jdm
Contributor

jdm commented Feb 18, 2024

Might be related to #11931.

@alice-i-cecile alice-i-cecile added S-Needs-Investigation This issue requires detective work to figure out what's going wrong and removed S-Needs-Triage This issue needs to be labelled labels Feb 18, 2024
@alice-i-cecile
Member

Note that on Discord we looked into this a bit more, and still saw a regression vs 0.12 with system stepping turned off. Stepping worsened the problem, but wasn't fully the cause.

@hymm
Contributor

hymm commented Feb 18, 2024

I was able to bisect it to #8453, but not sure what could be causing it yet. Everything obvious seems to be cfg'ed out.

@hymm
Contributor

hymm commented Feb 18, 2024

I tried cfg'ing out the changes to the run function signature and it seemed to get rid of most of the extra overhead. I'm not sure why that helped, and we probably don't want to be cfg-gating function signatures, so I'm not going to PR it. main...hymm:bevy:investigate-empty-system-overhead

@dmlary
Contributor

dmlary commented Feb 18, 2024

I dug into the Stepping side of this with the help of @chengts95: https://discord.com/channels/691052431525675048/1208572214442594374

Empty Schedule Benchmark for benches

I took his example and converted it to a criterion benchmark:

pub fn run_empty_schedules_five(criterion: &mut Criterion) {
    let mut app = bevy_app::App::default();
    app.add_schedule(Schedule::new(bevy_app::First));
    app.add_schedule(Schedule::new(bevy_app::PreUpdate));
    app.add_schedule(Schedule::new(bevy_app::Update));
    app.add_schedule(Schedule::new(bevy_app::PostUpdate));
    app.add_schedule(Schedule::new(bevy_app::Last));

    let mut group = criterion.benchmark_group("run_empty_schedules");
    group.warm_up_time(std::time::Duration::from_secs(1));
    group.measurement_time(std::time::Duration::from_secs(15));
    group.bench_function("five", |bencher| {
        bencher.iter(|| app.update());
    });
    group.finish();
}

I quickly discovered that macOS cannot be trusted for benchmarking. I was seeing 8% variance between individual runs of the same Bevy version, even with the warmup extended to 5 seconds. Worse, when I disabled stepping I didn't see an improvement.

I confirmed that @chengts95 ran his benchmark multiple times and saw consistent results for v0.13.0.

Here's a patch against bevy/benches to run the benchmarks I've been unable to reproduce the regression with. If anyone not on a MacBook feels up to it, I'd be curious whether they also get variable results from cargo bench --bench ecs -- empty_sched.
benches-empty_schedules.diff

Stepping Performance Details

There are three places stepping is hooked into the hot path.

Based on the structure of the benchmark (five empty schedules), the change to Schedule::run() (the get_resource_mut() call) is hit 5x more often than begin_frame(), but honestly that reflects real-world use.

Stepping Performance Improvement Idea

Even without concrete numbers, there are a couple of ways we can remove any stepping performance impacts:

The biggest performance win will be to move the skip list from Stepping into the Schedule, then have Stepping apply changes to the skip list during begin_frame(). This eliminates the get_resource_mut() for every schedule, every frame. A side effect is that Stepping loses dynamic schedule-order detection, which honestly was a bad idea on my part. Instead we can just use the explicit order the schedules were added to Stepping, and provide mechanisms to change the order.

A second performance improvement is to not run the begin_frame() system unless the Stepping resource is present. Right now we check for the resource every frame; we could eliminate that if we could add/remove systems at runtime. As that's not currently possible, we can add a run condition to begin_frame() that checks some global variable in schedule::stepping to see if it should run.

As an alternative to both of these, we can move the needed stepping run state out of the resource and into a global variable. It's effectively the combination of the two options above. The Stepping resource either goes away or becomes an interface to that global state. I'm not really a fan of this as it feels very dirty, but it should be mentioned.
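The run-condition idea above can be sketched as follows. This is a minimal sketch with hypothetical names (STEPPING_ENABLED, begin_frame), not actual Bevy API; a relaxed atomic load stands in for the "global variable in schedule::stepping":

```rust
// Sketch only: STEPPING_ENABLED and begin_frame are hypothetical stand-ins,
// not Bevy API. The point is that the disabled path costs a single relaxed
// atomic load instead of a resource lookup every frame.
use std::sync::atomic::{AtomicBool, Ordering};

// Flipped to true only when a Stepping resource is inserted.
static STEPPING_ENABLED: AtomicBool = AtomicBool::new(false);

fn stepping_is_enabled() -> bool {
    STEPPING_ENABLED.load(Ordering::Relaxed)
}

// A begin_frame-style system guarded by the flag.
fn begin_frame(frames_stepped: &mut u64) {
    if !stepping_is_enabled() {
        return; // common case: early out, no resource access
    }
    // ... apply queued stepping changes here ...
    *frames_stepped += 1;
}

fn main() {
    let mut frames = 0u64;
    begin_frame(&mut frames); // disabled: does nothing
    STEPPING_ENABLED.store(true, Ordering::Relaxed);
    begin_frame(&mut frames); // enabled: does the work
    assert_eq!(frames, 1);
}
```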

@dmlary
Contributor

dmlary commented Feb 18, 2024

I tried cfg'ing out the changes to the run function signature and it seemed to get rid of most of the extra overhead.

That's crazy, just the removal of the _skip_systems: Option<FixedBitSet> for the executor run() cleared up the rest of the perf difference?

@dmlary
Contributor

dmlary commented Feb 18, 2024

Oh that's very weird. Wait, could this be that Option<FixedBitSet> is allocating enough space for FixedBitSet on the stack even when the value is None? I don't know rust calling convention at that level.
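For what it's worth, the stack-space question can be sanity-checked with a stand-in type (not the real fixedbitset crate): an owned Option<T> must reserve space for the whole T even when it is None, while Option<&T> is guaranteed by Rust's niche optimization to be a single nullable pointer.

```rust
// Stand-in for FixedBitSet (roughly a block vector plus a bit length);
// this is an illustration, not the real fixedbitset layout.
use std::mem::size_of;

#[allow(dead_code)]
struct MockBitSet {
    blocks: Vec<u32>,
    length: usize,
}

fn main() {
    // an owned Option must be at least as large as its payload, None or not
    assert!(size_of::<Option<MockBitSet>>() >= size_of::<MockBitSet>());
    // Option<&T> is niche-optimized down to one pointer
    assert_eq!(size_of::<Option<&MockBitSet>>(), size_of::<usize>());
    println!(
        "Option<MockBitSet>: {} bytes, Option<&MockBitSet>: {} bytes",
        size_of::<Option<MockBitSet>>(),
        size_of::<Option<&MockBitSet>>()
    );
}
```

Whether that space is actually reserved and initialized per call depends on the calling convention and inlining, which is what the disassembly in the later comments digs into.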

@hymm
Copy link
Contributor

hymm commented Feb 18, 2024

Oh that's very weird. Wait, could this be that Option is allocating enough space for FixedBitSet on the stack even when the value is None? I don't know rust calling convention at that level.

Oh yeah, if we're hitting the allocator there, that might be it.

@dmlary
Contributor

dmlary commented Feb 18, 2024

Oh that's very weird. Wait, could this be that Option is allocating enough space for FixedBitSet on the stack even when the value is None? I don't know rust calling convention at that level.

Oh yeah, if we're hitting the allocator there, that might be it.

I jumped the gun, that's actually a call to Vec::len().

Confirmed, we are calling the allocator:

# verifying the address is `MultithreadedExecutor::run()`
(lldb) image lookup -a 0x00000001032b4d40
      Address: breakout[0x00000001032b4d40] (breakout.__TEXT.__text + 53158848)
      Summary: breakout`_$LT$bevy_ecs..schedule..executor..multi_threaded..MultiThreadedExecutor$u20$as$u20$bevy_ecs..schedule..executor..SystemExecutor$GT$::run::hdf2d2d0ce9eb657b at multi_threaded.rs:166
      
# disassemble `MultithreadedExecutor::run()`:
(lldb) x/40i 0x00000001032b4d40
0x1032b4d40: 55                             other       pushq  %rbp
0x1032b4d41: 48 89 e5                       other       movq   %rsp, %rbp
0x1032b4d44: 48 81 ec b0 04 00 00           other       subq   $0x4b0, %rsp              ; imm = 0x4B0
0x1032b4d4b: 48 89 b5 88 fc ff ff           other       movq   %rsi, -0x378(%rbp)
0x1032b4d52: 48 89 f8                       other       movq   %rdi, %rax
0x1032b4d55: 48 8b bd 88 fc ff ff           other       movq   -0x378(%rbp), %rdi
0x1032b4d5c: 48 89 85 90 fc ff ff           other       movq   %rax, -0x370(%rbp)
0x1032b4d63: 48 89 bd 98 fc ff ff           other       movq   %rdi, -0x368(%rbp)
0x1032b4d6a: 48 89 95 a0 fc ff ff           other       movq   %rdx, -0x360(%rbp)
0x1032b4d71: 48 89 8d a8 fc ff ff           other       movq   %rcx, -0x358(%rbp)
0x1032b4d78: 48 89 45 88                    other       movq   %rax, -0x78(%rbp)
0x1032b4d7c: 48 89 7d 90                    other       movq   %rdi, -0x70(%rbp)
0x1032b4d80: 48 89 4d 98                    other       movq   %rcx, -0x68(%rbp)
0x1032b4d84: c6 45 87 00                    other       movb   $0x0, -0x79(%rbp)
0x1032b4d88: c6 45 86 00                    other       movb   $0x0, -0x7a(%rbp)
0x1032b4d8c: c6 45 85 00                    other       movb   $0x0, -0x7b(%rbp)
0x1032b4d90: c6 45 84 00                    other       movb   $0x0, -0x7c(%rbp)
0x1032b4d94: c6 45 83 00                    other       movb   $0x0, -0x7d(%rbp)
0x1032b4d98: c6 45 84 01                    other       movb   $0x1, -0x7c(%rbp)
0x1032b4d9c: 48 83 c7 18                    other       addq   $0x18, %rdi
0x1032b4da0: e8 cb 7d fe ff                 call        callq  0x10329cb70
0x1032b4da5: 48 89 85 b0 fc ff ff           other       movq   %rax, -0x350(%rbp)
0x1032b4dac: eb 3c                          jump        jmp    0x1032b4dea
0x1032b4dae: 48 8b 95 a0 fc ff ff           other       movq   -0x360(%rbp), %rdx
0x1032b4db5: 48 be 00 00 00 00 00 00 00 80  other       movabsq $-0x8000000000000000, %rsi ; imm = 0x8000000000000000
0x1032b4dbf: b8 01 00 00 00                 other       movl   $0x1, %eax
0x1032b4dc4: 31 c9                          other       xorl   %ecx, %ecx
0x1032b4dc6: 48 39 32                       other       cmpq   %rsi, (%rdx)
0x1032b4dc9: 48 0f 44 c1                    other       cmoveq %rcx, %rax
0x1032b4dcd: 48 83 f8 01                    other       cmpq   $0x1, %rax
0x1032b4dd1: 0f 84 43 0c 00 00              cond jump   je     0x1032b5a1a
0x1032b4dd7: e9 44 0c 00 00                 jump        jmp    0x1032b5a20
0x1032b4ddc: 48 89 c1                       other       movq   %rax, %rcx
0x1032b4ddf: 89 d0                          other       movl   %edx, %eax
0x1032b4de1: 48 89 4d a0                    other       movq   %rcx, -0x60(%rbp)
0x1032b4de5: 89 45 a8                       other       movl   %eax, -0x58(%rbp)
0x1032b4de8: eb c4                          jump        jmp    0x1032b4dae
0x1032b4dea: 48 8b 85 90 fc ff ff           other       movq   -0x370(%rbp), %rax
0x1032b4df1: 48 8b 8d b0 fc ff ff           other       movq   -0x350(%rbp), %rcx
0x1032b4df8: 48 89 88 c8 01 00 00           other       movq   %rcx, 0x1c8(%rax)

# Check the first call, looks like a Vec alloc -- NOPE, is Vec::len()
(lldb) image lookup -a 0x10329cb70
      Address: breakout[0x000000010329cb70] (breakout.__TEXT.__text + 53060080)
      Summary: breakout`alloc::vec::Vec$LT$T$C$A$GT$::len::hed1b7d1b42e2eadd at mod.rs:2141

@dmlary
Contributor

dmlary commented Feb 18, 2024

As I don't trust my perf numbers, @hymm would you mind trying a branch with Option<&FixedBitSet> instead of the cfg gate?

@dmlary
Contributor

dmlary commented Feb 18, 2024

I read further into the mangled function name; it's not an alloc; it's a Vec::len() call.

@dmlary
Contributor

dmlary commented Feb 18, 2024

As I don't trust my perf numbers, @hymm would you mind trying a branch with Option<&FixedBitSet> instead of the cfg gate?

I retract my request. I'm making this code change and it's getting complicated quickly. I'll keep poking at it trying to figure out why the argument is causing problems.

@chengts95
Author

chengts95 commented Feb 18, 2024

I have changed the benchmark to

fn main() {
    let mut app = bevy_app::App::default();
    let mut a1 = schedule::Schedule::new(First);
    a1.set_executor_kind(schedule::ExecutorKind::SingleThreaded);

    let now = Instant::now();
    for _ in 0..200_000_000 {
        a1.run(&mut app.world);
    }
    println!("{}", now.elapsed().as_micros() as f64 / 1e6);
}

The reason to run the schedule directly on the world is that inside an App, every schedule must be removed from and re-inserted into the schedules hashmap on each update, which is slow and calls memcpy frequently, so app.update() cannot measure this regression accurately.
The code takes 1.71 s in 0.12 and 2.34 s in 0.13 even with the stepping cargo feature disabled. This indicates roughly 40% more overhead in the schedule run.

This is the test run from Intel VTune Profiler:
[VTune screenshot: Bevy 0.12.0]
[VTune screenshot: Bevy 0.13 with stepping disabled]

With the Intel profiler's results, the 0.13 CPU time is 40% more than 0.12.
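The remove-and-reinsert pattern described above can be sketched with stand-in types. This is only an illustration of the pattern (hypothetical World/Schedule types, not Bevy's actual Schedules code):

```rust
use std::collections::HashMap;

struct World;

struct Schedule {
    runs: u32,
}

impl Schedule {
    fn run(&mut self, _world: &mut World) {
        self.runs += 1;
    }
}

// One frame of an app-style update: each schedule is taken out of the map,
// run against the world, then moved back in. That per-frame churn is what
// running a Schedule directly on the world side-steps.
fn update(
    schedules: &mut HashMap<&'static str, Schedule>,
    world: &mut World,
    order: &[&'static str],
) {
    for label in order {
        if let Some(mut schedule) = schedules.remove(label) {
            // removed so the schedule and the world can be borrowed independently
            schedule.run(world);
            schedules.insert(*label, schedule); // moved back every frame
        }
    }
}

fn main() {
    let mut world = World;
    let mut schedules = HashMap::new();
    schedules.insert("First", Schedule { runs: 0 });
    schedules.insert("Update", Schedule { runs: 0 });
    for _ in 0..3 {
        update(&mut schedules, &mut world, &["First", "Update"]);
    }
    assert_eq!(schedules["First"].runs, 3);
}
```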

@dmlary
Contributor

dmlary commented Feb 18, 2024

Alright, I broke down the disassembly of Schedule::run(), SingleThreadedExecutor::run(), and MultiThreadedExecutor::run() for three builds: v0.13.0 with stepping enabled, v0.13.0 with stepping disabled, and a third with @hymm's changes and stepping disabled.

v0.13.0 stepping enabled
v0.13.0 stepping disabled
cfg-gated args w/ stepping disabled; hymm's branch

We're not explicitly allocating from what I can see here with stepping disabled, but what I do see is calls to core::ptr::drop_in_place() for FixedBitSet or Option<FixedBitSet>.

# calls to `drop_in_place()` for FixedBitSet in v0.13 with stepping enabled
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-stepping.txt | grep drop_in_place | grep fixedbitset
    0x100445970 <+208>:  callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x10044598c <+236>:  callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x100445b23 <+643>:  callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x1004460fb <+2139>: callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x10043d400 <+208>:  callq  0x1004f4c50               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::h2cecbb3fcde869ed at mod.rs:507
    0x10043d8a9 <+1401>: callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x10043dfc4 <+3220>: callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    0x10043e020 <+3312>: callq  0x1005c3c50               ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
    
# Calls for `drop_in_place()` for `Option<FixedBitSet>` in v0.13 with stepping disabled
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-without-stepping-013.txt | grep drop_in_place | grep fixedbitset
    0x100484d85 <+101>:  callq  0x100434aa0               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    0x100484eb5 <+405>:  callq  0x100434aa0               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    0x100489ea9 <+105>:  callq  0x100434aa0               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    0x100489ee7 <+167>:  callq  0x100434aa0               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    0x10048a773 <+2355>: callq  0x100434aa0               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    
# calls to `drop_in_place()` for Option<FixedBitSet>  with @hymm's feature gate on the arg
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-without-stepping.txt | grep drop_in_place | grep fixedbitset
    0x1004d3b7a <+186>: callq  0x100434e20               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
    0x1004d3b93 <+211>: callq  0x100434e20               ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507

Just looking at the differences between @hymm 's version and v0.13, both with stepping disabled, my best guess is that the cost of dropping Option<FixedBitSet> is higher when we drop it within SystemExecutor::run(). I'm not able to pinpoint exactly why though.

For v0.13 with stepping disabled, we do allocate an additional 80 bytes on the stack versus hymm's with stepping disabled, but I wasn't able to find us copying those 80 bytes over when skip_systems is None.

I'm unable to explain this behavior for now. I'll take another look tomorrow.
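The drop calls being chased here can be demonstrated standalone with a stand-in type (not the real FixedBitSet): a by-value Option<T> argument is owned, and therefore dropped, by the callee on every call, while Option<&T> is just a nullable pointer with nothing to drop.

```rust
// Illustration with a mock type; counts how many times the payload is
// dropped so the by-value vs by-reference difference is visible.
use std::sync::atomic::{AtomicUsize, Ordering};

static DROPS: AtomicUsize = AtomicUsize::new(0);

struct MockBitSet(Vec<u32>);

impl Drop for MockBitSet {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::Relaxed);
    }
}

// by value: the callee owns the argument, so its drop glue runs here
fn run_owned(skip: Option<MockBitSet>) -> usize {
    skip.map_or(0, |s| s.0.len())
}

// by reference: the callee only reads through the borrow; nothing to drop
fn run_borrowed(skip: Option<&MockBitSet>) -> usize {
    skip.map_or(0, |s| s.0.len())
}

fn main() {
    let set = MockBitSet(vec![0; 4]);
    for _ in 0..1000 {
        run_borrowed(Some(&set)); // caller keeps ownership: zero drops
    }
    assert_eq!(DROPS.load(Ordering::Relaxed), 0);

    run_owned(Some(MockBitSet(vec![0; 4]))); // dropped inside the callee
    assert_eq!(DROPS.load(Ordering::Relaxed), 1);
}
```

This matches the fix that was eventually merged, which changed the executor argument to Option<&FixedBitSet>.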

@dmlary
Contributor

dmlary commented Feb 18, 2024

Just to rule it out, I tried &Option<FixedBitSet> as an argument; it didn't resolve the issue.

dmlary added a commit to dmlary/bevy that referenced this issue Feb 18, 2024
Tried `&Option<FixedBitSet>` for SystemExecutor::run(), but it didn't
change the performance hit of the argument.

related bevyengine#11932
dmlary added a commit to dmlary/bevy that referenced this issue Feb 18, 2024
github-merge-queue bot pushed a commit that referenced this issue Feb 19, 2024
# Objective

* Fixes #11932 (performance impact when stepping is disabled)

## Solution

The `Option<FixedBitSet>` argument added to `ScheduleExecutor::run()` in
#8453 caused a measurable performance impact even when stepping is
disabled. This can be seen by benchmarking `Schedule::run()` on an empty
schedule in a tight loop (#11932 (comment)).

I was able to get the same performance results as on 0.12.1 by changing
the `ScheduleExecutor::run()` argument from `Option<FixedBitSet>` to
`Option<&FixedBitSet>`. The downside of this change is that
`Schedule::run()` now takes about 6% longer (3.7319 ms vs 3.9855 ms) when
stepping is enabled.

---

## Changelog
* Change `ScheduleExecutor::run()` `_skipped_systems` from
`Option<FixedBitSet>` to `Option<&FixedBitSet>`
* Added a few benchmarks to measure `Schedule::run()` performance with
various executors
msvbg pushed a commit to msvbg/bevy that referenced this issue Feb 26, 2024
msvbg pushed a commit to msvbg/bevy that referenced this issue Feb 26, 2024
mockersf pushed a commit that referenced this issue Feb 27, 2024
NiseVoid pushed a commit to NiseVoid/bevy that referenced this issue Jul 8, 2024