Noticeable schedules running overhead #11932
Might be related to #11931.
Note that on Discord we looked into this a bit more, and still got a regression vs 0.12 with system stepping off. It worsened the problem, but wasn't fully the cause.
I was able to bisect it to #8453, but not sure what could be causing it yet. Everything obvious seems to be
I tried cfg'ing out the changes to the run function signature and it seemed to get rid of most of the extra overhead. Not sure why that helped, and we probably don't want to be changing function signatures, so I'm not going to PR it. main...hymm:bevy:investigate-empty-system-overhead
I dug into the Empty Schedule Benchmark. The stepping changes touch two places every frame. First, the `begin_frame()` system registration:
```rust
#[cfg(feature = "bevy_debug_stepping")]
{
    use bevy_ecs::schedule::{IntoSystemConfigs, Stepping};
    app.add_systems(Main, Stepping::begin_frame.before(Main::run_main));
}
```
Second, `Schedule::run()` (bevy/crates/bevy_ecs/src/schedule/schedule.rs, lines 338 to 342 at abb8c35):
```rust
#[cfg(feature = "bevy_debug_stepping")]
let skip_systems = match world.get_resource_mut::<Stepping>() {
    None => None,
    Some(mut stepping) => stepping.skipped_systems(self),
};
```
```rust
if let Some(mut skipped_systems) = _skip_systems {
```
- check if `_skip_systems` is `None`
Based on the structure of the benchmark (five empty schedules), the changes to `Schedule::run()` (the `get_resource_mut()`) are being hit 5x more often than `begin_frame()`, but honestly that reflects real-world use.
Stepping Performance Improvement Idea
Even without concrete numbers, there are a couple of ways we can remove the stepping performance impact:
The biggest performance win will be to move the skip list from `Stepping` into the `Schedule`, then have `Stepping` apply changes to the skip list during `begin_frame()`. This eliminates the `get_resource_mut()` for every schedule every frame. A side effect is that `Stepping` loses dynamic schedule order detection, which honestly was a bad idea on my part. Instead we can just use the explicit order the schedules were added to `Stepping`, and provide mechanisms to change the order.
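A rough, self-contained sketch of what this could look like (simplified stand-in types, not Bevy's actual `Schedule`/`Stepping` API):
```rust
// Simplified stand-ins for Schedule and Stepping to illustrate the idea:
// the schedule owns its own skip list, and the stepping controller writes
// into it once per frame instead of the executor fetching a Stepping
// resource on every Schedule::run().
use fixedbitset::FixedBitSet;

struct MySchedule {
    /// Systems to skip this frame; read directly by the executor.
    skipped_systems: FixedBitSet,
}

struct MyStepping {
    /// Skip list computed from the user's stepping commands.
    pending_skips: FixedBitSet,
}

impl MyStepping {
    /// The begin_frame() equivalent: push the skip list into the schedule
    /// so running the schedule never needs resource access.
    fn begin_frame(&self, schedule: &mut MySchedule) {
        schedule.skipped_systems.clone_from(&self.pending_skips);
    }
}

fn main() {
    let stepping = MyStepping { pending_skips: FixedBitSet::with_capacity(8) };
    let mut schedule = MySchedule { skipped_systems: FixedBitSet::with_capacity(8) };
    stepping.begin_frame(&mut schedule);
    // schedule.run() would now consult schedule.skipped_systems directly.
    assert_eq!(schedule.skipped_systems.len(), 8);
}
```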
A second performance improvement is to remove the `begin_frame()` system unless the `Stepping` resource is present. Right now we check for the resource every frame; we could eliminate that if we could add/remove systems at runtime. As that's not currently possible, we can instead add a run condition to `begin_frame()` that checks a global variable in `schedule::stepping` to see if it should run.
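A minimal sketch of that run condition, assuming a hypothetical `STEPPING_IN_USE` flag in `schedule::stepping` (this static and its name are not existing Bevy API):
```rust
// Hypothetical global flag, set when stepping is enabled; checking it is a
// single relaxed atomic load instead of a resource lookup every frame.
use std::sync::atomic::{AtomicBool, Ordering};

pub static STEPPING_IN_USE: AtomicBool = AtomicBool::new(false);

/// Run condition for Stepping::begin_frame.
pub fn stepping_in_use() -> bool {
    STEPPING_IN_USE.load(Ordering::Relaxed)
}

// Registration would then look roughly like:
// app.add_systems(Main, Stepping::begin_frame.run_if(stepping_in_use).before(Main::run_main));
```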
As an alternative to both of these, we can move the needed stepping run state out of the resource and into a global variable. It's effectively the combination of the two options above. The `Stepping` resource either goes away or becomes an interface to that global state. I'm not really a fan of this as it feels very dirty, but it should be mentioned.
That's crazy, just the removal of the
Oh that's very weird. Wait, could this be that
Oh yeah, if we're hitting the allocator there, that might be it.
I jumped the gun, that's actually a call to `Vec::len()`:
```
# verifying the address is `MultithreadedExecutor::run()`
(lldb) image lookup -a 0x00000001032b4d40
Address: breakout[0x00000001032b4d40] (breakout.__TEXT.__text + 53158848)
Summary: breakout`_$LT$bevy_ecs..schedule..executor..multi_threaded..MultiThreadedExecutor$u20$as$u20$bevy_ecs..schedule..executor..SystemExecutor$GT$::run::hdf2d2d0ce9eb657b at multi_threaded.rs:166
# disassemble `MultithreadedExecutor::run()`:
(lldb) x/40i 0x00000001032b4d40
0x1032b4d40: 55 other pushq %rbp
0x1032b4d41: 48 89 e5 other movq %rsp, %rbp
0x1032b4d44: 48 81 ec b0 04 00 00 other subq $0x4b0, %rsp ; imm = 0x4B0
0x1032b4d4b: 48 89 b5 88 fc ff ff other movq %rsi, -0x378(%rbp)
0x1032b4d52: 48 89 f8 other movq %rdi, %rax
0x1032b4d55: 48 8b bd 88 fc ff ff other movq -0x378(%rbp), %rdi
0x1032b4d5c: 48 89 85 90 fc ff ff other movq %rax, -0x370(%rbp)
0x1032b4d63: 48 89 bd 98 fc ff ff other movq %rdi, -0x368(%rbp)
0x1032b4d6a: 48 89 95 a0 fc ff ff other movq %rdx, -0x360(%rbp)
0x1032b4d71: 48 89 8d a8 fc ff ff other movq %rcx, -0x358(%rbp)
0x1032b4d78: 48 89 45 88 other movq %rax, -0x78(%rbp)
0x1032b4d7c: 48 89 7d 90 other movq %rdi, -0x70(%rbp)
0x1032b4d80: 48 89 4d 98 other movq %rcx, -0x68(%rbp)
0x1032b4d84: c6 45 87 00 other movb $0x0, -0x79(%rbp)
0x1032b4d88: c6 45 86 00 other movb $0x0, -0x7a(%rbp)
0x1032b4d8c: c6 45 85 00 other movb $0x0, -0x7b(%rbp)
0x1032b4d90: c6 45 84 00 other movb $0x0, -0x7c(%rbp)
0x1032b4d94: c6 45 83 00 other movb $0x0, -0x7d(%rbp)
0x1032b4d98: c6 45 84 01 other movb $0x1, -0x7c(%rbp)
0x1032b4d9c: 48 83 c7 18 other addq $0x18, %rdi
0x1032b4da0: e8 cb 7d fe ff call callq 0x10329cb70
0x1032b4da5: 48 89 85 b0 fc ff ff other movq %rax, -0x350(%rbp)
0x1032b4dac: eb 3c jump jmp 0x1032b4dea
0x1032b4dae: 48 8b 95 a0 fc ff ff other movq -0x360(%rbp), %rdx
0x1032b4db5: 48 be 00 00 00 00 00 00 00 80 other movabsq $-0x8000000000000000, %rsi ; imm = 0x8000000000000000
0x1032b4dbf: b8 01 00 00 00 other movl $0x1, %eax
0x1032b4dc4: 31 c9 other xorl %ecx, %ecx
0x1032b4dc6: 48 39 32 other cmpq %rsi, (%rdx)
0x1032b4dc9: 48 0f 44 c1 other cmoveq %rcx, %rax
0x1032b4dcd: 48 83 f8 01 other cmpq $0x1, %rax
0x1032b4dd1: 0f 84 43 0c 00 00 cond jump je 0x1032b5a1a
0x1032b4dd7: e9 44 0c 00 00 jump jmp 0x1032b5a20
0x1032b4ddc: 48 89 c1 other movq %rax, %rcx
0x1032b4ddf: 89 d0 other movl %edx, %eax
0x1032b4de1: 48 89 4d a0 other movq %rcx, -0x60(%rbp)
0x1032b4de5: 89 45 a8 other movl %eax, -0x58(%rbp)
0x1032b4de8: eb c4 jump jmp 0x1032b4dae
0x1032b4dea: 48 8b 85 90 fc ff ff other movq -0x370(%rbp), %rax
0x1032b4df1: 48 8b 8d b0 fc ff ff other movq -0x350(%rbp), %rcx
0x1032b4df8: 48 89 88 c8 01 00 00 other movq %rcx, 0x1c8(%rax)
# Check the first call, looks like a Vec alloc -- NOPE, is Vec::len()
(lldb) image lookup -a 0x10329cb70
Address: breakout[0x000000010329cb70] (breakout.__TEXT.__text + 53060080)
Summary: breakout`alloc::vec::Vec$LT$T$C$A$GT$::len::hed1b7d1b42e2eadd at mod.rs:2141
```
I read further into the mangled function name; it's not an alloc, it's a `Vec::len()` call.
I retract my request. I'm making this code change and it's getting complicated quickly. I'll keep poking at it trying to figure out why the argument is causing problems.
Alright, I broke down the disassembly of v0.13.0 with stepping enabled. We're not explicitly allocing from what I can see here with stepping disabled, but what I do see is calls to `drop_in_place()` for `FixedBitSet`:
```
# calls to `drop_in_place()` for FixedBitSet in v0.13 with stepping enabled
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-stepping.txt | grep drop_in_place | grep fixedbitset
0x100445970 <+208>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x10044598c <+236>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x100445b23 <+643>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x1004460fb <+2139>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x10043d400 <+208>: callq 0x1004f4c50 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::h2cecbb3fcde869ed at mod.rs:507
0x10043d8a9 <+1401>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x10043dfc4 <+3220>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
0x10043e020 <+3312>: callq 0x1005c3c50 ; core::ptr::drop_in_place$LT$fixedbitset..FixedBitSet$GT$::h6dcfbf4b71ae901f at mod.rs:507
# Calls for `drop_in_place()` for `Option<FixedBitSet>` in v0.13 with stepping disabled
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-without-stepping-013.txt | grep drop_in_place | grep fixedbitset
0x100484d85 <+101>: callq 0x100434aa0 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
0x100484eb5 <+405>: callq 0x100434aa0 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
0x100489ea9 <+105>: callq 0x100434aa0 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
0x100489ee7 <+167>: callq 0x100434aa0 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
0x10048a773 <+2355>: callq 0x100434aa0 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
# calls to `drop_in_place()` for Option<FixedBitSet> with @hymm's feature gate on the arg
dmlary@beast:~/src/bevy/vendor/bevy/benches> grep callq ../disass-run-without-stepping.txt | grep drop_in_place | grep fixedbitset
0x1004d3b7a <+186>: callq 0x100434e20 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
0x1004d3b93 <+211>: callq 0x100434e20 ; core::ptr::drop_in_place$LT$core..option..Option$LT$fixedbitset..FixedBitSet$GT$$GT$::hc51e23594530a709 at mod.rs:507
```
Just looking at the differences between @hymm's version and v0.13, both with stepping disabled, my best guess is that the difference comes down to the cost of dropping the `Option<FixedBitSet>` argument. For v0.13 with stepping disabled, we do allocate an additional 80 bytes on the stack versus hymm's with stepping disabled, but I wasn't able to find us copying those 80 bytes over. I'm unable to explain this behavior for now. I'll take another look tomorrow.
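For illustration, here is a standalone (non-Bevy) sketch of why a by-value `Option<FixedBitSet>` parameter produces `drop_in_place` calls at the call sites, while a borrowed `Option<&FixedBitSet>` does not:
```rust
// Standalone illustration, not Bevy code: dropping an owned Option<FixedBitSet>
// at the end of the callee frees its heap buffer (the drop_in_place calls seen
// above), while Option<&FixedBitSet> is just a pointer-sized copy.
use fixedbitset::FixedBitSet;

fn run_by_value(skip: Option<FixedBitSet>) -> usize {
    // `skip` is owned here; when it goes out of scope the compiler emits
    // drop_in_place::<Option<FixedBitSet>>(), freeing the buffer if Some.
    skip.map_or(0, |s| s.count_ones(..))
}

fn run_by_ref(skip: Option<&FixedBitSet>) -> usize {
    // Nothing is dropped or freed here; the caller keeps ownership.
    skip.map_or(0, |s| s.count_ones(..))
}

fn main() {
    let bits = FixedBitSet::with_capacity(64);
    println!("{}", run_by_value(Some(bits.clone())));
    println!("{}", run_by_ref(Some(&bits)));
    println!("{}", run_by_value(None));
}
```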
Just to rule it out, I tried `&Option<FixedBitSet>`:
Tried `&Option<FixedBitSet>` for SystemExecutor::run(), but it didn't change the performance hit of the argument. related bevyengine#11932
# Objective

* Fixes #11932 (performance impact when stepping is disabled)

## Solution

The `Option<FixedBitSet>` argument added to `ScheduleExecutor::run()` in #8453 caused a measurable performance impact even when stepping is disabled. This can be seen by the benchmark of running `Schedule::run()` on an empty schedule in a tight loop (#11932 (comment)). I was able to get the same performance results as on 0.12.1 by changing the argument of `ScheduleExecutor::run()` from `Option<FixedBitSet>` to `Option<&FixedBitSet>`. The downside of this change is that `Schedule::run()` now takes about 6% longer (3.7319 ms vs 3.9855 ms) when stepping is enabled.

## Changelog

* Changed the `_skipped_systems` argument of `ScheduleExecutor::run()` from `Option<FixedBitSet>` to `Option<&FixedBitSet>`
* Added a few benchmarks to measure `Schedule::run()` performance with various executors
Bevy version 0.13
Relevant system information
What's performing poorly?
The schedule is running slower than before, which slowed my program down by 20%. This can hurt real-time simulation applications and may significantly lower performance as you scale up to more schedules and systems.
It can be reproduced by running empty schedules:
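A minimal sketch of running empty schedules in a tight loop, assuming plain `bevy_ecs` (the exact reproduction code may differ):
```rust
// Run an empty bevy_ecs Schedule in a tight loop and time it; the per-run
// overhead being reported shows up even with no systems added.
use bevy_ecs::schedule::Schedule;
use bevy_ecs::world::World;
use std::time::Instant;

fn main() {
    let mut world = World::new();
    let mut schedule = Schedule::default();

    let start = Instant::now();
    for _ in 0..1_000_000 {
        schedule.run(&mut world);
    }
    println!("1M empty Schedule::run() calls: {:?}", start.elapsed());
}
```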
Performance table for this code:
With LTO enabled
With LTO disabled
The slowdown is obvious, and I didn't use a profiler because this overhead is not caused by user systems.