Catching panics is eating into Servo's DOM performance #34727

jdm · 2016-07-08T21:24:51Z

I have a profile showing that of 60% of execution time spent under a Rust callback from SpiderMonkey that corresponds to invoking a method on a JS object inside of a hot loop, fully half of that time is spent inside __rust_try, __rust_maybe_catch_panic, and std::panicking::PANIC_COUNT::__getit. Since we go Rust -> C++ -> JS -> C++ -> Rust when executing JS that invokes a DOM method, every loop iteration we call catch_unwind in the Rust callbacks invoked by C. This appears to be impacting our potential maximum performance significantly.

Issues to solve

Use only one personality function - Use only one personality function instead of 2 on Unix #34786
Inline cross-crate thread-local access - Access to thread locals isn't inlined across crates #25088
Don't hit TLS when entering catch_unwind - Don't hit thread-local-storage when entering catch_unwind #34787

The text was updated successfully, but these errors were encountered:

eefriedman · 2016-07-08T23:13:52Z

I assume you mean that std::panic::catch_unwind is slow, even if the called code doesn't panic? That's bad. Probably code which should be inlined isn't getting inlined for some reason.

CC @alexcrichton

jdm · 2016-07-09T02:12:06Z

Yes, that's correct.

alexcrichton · 2016-07-09T02:42:34Z

Unfortunately there's not a lot we can do here to easily make this better. The catch_unwind function was not prioritized to be fast as it was assumed it's only crossed at isolation boundaries, which should be relatively rare in theory.

That's not to say it can never be fast, just that it's going to be difficult. The reasons you're seeing this today are:

Thread locals across crates are not easily inlined
With the design of panic runtimes, the call to __rust_maybe_catch_panic is unconditional and will never go away unless you LTO. This prevents some inlining.
The __rust_try function is generated by the compiler specially because it has a different personality function than all the rest of the Rust code in the world. As a result, LLVM cannot inline this because there are differing personality functions.

There are a few local optimizations possible, such as not using options here, but I've tried that out locally and it doesn't really buy you much.

LTO will buy you the most in terms of optimizations because thread local access can get inlined as well as __rust_maybe_catch_panic. At that point all that can't be inlined is __rust_try. We could try to finagle only using one personality routine so this could get inlined, but it may not be trivial.

That's at least my current thoughts for now, unfortunately that may not help much, but hopefully it's at least a little illuminating!

alexcrichton · 2016-07-09T03:06:03Z

Some ideas for how to get rid of TLS:

We don't actually need to hit TLS when we run catch_unwind, the only thing it's doing is resetting the panic counter to 0. This is if you have something like this:

use std::panic::catch_unwind;
struct A;

let _a = A;
panic!();

impl Drop for A {
    fn drop(&mut self) {
        catch_unwind(|| panic!());
    }
}

That program doesn't abort. Now whether or not that program aborts with a double panic seems a bit iffy to me. This means that a catch_unwind changes thread::panicking to false which is arguably the wrong thing to do because the thread is still panicking. I'd be fine saying that something like this is akin to double panic, but others may feel differently.

Basically, what I'm getting at is that we could try to remove the hit to TLS in catch_unwind if possible. All we need to do then is to be able to detect when panicking if we're already panicking. This may be possible with some sort of stack unwinding (maybe throw a "pseudo-exception" to see if something catches it?) or something like that. Haven't thought this through all the way, but it should be fine for us to make panics basically as arbitrarily slow as we can.

Next, to enable all the inlining we need to get rid of our two personality functions and only have one. This should be possible as C++ does it all the time. The implementation of our personality function would then have to inspect the "language specific data area" I believe (the LSDA) and basically see if the function the personality is running for contains a catch. I believe this is encoded in DWARF somewhere and we just need to read it out, I do not understand the specifics of doing so however.

If we can get both of those done then we the only point where we can't inline is the __rust_maybe_catch_panic boundary, which seems fine as LTO easily erases it.

arielb1 · 2016-07-09T13:54:46Z

Can't you compile servo with -C panic=abort?

jdm · 2016-07-09T15:40:58Z

That would remove a significant reason for Servo to be written in Rust; thread isolation from panics is one of our selling points.

Improve performance of HTMLDivElement constructor These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations  --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/servo/12374)

alexcrichton · 2016-07-12T17:50:04Z

I've updated the description of this issue to list the primary issues to solve to close out this issue. I figure we can leave this open though for discussion about perhaps different strategies for optimization.

alexcrichton · 2016-07-29T18:14:09Z

With @vadimcn's and @cynicaldevil's awesome work, some secret sauce of mine I'll land after @cynicaldevil's PR, and LTO, we can optimize this to a noop:

std::panic::catch_unwind(|| {})

That I believe means we've given LLVM all the inlining opportunities we need, so this should be close to getting fixed. With @cynicaldevil's PR we may not need to tackle cross-crate thread-locals at all!

The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727

… r=brson std: Optimize panic::catch_unwind slightly The previous implementation of this function was overly conservative with liberal usage of `Option` and `.unwrap()` which in theory never triggers. This commit essentially removes the `Option`s in favor of unsafe implementations, improving the code generation of the fast path for LLVM to see through what's happening more clearly. cc rust-lang#34727

alexcrichton · 2016-08-16T16:24:29Z

Current status:

toolchain + flags	time for one `catch_unwind`
stable, no LTO	8ns
stable, LTO	5ns
nightly, no LTO	3ns
nightly, LTO	0ns

That is, I think we're as fast as we're gonna get. Without LTO we can't inline the panic runtime itself (because it's swappable), and in this case we're about as fast as we're gonna get. With LTO, however, we can inline everything and optimize the catch_unwind away entirely.

@jdm when you get a chance could you try re-benchmarking? Is the performance acceptable now?

Amanieu · 2016-08-16T17:17:38Z

We could go one step further and move __rust_maybe_catch_panic into libstd and mark it with #[inline]. This would bring the non-LTO overhead to 0ns as well.

alexcrichton · 2016-08-16T18:13:09Z

I don't think we can do that unfortunately due to how panic runtimes. work. The fact that switching the panic runtime is a compiler switch means that we have to have a strict ABI between the two, which means that #[inline] isn't possible.

Amanieu · 2016-08-16T19:06:43Z

What I mean is that we could use intrinsics::try for both panic runtimes. This will have practically zero cost in the case of panic_abort since the unwind path will never be called. However it will allow the try to be inlined entirely into the calling crate.

alexcrichton · 2016-08-16T20:39:41Z

Perhaps! If we wanna change that ABI then I'd love to solve this issue to get solved even without LTO

jdm · 2016-08-22T19:30:02Z

Instruments.app results are looking much better. I still see 5% of time spent in tlv_get_addr underneath std::panicking::try_call, though, which is worrying. LTO is on by default for optimized build, right?

TimNN · 2016-08-22T19:36:31Z

@jdm: from what http://doc.crates.io/manifest.html#the-profile-sections says, I don't think LTO is enabled in release mode by default.

jdm · 2016-08-22T21:25:33Z

With LTO enabled that tlv_get_addr is gone. Thanks for investigating and fixing this, @alexcrichton and @cynicaldevil!

eddyb · 2016-08-22T21:28:15Z

@alexcrichton @jdm What's the tlv_get_addr call for?
Shouldn't the counter not be accessed at all in the non-panicking case now?

alexcrichton · 2016-08-22T21:58:45Z

@jdm yeah as @eddyb mentioned we shouldn't be hitting that function at all any more unless something actually panics...

jdm · 2016-08-22T22:00:45Z

I've seen tlv_get_addr under call stacks that are Servo's code, rather than directly from try_call, so maybe it was misreported or something.

alexcrichton · 2017-08-10T16:38:44Z

IIRC everything here was handled except for #25088, so closing in favor of #25088

…or (from jdm:jsup); r=Manishearth These changes address two sources of performance loss seen while profiling in #12354. #12358 and rust-lang/rust#34727 are still the biggest offenders, however. --- - [X] `./mach build -d` does not report any errors - [X] `./mach test-tidy` does not report any errors - [X] These changes do not require tests because we don't have performance tests and these are only optimizations Source-Repo: https://github.com/servo/servo Source-Revision: 01ec8491d3200d6710336da1bb0f4e01b49dc4bc UltraBlame original commit: 383b98548e4b848e35322e195dff638072fa2da0

purplesyringa · 2024-08-23T13:07:03Z

What is the evidence that throwing a panic inside a cleanup pad is unsound on MSVC? I could not find this in MSVC docs and I'm wondering what this claim is based on.

jdm mentioned this issue Jul 8, 2016

document.createElement very slow. servo/servo#12354

Open

jdm mentioned this issue Jul 10, 2016

Improve performance of HTMLDivElement constructor servo/servo#12374

Merged

3 tasks

brson added I-slow Issue: Problems and improvements with respect to performance of generated code. A-libs A-runtime Area: std's runtime and "pre-main" init for handling backtraces, unwinds, stack overflows labels Jul 11, 2016

This was referenced Jul 12, 2016

Use only one personality function instead of 2 on Unix #34786

Closed

Don't hit thread-local-storage when entering catch_unwind #34787

Closed

alexcrichton mentioned this issue Aug 7, 2016

std: Optimize panic::catch_unwind slightly #35444

Merged

steveklabnik added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. and removed A-libs labels Mar 24, 2017

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017

alexcrichton closed this as completed Aug 10, 2017

Mark-Simulacrum mentioned this issue Dec 23, 2019

Why is the code size of catch_unwind so large ? #64224

Closed

AdrianEddy mentioned this issue Aug 9, 2024

Enable panic handling in release mode? AdrianEddy/after-effects#10

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Catching panics is eating into Servo's DOM performance #34727

Catching panics is eating into Servo's DOM performance #34727

jdm commented Jul 8, 2016 •

edited by alexcrichton

Loading

eefriedman commented Jul 8, 2016

jdm commented Jul 9, 2016

alexcrichton commented Jul 9, 2016

alexcrichton commented Jul 9, 2016

arielb1 commented Jul 9, 2016

jdm commented Jul 9, 2016

alexcrichton commented Jul 12, 2016 •

edited

Loading

alexcrichton commented Jul 29, 2016

alexcrichton commented Aug 16, 2016 •

edited

Loading

Amanieu commented Aug 16, 2016

alexcrichton commented Aug 16, 2016

Amanieu commented Aug 16, 2016

alexcrichton commented Aug 16, 2016

jdm commented Aug 22, 2016

TimNN commented Aug 22, 2016

jdm commented Aug 22, 2016

eddyb commented Aug 22, 2016

alexcrichton commented Aug 22, 2016

jdm commented Aug 22, 2016

alexcrichton commented Aug 10, 2017

purplesyringa commented Aug 23, 2024

Catching panics is eating into Servo's DOM performance #34727

Catching panics is eating into Servo's DOM performance #34727

Comments

jdm commented Jul 8, 2016 • edited by alexcrichton Loading

Issues to solve

eefriedman commented Jul 8, 2016

jdm commented Jul 9, 2016

alexcrichton commented Jul 9, 2016

alexcrichton commented Jul 9, 2016

arielb1 commented Jul 9, 2016

jdm commented Jul 9, 2016

alexcrichton commented Jul 12, 2016 • edited Loading

alexcrichton commented Jul 29, 2016

alexcrichton commented Aug 16, 2016 • edited Loading

Amanieu commented Aug 16, 2016

alexcrichton commented Aug 16, 2016

Amanieu commented Aug 16, 2016

alexcrichton commented Aug 16, 2016

jdm commented Aug 22, 2016

TimNN commented Aug 22, 2016

jdm commented Aug 22, 2016

eddyb commented Aug 22, 2016

alexcrichton commented Aug 22, 2016

jdm commented Aug 22, 2016

alexcrichton commented Aug 10, 2017

purplesyringa commented Aug 23, 2024

jdm commented Jul 8, 2016 •

edited by alexcrichton

Loading

alexcrichton commented Jul 12, 2016 •

edited

Loading

alexcrichton commented Aug 16, 2016 •

edited

Loading