
Exponential compile-time and type_length_limit blowup when nesting closure wrappers #54540

Closed
vorner opened this issue Sep 24, 2018 · 25 comments
Assignees
Labels
A-closures Area: Closures (`|…| { … }`) I-compiletime Issue: Problems and improvements with respect to compile times. P-medium Medium priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@vorner
Contributor

vorner commented Sep 24, 2018

I think this is better explained by a code snippet:

#![recursion_limit="128"]
#![type_length_limit="67108864"]

fn main() {
    let s = S19::default();
    s.call_me((), &|()| ());
}

trait CallMe<T: Copy> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F);
}

impl<T: Copy> CallMe<T> for () {
    fn call_me<F: Fn(T)>(&self, _: T, _: &F) {}
}

#[derive(Default, Debug)]
struct S<T>(T, T);

impl<T: Copy, P: CallMe<T>> CallMe<T> for S<P> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F) {
        let wrapped = |t: _| {
            f(t);
        };
        self.0.call_me(t, &wrapped);
        self.1.call_me(t, &wrapped);
    }
}

type S0 = S<()>;
type S1 = S<S0>;
type S2 = S<S1>;
type S3 = S<S2>;
type S4 = S<S3>;
type S5 = S<S4>;
type S6 = S<S5>;
type S7 = S<S6>;
type S8 = S<S7>;
type S9 = S<S8>;
type S10 = S<S9>;
type S11 = S<S10>;
type S12 = S<S11>;
type S13 = S<S12>;
type S14 = S<S13>;
type S15 = S<S14>;
type S16 = S<S15>;
type S17 = S<S16>;
type S18 = S<S17>;
type S19 = S<S18>;

The snippet creates a closure at each level that wraps the closure it received. As the number of levels grows, the compile time and the required type_length_limit grow exponentially.

However, if I change the two calls to:

self.0.call_me(t, &f);
self.1.call_me(t, &f);

then the blow-up goes away and it compiles almost instantaneously (I tried it up to 120 levels, without increasing the type length limit), so nesting the S structures like this is OK. But if I change only one of the calls (or even remove it) and leave just one call through the wrapper, the blowup is still there. However, there seems to be nothing about the wrapper that should make it exponential: it contains just one call to the wrapped f, has no captures, and when I look at the code I still count only one distinct closure at each level.
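For reference, the non-exponential variant as a complete program (a sketch: unlike the snippet above, the leaf impl here calls f once, purely so the fan-out is observable):

```rust
use std::cell::Cell;

trait CallMe<T: Copy> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F);
}

impl<T: Copy> CallMe<T> for () {
    // Unlike the original snippet (whose leaf impl ignores `f`), this
    // leaf calls `f` once so the number of calls can be checked.
    fn call_me<F: Fn(T)>(&self, t: T, f: &F) {
        f(t);
    }
}

#[derive(Default, Debug)]
struct S<T>(T, T);

impl<T: Copy, P: CallMe<T>> CallMe<T> for S<P> {
    fn call_me<F: Fn(T)>(&self, t: T, f: &F) {
        // No wrapper closure: forward the same `&F` down both arms, so
        // no new closure type (mentioning `F` twice) is built per level.
        self.0.call_me(t, f);
        self.1.call_me(t, f);
    }
}

fn main() {
    let calls = Cell::new(0u32);
    let s: S<S<S<()>>> = S::default();
    s.call_me((), &|()| calls.set(calls.get() + 1));
    // Three levels of S fan out to 2^3 = 8 leaf calls.
    assert_eq!(calls.get(), 8);
}
```

Forwarding the same &F means every level monomorphizes call_me for the single closure type from main, instead of minting a fresh closure type per level.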

The compile times I've measured with different numbers of levels:

| Levels | Time (s) |
|-------:|---------:|
| 11 | 0.5 |
| 12 | 0.74 |
| 13 | 0.97 |
| 14 | 1.56 |
| 15 | 2.66 |
| 16 | 5.02 |
| 17 | 9.16 |
| 18 | 18.50 |
| 19 | 37.72 |

This is happening both on stable and nightly:

rustc 1.30.0-nightly (63c75d375 2018-09-21)
binary: rustc
commit-hash: 63c75d375d76d0bcc586f9cb5520ced24bc58450
commit-date: 2018-09-21
host: x86_64-unknown-linux-gnu
release: 1.30.0-nightly
LLVM version: 8.0

@ishitatsuyuki mentioned on IRC that he would like to have a look.

@vorner
Contributor Author

vorner commented Sep 24, 2018

And another thing I've noticed: cargo check does not trigger the limits and is fast no matter how many levels I add (though I guess these limits apply to phases that cargo check doesn't run).

@eddyb
Member

eddyb commented Sep 24, 2018

cc #54175 @nikomatsakis

@ishitatsuyuki
Contributor

So, type_length_limit is a monomorphization-time thing, and it's used for symbol-name lengths. It's no wonder the blowup happened, since you actually raised the type_length_limit.

Since type aliases are identical to the original type, we can't generate things like debug symbols of a reasonable length in this case. We would have to combine both caching and digesting to make the symbols shorter.

@vorner
Contributor Author

vorner commented Sep 25, 2018

How is it possible, then, that if I call f instead of the wrapper, no limit is triggered? The compiler still needs to create a symbol name for the call_me method of each S, each with a very long name. What I'm trying to point out is that it all works like a charm without the wrapper closure, yet that closure doesn't bring anything special to the table, so why does it make a difference?

And I still don't think the compiler would take 30 seconds to create 20 or so symbol names, each 6 MB long.

What I'm trying to say is: I understand that if a bigger limit is needed, compilation takes longer. But I think the bigger limit should not be needed here. Or at least, I don't see why it should be.

@estebank estebank added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Sep 25, 2018
@eddyb
Member

eddyb commented Sep 26, 2018

@vorner We don't put types in symbol names except for the X in <X as Trait>::method, so I think that's likely what the closure is doing here.

If we move to proper C++-like mangling, we can use the compression features and bypass this problem entirely.

EDIT: I likely was wrong, see below.

@ishitatsuyuki ishitatsuyuki added I-compiletime Issue: Problems and improvements with respect to compile times. and removed I-slow Issue: Problems and improvements with respect to performance of generated code. labels Sep 26, 2018
@ishitatsuyuki
Contributor

Triage: a pending pre-RFC aims to solve this: https://internals.rust-lang.org/t/pre-rfc-a-new-symbol-mangling-scheme

@eddyb eddyb added I-nominated T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 21, 2019
@eddyb
Member

eddyb commented Apr 21, 2019

After working on the parts of rustc relevant to this issue, I'm pretty sure none of those types are getting into symbol names, but rather this is type_length_limit's walk over the generic arguments:

let type_length = instance.substs.types().flat_map(|ty| ty.walk()).count();

The reason closures trigger this is because F is both a type parameter of the closure and a capture, so the walk above doubles itself at every level (and it's breadth-first, so I assume its memory allocations are responsible for the slowdown).

We can trigger the type_length_limit behavior for closures with (F, F), too, and so I managed to reproduce the slowdown without closures (or really any calls):

#![type_length_limit="8388607"]

pub const MAIN: fn() = <S21 as Exp<()>>::exp as fn();

trait Exp<X> {
    fn exp();
}

impl<X> Exp<X> for () {
    fn exp() {}
}

impl<T: Exp<(X, X)>, X> Exp<X> for (T,) {
    fn exp() {
        <T as Exp<(X, X)>>::exp as fn();
    }
}

type S<T> = (T,);
type S0 = S<()>;
type S1 = S<S0>;
type S2 = S<S1>;
type S3 = S<S2>;
type S4 = S<S3>;
type S5 = S<S4>;
type S6 = S<S5>;
type S7 = S<S6>;
type S8 = S<S7>;
type S9 = S<S8>;
type S10 = S<S9>;
type S11 = S<S10>;
type S12 = S<S11>;
type S13 = S<S12>;
type S14 = S<S13>;
type S15 = S<S14>;
type S16 = S<S15>;
type S17 = S<S16>;
type S18 = S<S17>;
type S19 = S<S18>;
type S20 = S<S19>;
type S21 = S<S20>;

Note that a downside of using (X, X) instead of a closure is that tuples are actually exponential in debuginfo, unlike closures (which only show their captures), so you either need to compile unoptimized w/o debuginfo, or in release mode (e.g. on the playground - it'd be nice if it also had the former, cc @lqd).


What can you (@vorner) do?

Not much; I'm afraid you can't avoid this issue if you need a closure with a capture.
However, if you don't need the Fn() trait (or at least not for the code that contains the wrapper closure), you can build your own Fn-like trait, and instead of a closure you can have a struct which is only parameterized by F once (unlike the closure, which ends up doing it twice).
Or you can wait for one of the fixes below (but the issue will, of course, persist in older versions).
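The "own Fn-like trait" workaround above can be sketched like this (all names — Call, FnCall, Wrapped — are illustrative, not from any library; the leaf impl calls the callable so the behavior is checkable):

```rust
use std::cell::Cell;

// Hand-rolled Fn-like trait.
trait Call<T: Copy> {
    fn call(&self, t: T);
}

// Adapter so an ordinary closure can seed the chain.
struct FnCall<F>(F);

impl<T: Copy, F: Fn(T)> Call<T> for FnCall<F> {
    fn call(&self, t: T) {
        (self.0)(t);
    }
}

// The per-level wrapper: a struct whose type mentions `F` exactly once,
// unlike a closure capturing `f` (which carries `F` both as a type
// parameter and as a capture).
struct Wrapped<'a, F>(&'a F);

impl<'a, T: Copy, F: Call<T>> Call<T> for Wrapped<'a, F> {
    fn call(&self, t: T) {
        self.0.call(t);
    }
}

#[derive(Default)]
struct S<P>(P, P);

trait CallMe<T: Copy> {
    fn call_me<F: Call<T>>(&self, t: T, f: &F);
}

impl<T: Copy> CallMe<T> for () {
    fn call_me<F: Call<T>>(&self, t: T, f: &F) {
        f.call(t);
    }
}

impl<T: Copy, P: CallMe<T>> CallMe<T> for S<P> {
    fn call_me<F: Call<T>>(&self, t: T, f: &F) {
        // One `Wrapped` layer per level: linear, not exponential, growth.
        let wrapped = Wrapped(f);
        self.0.call_me(t, &wrapped);
        self.1.call_me(t, &wrapped);
    }
}

fn main() {
    let calls = Cell::new(0u32);
    let s: S<S<()>> = S::default();
    s.call_me((), &FnCall(|()| calls.set(calls.get() + 1)));
    assert_eq!(calls.get(), 4); // 2 levels fan out to 2^2 leaf calls
}
```

At depth n the wrapper's type is Wrapped<Wrapped<...<FnCall<F>>>>, which mentions F once and grows linearly with depth.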

What can we do?

For type_length_limit specifically, it could control the walk more directly, and cache the sizes of subtrees, which should reduce the cost for this to something linear.
I am, however, worried this may be inefficient for most "type lengths", which are probably going to be relatively small anyway, and where the current walk is likely pretty cheap.
IIRC type_length_limit was added to avoid the compiler spending a lot of time without appearing to do anything, I wonder if that could be solved some other way?
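The caching idea can be illustrated with a toy model (Ty is a made-up stand-in for rustc's interned types, not compiler code): deduplicating the walk turns the exponential count into one linear in the number of distinct types, roughly the approach later taken by #72412.

```rust
use std::collections::HashSet;

// Toy stand-in for compiler types; real rustc types are interned.
#[derive(PartialEq, Eq, Hash, Clone)]
enum Ty {
    Unit,
    Pair(Box<Ty>, Box<Ty>),
}

// The current-style walk: visits repeated subtrees again and again, so
// the doubling shapes in this issue make it exponential.
fn len_naive(ty: &Ty) -> u64 {
    match ty {
        Ty::Unit => 1,
        Ty::Pair(a, b) => 1 + len_naive(a) + len_naive(b),
    }
}

// Deduplicated walk: each distinct type is counted once.
fn len_dedup<'a>(ty: &'a Ty, seen: &mut HashSet<&'a Ty>) -> u64 {
    if !seen.insert(ty) {
        return 0; // already counted this type
    }
    match ty {
        Ty::Unit => 1,
        Ty::Pair(a, b) => 1 + len_dedup(a, seen) + len_dedup(b, seen),
    }
}

fn main() {
    // Build the doubling shape from this issue: level n mentions the
    // level n-1 type twice.
    let ty = (0..10).fold(Ty::Unit, |t, _| {
        Ty::Pair(Box::new(t.clone()), Box::new(t))
    });
    assert_eq!(len_naive(&ty), 2047); // 2^11 - 1: exponential in depth
    assert_eq!(len_dedup(&ty, &mut HashSet::new()), 11); // linear: 11 distinct types
}
```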

For closure captures, maybe they should refer to their parameters abstractly, so we never have to substitute captures (until we need to e.g. compute layout)? Or even keep captures outside of the type itself (had we done this and found it inefficient?)
cc @rust-lang/wg-traits @rust-lang/wg-compiler-nll @arielb1

I'm nominating this issue for discussion at the next compiler team meeting, wrt the last two points above (type_length_limit and closure captures).

@pnkfelix
Member

pnkfelix commented May 2, 2019

triage: tagging as P-medium, under the assumption that the exponential time blowup here is only observable if the user "opts in" via type_length_limit with large values. Leaving nominated for discussion since, well, it still needs to be discussed.

If the prior assumption is invalid, feel free to challenge that priority assignment.

@pnkfelix
Member

assigning to self to try to resolve this, hopefully with @eddyb input. Un-nominating.

@pnkfelix pnkfelix self-assigned this May 16, 2019
@ryankurte

I'm also seeing this issue using closures with futures, where .and_then(|r| r.do() ) is reasonably unavoidable :-/

Are there any tricks that might avoid the issue in this case, or any useful way of tracking down where the compiler is struggling in a large async application?

@eddyb
Member

eddyb commented May 22, 2019

@ryankurte There's not much you can do, the compiler is most likely struggling to compute the unnecessarily large type_length_limit number (not doing anything useful), and until that's fixed:

  • if you can, I'd highly suggest moving to async fn - that should be a net benefit
  • you can insert .boxed() calls strategically to avoid creating monstrously large future types
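The boxing advice can be sketched with the standard library alone (BoxFut below is a hand-rolled analogue of futures::future::BoxFuture, and the tiny executor exists only so the example runs; it is not production code):

```rust
use std::future::Future;
use std::pin::Pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Type-erased future, analogous to `futures::future::BoxFuture`.
type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

// Boxing at a function boundary erases the concrete future type, so a
// caller composing many steps does not accumulate a deeply nested type.
fn step(n: u32) -> BoxFut<u32> {
    Box::pin(async move { n + 1 })
}

fn pipeline() -> BoxFut<u32> {
    Box::pin(async {
        let a = step(0).await;
        let b = step(a).await;
        step(b).await
    })
}

// Minimal single-future executor; the futures above never return
// Pending, so a no-op waker suffices.
fn block_on<T>(mut fut: BoxFut<T>) -> T {
    fn raw() -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn clone(_: *const ()) -> RawWaker {
        raw()
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    let waker = unsafe { Waker::from_raw(raw()) };
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}

fn main() {
    assert_eq!(block_on(pipeline()), 3);
}
```

The design point is where you box: each Box::pin is a type boundary, so everything composed above it no longer appears in the outer future's type.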

@ryankurte

Thanks for the boxed hint, I can compile (slowly) again now ^_^
Last I tried async fn the interop was broken, and there's no way I can swap everything all at once; I'll have another look when I have a moment.

@gakonst

gakonst commented Aug 12, 2019

> I'm also seeing this issue using closures with futures, where .and_then(|r| r.do() ) is reasonably unavoidable :-/
>
> Are there any tricks that might avoid the issue in this case, or any useful way of tracking down where the compiler is struggling in a large async application?

I ended up Box'ing most futures instead of using boxed(), since it's deprecated. Compile time increased. How would async syntax help here?

@eddyb
Member

eddyb commented Aug 12, 2019

> How would async syntax help here?

async fn would simply not have all of these deep types (they come from the closures passed to future combinators).
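To illustrate the difference, here is a sketch with a hand-rolled AndThen stand-in (not the real futures combinator): each combinator layer nests the previous future and the closure into a new type, while an async fn returns a single opaque state machine.

```rust
use std::any::type_name;

// Hand-rolled stand-in for a combinator such as `and_then`
// (illustrative only). The closure type `F` becomes part of the
// combined type, so chains nest ever deeper.
#[allow(dead_code)]
struct Ready(u32);
#[allow(dead_code)]
struct AndThen<A, F>(A, F);

fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

// However many awaits an async fn contains, its returned future is a
// single opaque state-machine type rather than a nested tower.
async fn run() -> u32 {
    1
}

fn main() {
    let chain = AndThen(AndThen(Ready(0), |n: u32| n + 1), |n: u32| n * 2);
    // One AndThen (plus one closure) per chained step shows up in the type:
    assert_eq!(name_of(&chain).matches("AndThen").count(), 2);

    let _fut = run(); // single opaque type; nothing nests per await
}
```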

@jonas-schievink
Contributor

Sounds like #72412 might help here

@tmandry
Member

tmandry commented Sep 8, 2020

To repeat a comment I made on #64496:

@eddyb has said we should just get rid of this check. @nikomatsakis I'm curious to hear your thoughts on that. Personally I'm not sure; it seems like this can point to areas where compile time etc. would be improved by introducing a boxed future, but it doesn't necessarily do a good job of that.

For reference, here's a change to raise all the limits we needed to in Fuchsia. Some of the limits are quite high (the highest I see is over 18 million). I will say that all of these are in related parts of the code, which tells me there might be a common denominator. I'm not aware of an easy way of finding it if there is, though. And people working in that part of the code will be left with juggling these arbitrary-seeming limits.

Compilation fails immediately on hitting one of these errors, and the offending type may not be the biggest one in the build, creating the really unfortunate experience of raising the limit only to have it fail again. At one point I just started increasing it myself by arbitrary amounts over what the compiler suggested.

@tmandry
Member

tmandry commented Sep 19, 2020

The compile-time aspect of this was fixed in #72412.

@djc
Contributor

djc commented Sep 19, 2020

(And the type length limit checks are being worked on in #76772.)

@link2xt

link2xt commented Sep 19, 2020

In the case of deltachat, the type_length_limit error is gone on nightly: deltachat/deltachat-core-rust#1873
I assume #72412 reduced the lengths too?

Why didn't crater catch this bug? Does this mean type_length_limit wasn't triggered for any crate published on crates.io? Would publishing on crates.io have helped detect the bug earlier, before the 1.46.0 release?

@tmandry
Member

tmandry commented Sep 19, 2020

Ah, sorry for the confusion: we now walk each unique type only once when measuring the type length, so it will have an effect on that too.

@djc
Contributor

djc commented Sep 21, 2020

Confirmed that current nightly doesn't require the type length limit increase for Quinn anymore.

martinetd added a commit to martinetd/matrix-rust-sdk that referenced this issue Oct 8, 2020
> Some examples no longer build after the following commits, set a
> bigger-than-default type_length_limit to let tests pass.
>
> The exceptions are not necessary on nightly and can be removed again
> after rust-lang/rust#54540 is fixed.
drahnr added a commit to paritytech/polkadot that referenced this issue Dec 4, 2020
@tmandry
Member

tmandry commented Mar 12, 2021

Given discussion above, closing and tracking future issues in #83031.

@tmandry tmandry closed this as completed Mar 12, 2021