Performance regression in nightly #34891

Closed
dragostis opened this issue Jul 17, 2016 · 18 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@dragostis

Running cargo bench in pest on nightly-2016-07-02-x86_64-apple-darwin:

test data ... bench:      15,338 ns/iter (+/- 5,197)

On current nightly or nightly-2016-07-10-x86_64-apple-darwin:

test data ... bench:      28,605 ns/iter (+/- 7,901)
@TimNN
Contributor

TimNN commented Jul 17, 2016

I can reproduce this (also on OS X). Just as another data point, nightly-2016-07-07 is still good.

@jonas-schievink
Contributor

jonas-schievink commented Jul 17, 2016

Can confirm on x86_64 Linux; nightly-2016-07-12 is already bad (why the hell did I test 12?).

@TimNN
Contributor

TimNN commented Jul 17, 2016

Enabling LTO has no effect on 07-07, but with it 07-10 is now only ~2000 ns worse.

@TimNN
Contributor

TimNN commented Jul 17, 2016

Also, with optimisations completely disabled, no performance difference is noticeable.

@jonas-schievink
Contributor

I think this happened between nightly-2016-07-08 and nightly-2016-07-09:

~/d/pest master> rustup run nightly-2016-07-08 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      10,872 ns/iter (+/- 823)

~/d/pest master> rustup run nightly-2016-07-09 cargo bench
     Running target/release/json-bfd0131a8850b71b

running 1 test
test data ... bench:      17,845 ns/iter (+/- 1,472)

@TimNN
Contributor

TimNN commented Jul 17, 2016

So, I'm by no means an LLVM expert, but from staring at some IR I believe that one of the differences (maybe even the key difference) is that before the regression the benchmarking closure gets inlined, and afterwards it does not.
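
For reference, a benchmark of this shape boils down to something like the sketch below (hypothetical code, not the actual pest benchmark, which exercises the JSON parser); the closure handed to b.iter is the benchmarking closure in question, and whether LLVM inlines it decides whether the measured work gets optimised together with the harness loop.

```rust
#![feature(test)]
extern crate test;

use test::Bencher;

// Hypothetical sketch of a benchmark of this shape; the real pest benchmark
// runs the parser instead of the stand-in work below.
#[bench]
fn data(b: &mut Bencher) {
    let input = "{\"key\": [1, 2, 3]}";
    b.iter(|| {
        // This closure is the "benchmarking closure": if it is not inlined,
        // every iteration pays for an opaque call into it.
        test::black_box(input.len())
    });
}
```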

@nagisa
Member

nagisa commented Jul 17, 2016

Similar regression range to #34831.

As far as optimisations go, they usually have some sort of fuel supply so they don't take forever. It is likely that this particular inlining/optimisation (the lack of which caused the regression) simply didn't get considered.

Either way, I’m not aware of any translator changes that occurred in that date range (OTOH, I haven’t been tracking the PRs from that period closely either), nor can I see any LLVM upgrades having been done, so I’m very stumped.

@jonas-schievink
Contributor

I'm having a hard time using git, but #33890 seems to be in nightly-2016-07-09. I originally suspected #34728, but I don't think that landed in time. At least git isn't listing it in my search. 🐳

@nagisa
Member

nagisa commented Jul 17, 2016

That’s a viable candidate, yes. It is quite a big change in how we translate our functions.

Is codegen-units ≠ 1 used by anybody here?

cc @michaelwoerister I wonder whether weak_odr linkage could prevent inlining or other cross-function optimisations for some reason (other than the function definition being in another unit, of course).

@TimNN
Contributor

TimNN commented Jul 17, 2016

@jonas-schievink: I came to the same conclusion (#33890 is the likeliest suspect, #34728 is not included in 2016-07-09).

@nagisa: I'm not setting codegen-units explicitly anywhere so I assume the cargo default of 1 is used.

@nagisa
Member

nagisa commented Jul 18, 2016

Yes, it seems like weak_odr prevents inlining opportunities. All of the closures ended up not being inlined due to having weak_odr – changing them to internal linkage promptly inlined all of them.
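
As a rough illustration of why linkage matters here (a hypothetical reduction, not code taken from pest): the benchmark ultimately calls user code through a closure, and if that closure stays an out-of-line call, LLVM cannot optimise its body together with the caller.

```rust
// Hypothetical reduction of the pattern discussed above, not code from pest.
// `run` is generic over the closure type. When the closure's symbol has
// internal linkage, LLVM typically inlines it and collapses everything into
// straight-line code; per the comments above, weak_odr linkage made it keep
// the closure as a real call instead.
fn run<F: Fn(&[u8]) -> usize>(data: &[u8], f: F) -> usize {
    f(data)
}

fn main() {
    let data = b"{\"key\": [1, 2, 3]}";
    let commas = run(data, |d| d.iter().filter(|&&b| b == b',').count());
    println!("commas: {}", commas);
}
```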

@michaelwoerister
Member

I haven't had time to look into this yet, but I'll do so shortly. If weak_odr is that bad for inlining, we should avoid it (note that there is #34830, which does so on MinGW only, but could easily be adapted to do so on all platforms).

However, we have been using weak_odr for generics for months without anybody noticing a marked performance drop, and there is the internalize_symbols pass that switches symbols to internal linkage where possible (which should be the case for most closures).

@michaelwoerister
Member

I can confirm that current master is still slow compared to the nightly mentioned above (on Linux x86_64). However, comparing the LLVM IR of both versions, it does not seem likely that weak_odr linkage has anything to do with the slowdown: the slower version contains only one weak_odr function, and that isn't called anywhere.

@sanxiyn added the I-slow label Jul 18, 2016
@michaelwoerister
Member

Ignore that last comment; I was accidentally looking at the IR of the library, not of the benchmark executable.

bors added a commit that referenced this issue Jul 18, 2016
…=eddyb

Run base::internalize_symbols() even for single-codegen-unit crates.

The initial linkage-assignment (especially for closures) is a conservative one that makes some symbols more visible than they need to be. While this is not a correctness problem, it does force the LLVM inliner to be more conservative too, which results in poor performance. Once translation is based solely on MIR, it will be easier to also make the initial linkage assignment a better fitting one. Until then `internalize_symbols()` does a good job of preventing most performance regressions.

This should solve the regressions reported in #34891 and maybe also those in #34831.

As a side-effect, this will also solve most of the problematic cases described in #34793. Not reliably so, however. For that, we still need a solution like the one implemented in #34830.

cc @rust-lang/compiler
@pmarcelll
Contributor

It seems like the new nightly fixed this.

@bstrie closed this as completed Jul 26, 2016
@eddyb
Member

eddyb commented Jul 26, 2016

@pmarcelll What nightly? https://static.rust-lang.org/dist/index.html doesn't list anything newer than 2016-07-21.

@hanna-kruppe
Contributor

hanna-kruppe commented Jul 26, 2016

That's strange. rustup update gave me a nightly with a commit-date of June 24 (9316ae5) and the dashboard picked up a new nightly as well.

@eddyb
Member

eddyb commented Jul 26, 2016

@rkruppe Ah, @alexcrichton informed me that it was a manual build and not enough was updated.
