Optimize `Substs::super_fold_with`. #37108

nnethercote · 2016-10-12T09:39:43Z

This speeds up some of the rustc-benchmarks by up to ~4%.

futures-rs-test  4.467s vs  4.387s --> 1.018x faster (variance: 1.001x, 1.006x)
helloworld       0.242s vs  0.246s --> 0.980x faster (variance: 1.007x, 1.013x)
html5ever-2016-  7.664s vs  7.630s --> 1.004x faster (variance: 1.008x, 1.006x)
hyper.0.5.0      5.218s vs  5.133s --> 1.016x faster (variance: 1.013x, 1.008x)
inflate-0.1.0    5.040s vs  5.103s --> 0.988x faster (variance: 1.005x, 1.008x)
issue-32062-equ  0.361s vs  0.345s --> 1.047x faster (variance: 1.008x, 1.019x)
issue-32278-big  1.874s vs  1.850s --> 1.013x faster (variance: 1.020x, 1.018x)
jld-day15-parse  1.569s vs  1.508s --> 1.040x faster (variance: 1.009x, 1.003x)
piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x)
regex.0.1.30     2.568s vs  2.555s --> 1.005x faster (variance: 1.018x, 1.044x)
rust-encoding-0  2.139s vs  2.135s --> 1.001x faster (variance: 1.012x, 1.005x)
syntex-0.42.2   33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x)
syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x)

r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.

KalitaAlexey · 2016-10-12T09:55:42Z

src/librustc/ty/subst.rs

+        //   and instead reuse the existing substs.
+        // - In that case, when the substs length is one, we also avoid
+        //   creating a new Vec, which avoids a heap allocation.
+        let params;


params is used only in 338 line block. Can you move into the block?

No. It's used on lines 336, 338, 349.

KalitaAlexey · 2016-10-12T09:58:09Z

src/librustc/ty/subst.rs

+        // - In that case, when the substs length is one, we also avoid
+        //   creating a new Vec, which avoids a heap allocation.
+        let params;
+        if self.params.len() == 1 {


Can you rewrite it to match self.params.len()?

I prefer having it this way. The 1 case is most frequent, then >1, and 0 is a distant last. With a match the ordering is less clear (i.e. not guaranteed).

I can add a comment to that effect, if it helps.

@nnethercote Okay. I got it.

bluss · 2016-10-12T10:21:47Z

src/librustc/ty/subst.rs

+                return &self;
+            }
+        } else {
+            return &self;


super minor, but the & in &self is redundant. The Self type is &'tcx Substs<'tcx> and the self identifier is of type &Self. Deref coercion means that using &self, self or *self here is going to be equivalent.

eddyb · 2016-10-12T12:04:36Z

So this is an interesting take. I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec. Then it could either be an extension of that algorithm or maybe it wouldn't even have a measurable impact etc

eddyb · 2016-10-12T12:05:55Z

Oh and this doesn't handle ty::relate which may have a larger impact.

nnethercote · 2016-10-12T19:58:56Z

I'd like to do it after moving to on-stack arrays and slice arena allocations instead of always allocating a Vec.

My original version of the patch just checked if params == self.params and skipped the mk_substs call in that case, and that alone was a decent improvement because it avoids the interning, which is dominated by hash operations.

Then I added the skip-Vec-creation-in-the-length-1 case and that improved things a bit more.

nnethercote · 2016-10-12T22:22:07Z

Oh and this doesn't handle ty::relate which may have a larger impact.

I haven't seen it in any of the profiles I've looked at. Can you describe in more detail how you think it might have an impact?

bors · 2016-10-15T16:40:39Z

☔ The latest upstream changes (presumably #37100) made this pull request unmergeable. Please resolve the merge conflicts.

This speeds up several rustc-benchmarks by 1--4%.

nnethercote · 2016-10-18T09:59:23Z

I have changed the code to just do the "skip mk_substs if the fold was a no-op" optimisation. That should avoid any conflicts with @Mark-Simulacrum's changes to Substs. It still gives some nice speed-ups.

futures-rs-test  4.189s vs  4.157s --> 1.008x faster (variance: 1.004x, 1.007x)
helloworld       0.223s vs  0.223s --> 1.000x faster (variance: 1.013x, 1.012x)
html5ever-2016-  5.192s vs  5.147s --> 1.009x faster (variance: 1.007x, 1.004x)
hyper.0.5.0      4.788s vs  4.763s --> 1.005x faster (variance: 1.008x, 1.008x)
inflate-0.1.0    4.553s vs  4.494s --> 1.013x faster (variance: 1.010x, 1.044x)
issue-32062-equ  0.324s vs  0.314s --> 1.032x faster (variance: 1.008x, 1.020x)
issue-32278-big  1.661s vs  1.654s --> 1.005x faster (variance: 1.008x, 1.002x)
jld-day15-parse  1.475s vs  1.422s --> 1.038x faster (variance: 1.007x, 1.008x)
piston-image-0. 11.386s vs 11.227s --> 1.014x faster (variance: 1.005x, 1.063x)
reddit-stress    2.359s vs  2.308s --> 1.022x faster (variance: 1.017x, 1.024x)
regex.0.1.30     2.421s vs  2.388s --> 1.014x faster (variance: 1.036x, 1.008x)
rust-encoding-0  1.983s vs  1.973s --> 1.005x faster (variance: 1.008x, 1.016x)
syntex-0.42.2   29.582s vs 29.610s --> 0.999x faster (variance: 1.014x, 1.045x)
syntex-0.42.2-i 15.203s vs 15.175s --> 1.002x faster (variance: 1.011x, 1.006x)

r? @eddyb

Mark-Simulacrum · 2016-10-18T11:49:30Z

Note that it will be at least a few days, more likely a week until I can resume work on the Substs improvements, so I'm not sure removing things now is a good idea. I haven't followed this PR closely though, not sure how large/significant the changes here are.

eddyb · 2016-10-18T15:21:44Z

src/librustc/ty/subst.rs

+        let params: Vec<_> = self.iter().map(|k| k.fold_with(folder)).collect();
+
+        // If folding doesn't change the substs, it's faster to avoid calling
+        // `mk_substs` and instead reuse the existing substs.


Hmm, maybe open an issue about doing this in general, or at least further investigation? That is, there's a bunch of places where we can short-circuit the interner, maybe some are worth doing.

eddyb · 2016-10-18T15:22:09Z

@bors r+

bors · 2016-10-18T15:22:10Z

📌 Commit 1e4241a has been approved by eddyb

bors · 2016-10-18T15:28:52Z

⌛ Testing commit 1e4241a with merge eeb6379...

alexcrichton · 2016-10-18T22:12:21Z

@bors: retry force clean

restarted buildbot

Just kidding I'm doing this only to unstuck @bors/homu/buildbot.

eddyb · 2016-10-19T04:38:35Z

@bors r+

bors · 2016-10-19T04:38:36Z

📌 Commit ab5dcff has been approved by eddyb

@eddyb

… r=eddyb Optimize `Substs::super_fold_with`. This speeds up some of the rustc-benchmarks by up to ~4%. ``` futures-rs-test 4.467s vs 4.387s --> 1.018x faster (variance: 1.001x, 1.006x) helloworld 0.242s vs 0.246s --> 0.980x faster (variance: 1.007x, 1.013x) html5ever-2016- 7.664s vs 7.630s --> 1.004x faster (variance: 1.008x, 1.006x) hyper.0.5.0 5.218s vs 5.133s --> 1.016x faster (variance: 1.013x, 1.008x) inflate-0.1.0 5.040s vs 5.103s --> 0.988x faster (variance: 1.005x, 1.008x) issue-32062-equ 0.361s vs 0.345s --> 1.047x faster (variance: 1.008x, 1.019x) issue-32278-big 1.874s vs 1.850s --> 1.013x faster (variance: 1.020x, 1.018x) jld-day15-parse 1.569s vs 1.508s --> 1.040x faster (variance: 1.009x, 1.003x) piston-image-0. 12.210s vs 11.903s --> 1.026x faster (variance: 1.045x, 1.010x) regex.0.1.30 2.568s vs 2.555s --> 1.005x faster (variance: 1.018x, 1.044x) rust-encoding-0 2.139s vs 2.135s --> 1.001x faster (variance: 1.012x, 1.005x) syntex-0.42.2 33.099s vs 32.353s --> 1.023x faster (variance: 1.003x, 1.028x) syntex-0.42.2-i 17.989s vs 17.431s --> 1.032x faster (variance: 1.009x, 1.018x) ``` r? @eddyb. I don't know how this interacts with the changes that dikaiosune has been working on.

Rollup of 23 pull requests - Successful merges: #36964, #37108, #37117, #37124, #37161, #37176, #37182, #37193, #37198, #37202, #37208, #37218, #37221, #37224, #37230, #37231, #37233, #37236, #37240, #37254, #37257, #37265, #37267 - Failed merges: #37213, #37220, #37261

Avoid more unnecessary mk_ty calls in Ty::super_fold_with. This speeds up several rustc-benchmarks by 1--5%. This PR is the lovechild of #37108 and #37705. ``` futures-rs-test 4.059s vs 4.011s --> 1.012x faster (variance: 1.016x, 1.026x) helloworld 0.236s vs 0.239s --> 0.986x faster (variance: 1.051x, 1.014x) html5ever-2016- 3.831s vs 3.824s --> 1.002x faster (variance: 1.020x, 1.019x) hyper.0.5.0 4.928s vs 4.936s --> 0.998x faster (variance: 1.003x, 1.012x) inflate-0.1.0 4.135s vs 4.104s --> 1.007x faster (variance: 1.026x, 1.028x) issue-32062-equ 0.309s vs 0.303s --> 1.017x faster (variance: 1.019x, 1.084x) issue-32278-big 1.818s vs 1.797s --> 1.011x faster (variance: 1.011x, 1.008x) jld-day15-parse 1.304s vs 1.271s --> 1.026x faster (variance: 1.018x, 1.012x) piston-image-0. 10.938s vs 10.921s --> 1.002x faster (variance: 1.025x, 1.016x) reddit-stress 2.327s vs 2.208s --> 1.054x faster (variance: 1.016x, 1.006x) regex-0.1.80 8.796s vs 8.727s --> 1.008x faster (variance: 1.012x, 1.019x) regex.0.1.30 2.294s vs 2.249s --> 1.020x faster (variance: 1.013x, 1.026x) rust-encoding-0 1.914s vs 1.886s --> 1.015x faster (variance: 1.027x, 1.026x) ```

rust-highfive assigned eddyb Oct 12, 2016

KalitaAlexey reviewed Oct 12, 2016

View reviewed changes

bluss reviewed Oct 12, 2016

View reviewed changes

nnethercote force-pushed the substs-experimentation branch from 7209d3f to 6310135 Compare October 18, 2016 05:45

Optimize Substs::super_fold_with.

1e4241a

This speeds up several rustc-benchmarks by 1--4%.

nnethercote force-pushed the substs-experimentation branch from 6310135 to 1e4241a Compare October 18, 2016 09:55

eddyb approved these changes Oct 18, 2016

View reviewed changes

Format comment in a nicer way.

ab5dcff

Just kidding I'm doing this only to unstuck @bors/homu/buildbot.

eddyb mentioned this pull request Oct 19, 2016

Rollup of 23 pull requests #37269

Merged

bors merged commit ab5dcff into rust-lang:master Oct 19, 2016

nnethercote deleted the substs-experimentation branch October 19, 2016 23:23

brson added the relnotes Marks issues that should be documented in the release notes of the next release. label Oct 24, 2016

nnethercote mentioned this pull request Nov 24, 2016

Avoid more unnecessary mk_ty calls in Ty::super_fold_with. #37979

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize `Substs::super_fold_with`. #37108

Optimize `Substs::super_fold_with`. #37108

nnethercote commented Oct 12, 2016

KalitaAlexey Oct 12, 2016

nnethercote Oct 12, 2016

KalitaAlexey Oct 12, 2016

nnethercote Oct 12, 2016

nnethercote Oct 12, 2016

KalitaAlexey Oct 12, 2016

bluss Oct 12, 2016

eddyb commented Oct 12, 2016

eddyb commented Oct 12, 2016

nnethercote commented Oct 12, 2016

nnethercote commented Oct 12, 2016

bors commented Oct 15, 2016

nnethercote commented Oct 18, 2016

Mark-Simulacrum commented Oct 18, 2016

eddyb Oct 18, 2016

eddyb commented Oct 18, 2016

bors commented Oct 18, 2016

bors commented Oct 18, 2016

alexcrichton commented Oct 18, 2016

eddyb commented Oct 19, 2016

bors commented Oct 19, 2016

Optimize Substs::super_fold_with. #37108

Optimize Substs::super_fold_with. #37108

Conversation

nnethercote commented Oct 12, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eddyb commented Oct 12, 2016

eddyb commented Oct 12, 2016

nnethercote commented Oct 12, 2016

nnethercote commented Oct 12, 2016

bors commented Oct 15, 2016

nnethercote commented Oct 18, 2016

Mark-Simulacrum commented Oct 18, 2016

Choose a reason for hiding this comment

eddyb commented Oct 18, 2016

bors commented Oct 18, 2016

bors commented Oct 18, 2016

alexcrichton commented Oct 18, 2016

eddyb commented Oct 19, 2016

bors commented Oct 19, 2016

Optimize `Substs::super_fold_with`. #37108

Optimize `Substs::super_fold_with`. #37108