NLL performance tracking issue #47879

nikomatsakis · 2018-01-30T14:33:49Z

This is a tracking issue dedicated to discussing ideas and techniques for improving compile-time with the NLL feature. It's meant to house as a repository of measurements, benchmarks, etc for the time being to help coordinate efforts.

Benchmarks and timing runs

We can view performance results on perf.rust-lang.org now. Just look for the "NLL" runs. They can be compared against "clean" runs -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Ideas for improvement or measurement

Introduce a dirty list (Make region inference use a dirty list #47766)
More detailed profiling and analysis of the results above
Measure: what percentage of the time do calls to dfs calls wind up actually adding new info?
Idea: modify dfs code not to invoke successors(), which can allocate, but instead add some kind of non-allocating form (perhaps with a callback?)
Experiment with sparse representation of values (use sparse bitsets instead of dense ones for NLL results #48170)
... others? Discuss below, that's what this issue is for =)

Quick pointers into the source

The main source of time for NLL is probably going to be the code that propagates constraints:

rust/src/librustc_mir/borrow_check/nll/region_infer/mod.rs

Lines 453 to 457 in fe7e1a4

    
           /// Propagate the region constraints: this will grow the values 
        
           /// for each region variable until all the constraints are 
        
           /// satisfied. Note that some values may grow **too** large to be 
        
           /// feasible, but we check this later. 
        
           fn propagate_constraints(&mut self, mir: &Mir<'tcx>) {

And in particular the calls to dfs:

rust/src/librustc_mir/borrow_check/nll/region_infer/mod.rs

Lines 486 to 495 in fe7e1a4

    
           let Ok(made_changes) = self.dfs( 
        
               mir, 
        
               CopyFromSourceToTarget { 
        
                   source_region: constraint.sub, 
        
                   target_region: constraint.sup, 
        
                   inferred_values: &mut inferred_values, 
        
                   constraint_point: constraint.point, 
        
                   constraint_span: constraint.span, 
        
               }, 
        
           );

The dfs code must walk the graph to find which points should be added and where:

rust/src/librustc_mir/borrow_check/nll/region_infer/dfs.rs

Lines 37 to 40 in fe7e1a4

    
           pub(super) fn dfs<C>(&self, mir: &Mir<'tcx>, mut op: C) -> Result<bool, C::Early> 
        
           where 
        
               C: DfsOp, 
        
           {

cc @rust-lang/wg-compiler-nll

The text was updated successfully, but these errors were encountered:

nikomatsakis · 2018-01-30T15:16:55Z

I did some profiling of syn. The top result was actually not what I expected:

+   14.45%    14.45%  rustc    librustc-1dc9e414ba4b7dab.so                  [.] rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data

This is type-checker constraint collection; this may imply that some of the refactorings I had in mind for doing this in a smarter way are in order.

nikomatsakis · 2018-01-30T20:14:06Z

OK, that profiling run was kind of messed up, because I forgot to include debuginfo = 1, so that the framepointers were all wrong etc. Still, when I did a better run, the data was a lot more detailed, but pointed roughly to the same conclusion. The NLL type checker is killing a lot of time doing normalization. This is also responsible for some soundness bugs. This kind of ups the importance of doing the trait refactorings I had in mind to handle this scenario better, but I'd hate to block on this (they are kind of involved). I have to think if we can do them in a kind of lightweight way.

nikomatsakis · 2018-02-01T22:36:06Z

On a hunch, tried the results of @Aaron1011's #47920. This makes a big difference, bringing us down to 7s. Haven't looked at the detailed profiling yet.

nikomatsakis · 2018-02-02T13:33:30Z

Did some rough profiling and analysis. For syn, with #47920 applied:

mir borrowck accounts for 18% of total runtime
- nll accounts for 16% of total run time
  - running the MIR type check accounts for 14% of total run time
    - add_drop_live_constraint accounts for 12% of total run time
  - the other 2% is not concentrated in any one place
    - rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve in particular is 0% (7 samples out of 1034)
for comparison, AST-based borrowck accounts for 2%

nikomatsakis · 2018-02-22T11:33:48Z

So after #48411 lands, this is the current profile of the syn crate:

lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 2
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)

This is showing stuff that takes more than 2% of total execution time.

Here is the profile showing stuff that takes more than 1%:

1% profile

lunch-box. perf focus  '{do_mir_borrowck}' --tree-callees --tree-min-percent 1
Matcher    : {do_mir_borrowck}
Matches    : 152
Not Matches: 776
Percentage : 16%

Tree
| matched `{do_mir_borrowck}` (16% total, 0% self)
: | rustc_mir::borrow_check::nll::compute_regions (12% total, 0% self)
: : | rustc_mir::borrow_check::nll::type_check::type_check (10% total, 0% self)
: : : | rustc_mir::borrow_check::nll::type_check::type_check_internal (10% total, 0% self)
: : : : | rustc_mir::borrow_check::nll::type_check::type_check::_$u7b$$u7b$closure$u7d$$u7d$::haa6458083ae8ea7a (7% total, 0% self)
: : : : : | rustc_mir::borrow_check::nll::type_check::liveness::generate (7% total, 0% self)
: : : : : : | rustc_mir::borrow_check::nll::type_check::TypeChecker::fully_perform_op (5% total, 0% self)
: : : : : : : | rustc::infer::InferCtxt::take_and_reset_region_constraints (2% total, 0% self)
: : : : : : : : | rustc::infer::region_constraints::RegionConstraintCollector::take_and_reset_data (2% total, 0% self)
: : : : : : : : : | <rustc_data_structures::unify::UnificationTable<K>>::new_key (2% total, 0% self)
: : : : : : : : : : | <rustc_data_structures::snapshot_vec::SnapshotVec<D>>::push (1% total, 1% self)
: : : : : : : | rustc::infer::InferCtxt::commit_if_ok (2% total, 0% self)
: : : : : : : : | rustc::traits::query::dropck_outlives::<impl rustc::infer::at::At<'cx, 'gcx, 'tcx>>::dropck_outlives (2% total, 0% self)
: : : : : : : : : | rustc::infer::canonical::<impl rustc::infer::InferCtxt<'cx, 'gcx, 'tcx>>::instantiate_query_result (1% total, 0% self)
: : : : : : | <rustc_mir::dataflow::at_location::FlowAtLocation<BD>>::each_state_bit (1% total, 1% self)
: : : : | <rustc_mir::borrow_check::nll::type_check::TypeVerifier<'a, 'b, 'gcx, 'tcx> as rustc::mir::visit::Visitor<'tcx>>::visit_mir (1% total, 0% self)
: : | rustc_mir::borrow_check::nll::region_infer::RegionInferenceContext::solve (1% total, 0% self)

nikomatsakis · 2018-03-23T17:52:55Z

Good news:

The indomitable @Mark-Simulacrum has added a special "NLL mode" to http://perf.rust-lang.org/, so we can now visualize our performance very easily. The NLL point should be compared to "clean" -- the delta is the 'extra work' we are doing when NLL is enabled (note that NLL still runs the old region analysis and borrow check, so it does strictly more work).

Bad news:

We got out work cut out for us! =) See that little orange triangle in the upper right?

nikomatsakis · 2018-06-06T19:10:01Z

I'm gonna close this tracking issue. Not that the problem is solved, but I don't see this issue as adding a lot of value.

nikomatsakis added this to the NLL: Performance is good milestone Jan 30, 2018

nikomatsakis added I-compiletime Issue: Problems and improvements with respect to compile times. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-compiler-nll labels Jan 30, 2018

TimNN added the C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. label Jan 30, 2018

nikomatsakis mentioned this issue Feb 12, 2018

use sparse bitsets instead of dense ones for NLL results #48170

Closed

nikomatsakis mentioned this issue Feb 22, 2018

consider optimizing how NLL region inference works to use bit sets #47542

Closed

nikomatsakis mentioned this issue Mar 14, 2018

measure NLL performance on perf #49026

Closed

nikomatsakis added the NLL-performant Working towards the "performance is good" goal label Mar 14, 2018

pnkfelix mentioned this issue Apr 18, 2018

[nll] compiler spinning forever #49936

Closed

nikomatsakis closed this as completed Jun 6, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NLL performance tracking issue #47879

NLL performance tracking issue #47879

nikomatsakis commented Jan 30, 2018 •

edited

Loading

nikomatsakis commented Jan 30, 2018 •

edited

Loading

nikomatsakis commented Jan 30, 2018

nikomatsakis commented Feb 1, 2018 •

edited

Loading

nikomatsakis commented Feb 2, 2018 •

edited by chrisvittal

Loading

nikomatsakis commented Feb 22, 2018

nikomatsakis commented Mar 23, 2018

nikomatsakis commented Jun 6, 2018

NLL performance tracking issue #47879

NLL performance tracking issue #47879

Comments

nikomatsakis commented Jan 30, 2018 • edited Loading

Benchmarks and timing runs

Ideas for improvement or measurement

Quick pointers into the source

nikomatsakis commented Jan 30, 2018 • edited Loading

nikomatsakis commented Jan 30, 2018

nikomatsakis commented Feb 1, 2018 • edited Loading

nikomatsakis commented Feb 2, 2018 • edited by chrisvittal Loading

nikomatsakis commented Feb 22, 2018

nikomatsakis commented Mar 23, 2018

nikomatsakis commented Jun 6, 2018

nikomatsakis commented Jan 30, 2018 •

edited

Loading

nikomatsakis commented Jan 30, 2018 •

edited

Loading

nikomatsakis commented Feb 1, 2018 •

edited

Loading

nikomatsakis commented Feb 2, 2018 •

edited by chrisvittal

Loading