const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

dotdash · 2019-01-16T19:50:26Z

Usually the layout of any locals is required at least three times, once
when it becomes live, once when it is written to, and once it is read
from. By adding a cache for them, we can reduce the number of layout
queries speeding up code that is heavy on const_eval.

rust-highfive · 2019-01-16T19:50:40Z

r? @michaelwoerister

(rust_highfive has picked a reviewer for you, use r? to override)

dotdash · 2019-01-16T21:46:11Z

@bors try

bors · 2019-01-16T21:46:25Z

⌛ Trying commit fafeda6caed392ac71ac7e2b0ccc873ec86e558a with merge 4f2ef2516cd3f1e875bb589402bf1319967ce3a5...

michaelwoerister · 2019-01-17T11:02:39Z

Thanks for the PR, @dotdash! Looking forward to the perf results :)

michaelwoerister · 2019-01-17T14:26:39Z

@rust-timer build 4f2ef2516cd3f1e875bb589402bf1319967ce3a5
(I think the try build actually finished successfully a while ago)

rust-timer · 2019-01-17T14:26:40Z

Success: Queued 4f2ef2516cd3f1e875bb589402bf1319967ce3a5 with parent ceb2512, comparison URL.

rust-timer · 2019-01-17T17:04:31Z

Finished benchmarking try commit 4f2ef2516cd3f1e875bb589402bf1319967ce3a5

michaelwoerister · 2019-01-17T20:20:07Z

This looks pretty good. For the synthetic const-eval heavy tests it looks spectacular even :)

I'll give it a proper review tomorrow. Meanwhile, could you check whether there's a problem with compiling the unicode_normalization crate? perf.rlo shows no numbers for it which might indicate a compilation error.

cc @oli-obk @rust-lang/wg-compiler-performance

dotdash · 2019-01-17T20:46:43Z

The numbers are only missing for the parent commit. I suppose that it just predates the addition of that benchmark, which was yesterday.

oli-obk · 2019-01-18T07:45:21Z

src/librustc_mir/interpret/eval_context.rs

-        let local_ty = self.monomorphize(local_ty, frame.instance.substs);
-        self.layout_of(local_ty)
+    ) -> TyLayout<'tcx> {
+        frame.local_layouts[local]


While definitely an improvement for all currently possible const eval code, I think this is suboptimal for code with branching, as a lot of locals will never be touched during evaluation. I think it would be better to make local_layouts be IndexVec<mir::Local, Option<TyLayout<'tcx>>> and only fill it in on demand. To make this really work with reading I presume it would even need to be IndexVec<mir::Local, RefCell<Option<TyLayout<'tcx>>>> at which point we can just move to IndexVec<mir::Local, Once<TyLayout<'tcx>>> or some non-Sync variant of that.

oli-obk · 2019-01-18T07:48:17Z

src/librustc_mir/interpret/operand.rs

@@ -463,12 +463,10 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
        &self,


drive-by fix: access_local doesn't need to be pub anymore

michaelwoerister · 2019-01-18T09:39:15Z

@oli-obk's remark regarding branching code makes sense. Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

oli-obk · 2019-01-18T10:01:54Z

Would it be possible to make some or most EvalContext methods &mut self instead of &self so we don't have to use RefCell? It seems like that should be possible in principle, right?

The problem is that we separated reading and writing into &mut self and &self methods. So any method trying to read from the local would need to be &mut self and that would infect everything. The immutability of the read paths has been very helpful in the development of miri. I think it would be fine to have some interior mutability just for memoization. Since TyLayout is Copy, we can even use Cell<Option<TyLayout<'tcx>>> making the entire operation zero cost both in memory and performance.

michaelwoerister · 2019-01-18T10:52:54Z

OK, I'm fine with the Cell approach.

I'd be interested though in how using &self has made things easier.

oli-obk · 2019-01-18T11:02:28Z

I'd be interested though in how using &self has made things easier.

Mirroring the mutability of the virtual miri memory in the mutability of the memory datastructure itself allows various things like holding references to different parts of the virtual memory at the same time. If we made everything take mutable references, we could only ever access one miri allocation, even if we just wanted to write to one. While we could have worked around this with unsafe code (and we still have to do it in one situation

rust/src/librustc_mir/interpret/memory.rs

Lines 725 to 763 in 38650b6

    
           // This checks relocation edges on the src. 
        
           let src_bytes = self.get(src.alloc_id)? 
        
               .get_bytes_with_undef_and_ptr(&tcx, src, size)? 
        
               .as_ptr(); 
        
           let dest_bytes = self.get_mut(dest.alloc_id)? 
        
               .get_bytes_mut(&tcx, dest, size * length)? 
        
               .as_mut_ptr(); 
        
           // SAFE: The above indexing would have panicked if there weren't at least `size` bytes 
        
           // behind `src` and `dest`. Also, we use the overlapping-safe `ptr::copy` if `src` and 
        
           // `dest` could possibly overlap. 
        
           // The pointers above remain valid even if the `HashMap` table is moved around because they 
        
           // point into the `Vec` storing the bytes. 
        
           unsafe { 
        
               assert_eq!(size.bytes() as usize as u64, size.bytes()); 
        
               if src.alloc_id == dest.alloc_id { 
        
                   if nonoverlapping { 
        
                       if (src.offset <= dest.offset && src.offset + size > dest.offset) || 
        
                           (dest.offset <= src.offset && dest.offset + size > src.offset) 
        
                       { 
        
                           return err!(Intrinsic( 
        
                               "copy_nonoverlapping called on overlapping ranges".to_string(), 
        
                           )); 
        
                       } 
        
                   } 
        
                   for i in 0..length { 
        
                       ptr::copy(src_bytes, 
        
                                 dest_bytes.offset((size.bytes() * i) as isize), 
        
                                 size.bytes() as usize); 
        
                   } 
        
               } else { 
        
                   for i in 0..length { 
        
                       ptr::copy_nonoverlapping(src_bytes, 
        
                                                dest_bytes.offset((size.bytes() * i) as isize), 
        
                                                size.bytes() as usize); 
        
                   } 
        
               } 
        
           }

), and while two phase borrows could have made many situations safe, we would still have to fight the borrow checker at ever turn.

Also it just "felt right" to make sure we don't accidentally modify our virtual memory in functions that should just read from it.

michaelwoerister · 2019-01-18T13:19:13Z

Yeah, I can see how this can become tricky. Thanks for the explanation, @oli-obk!

dotdash · 2019-01-19T15:23:05Z

using Cell gets me lifetime errors, which I don't understand right away.

  --> src/librustc_mir/interpret/snapshot.rs:75:34
   |
46 | impl<'a, 'mir, 'tcx> InfiniteLoopDetector<'a, 'mir, 'tcx>
   |      --  ---- lifetime `'mir` defined here
   |      |
   |      lifetime `'a` defined here
...
75 |         if self.snapshots.insert(EvalSnapshot::new(memory, stack)) {
   |                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ argument requires that `'mir` must outlive `'a`

error: unsatisfied lifetime constraints
   --> src/librustc_mir/interpret/snapshot.rs:410:1
    |
410 |   impl_stable_hash_for!(impl<'tcx, 'b, 'mir> for struct EvalSnapshot<'b, 'mir, 'tcx> {
    |   ^                          ----      ---- lifetime `'mir` defined here
    |   |                          |
    |  _|                          lifetime `'tcx` defined here
    | |
411 | |     // Not hashing memory: Avoid hashing memory all the time during execution
412 | |     memory -> _,
413 | |     stack,
414 | | });
    | |___^ type annotation requires that `'mir` must outlive `'tcx`
    |
    = note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)

Is there some magic in the stable hash macro that might affect this?

…k frame Usually the layout of any locals is required at least three times, once when it becomes live, once when it is written to, and once it is read from. By adding a cache for them, we can reduce the number of layout queries speeding up code that is heavy on const_eval.

dotdash · 2019-01-20T18:10:27Z

Seems that the constraint on the stable hash impl was just swapped around, asking for 'mir to outlive 'tcx instead of the other way around.

dotdash · 2019-01-20T18:11:28Z

@bors try

We should measure this again, my local setup seems a bit funky right now, and showed a massive slowdown for this version which seems unlikely to be right.

bors · 2019-01-20T18:11:39Z

⌛ Trying commit 98d4f33 with merge a0b24f2d5d611007296d56748caf31cfa8e5af32...

bors · 2019-01-20T20:12:10Z

☀️ Test successful - checks-travis
State: approved= try=True

oli-obk · 2019-01-21T09:24:39Z

@rust-timer build a0b24f2d5d611007296d56748caf31cfa8e5af32

rust-timer · 2019-01-21T09:24:40Z

Success: Queued a0b24f2d5d611007296d56748caf31cfa8e5af32 with parent 4db2394, comparison URL.

rust-timer · 2019-01-21T12:36:18Z

Finished benchmarking try commit a0b24f2d5d611007296d56748caf31cfa8e5af32

oli-obk · 2019-01-21T12:46:59Z

Perf is great (though no effect on unicode-normalization). Impl looks good.

The commit message is still slightly outdated.

dotdash · 2019-01-21T13:27:47Z

Ah, I had updated the commit message, but not the message for the PR itself. Or did I still miss something in the commit message as well?

michaelwoerister · 2019-01-21T14:56:51Z

@bors r+

Thanks, @dotdash!

bors · 2019-01-21T14:56:52Z

📌 Commit 98d4f33 has been approved by michaelwoerister

const_eval: Predetermine the layout of all locals when pushing a stack frame Usually the layout of any locals is required at least three times, once when it becomes live, once when it is written to, and once it is read from. By adding a cache for them, we can reduce the number of layout queries speeding up code that is heavy on const_eval.

@ghost

Rollup of 9 pull requests Successful merges: - #57537 (Small perf improvement for fmt) - #57552 (Default images) - #57604 (Make `str` indexing generic on `SliceIndex`.) - #57667 (Fix memory leak in P::filter_map) - #57677 (const_eval: Predetermine the layout of all locals when pushing a stack frame) - #57791 (Add regression test for #54582) - #57798 (Corrected spelling inconsistency) - #57809 (Add powerpc64-unknown-freebsd) - #57813 (fix validation range printing when encountering undef) Failed merges: r? @ghost

RalfJung · 2019-01-30T08:18:21Z

src/librustc_mir/interpret/operand.rs

@@ -510,7 +504,7 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
            // FIXME: do some more logic on `move` to invalidate the old location
            Copy(ref place) |
            Move(ref place) =>
-                self.eval_place_to_op(place, layout)?,
+                self.eval_place_to_op(place)?,


Hm, but this means we might have to compute the layout even if we already know it, right?

To change this we'd need to pass a precomputed layout to layout_of_local, which probably doesn't do anything, because any initialized local will already have had called layout_of_local on it before. We can still benchmark it and see if it changes anything

eddyb · 2019-01-30T09:15:37Z

src/librustc_mir/const_eval.rs

@@ -72,6 +72,7 @@ fn mk_eval_cx_inner<'a, 'mir, 'tcx>(
    ecx.stack.push(interpret::Frame {
        block: mir::START_BLOCK,
        locals: IndexVec::new(),
+        local_layouts: IndexVec::new(),


Can't the information be included in locals itself?

rust-highfive assigned michaelwoerister Jan 16, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 16, 2019

oli-obk reviewed Jan 18, 2019

View reviewed changes

dotdash force-pushed the locals branch from fafeda6 to 98d4f33 Compare January 20, 2019 18:09

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 21, 2019

Centril mentioned this pull request Jan 21, 2019

Rollup of 8 pull requests #57823

Closed

Centril mentioned this pull request Jan 22, 2019

Rollup of 9 pull requests #57830

Merged

bors merged commit 98d4f33 into rust-lang:master Jan 22, 2019

RalfJung reviewed Jan 30, 2019

View reviewed changes

eddyb reviewed Jan 30, 2019

View reviewed changes

dotdash deleted the locals branch January 20, 2020 08:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

dotdash commented Jan 16, 2019 •

edited

Loading

rust-highfive commented Jan 16, 2019

dotdash commented Jan 16, 2019

bors commented Jan 16, 2019

michaelwoerister commented Jan 17, 2019

michaelwoerister commented Jan 17, 2019

rust-timer commented Jan 17, 2019

rust-timer commented Jan 17, 2019

michaelwoerister commented Jan 17, 2019

dotdash commented Jan 17, 2019

oli-obk Jan 18, 2019

oli-obk Jan 18, 2019

michaelwoerister commented Jan 18, 2019

oli-obk commented Jan 18, 2019

michaelwoerister commented Jan 18, 2019

oli-obk commented Jan 18, 2019

michaelwoerister commented Jan 18, 2019

dotdash commented Jan 19, 2019

dotdash commented Jan 20, 2019

dotdash commented Jan 20, 2019

bors commented Jan 20, 2019

bors commented Jan 20, 2019

oli-obk commented Jan 21, 2019

rust-timer commented Jan 21, 2019

rust-timer commented Jan 21, 2019

oli-obk commented Jan 21, 2019

dotdash commented Jan 21, 2019

michaelwoerister commented Jan 21, 2019

bors commented Jan 21, 2019

RalfJung Jan 30, 2019

oli-obk Jan 30, 2019

eddyb Jan 30, 2019

		@@ -463,12 +463,10 @@ impl<'a, 'mir, 'tcx, M: Machine<'a, 'mir, 'tcx>> EvalContext<'a, 'mir, 'tcx, M>
		&self,

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

const_eval: Predetermine the layout of all locals when pushing a stack frame #57677

Conversation

dotdash commented Jan 16, 2019 • edited Loading

rust-highfive commented Jan 16, 2019

dotdash commented Jan 16, 2019

bors commented Jan 16, 2019

michaelwoerister commented Jan 17, 2019

michaelwoerister commented Jan 17, 2019

rust-timer commented Jan 17, 2019

rust-timer commented Jan 17, 2019

michaelwoerister commented Jan 17, 2019

dotdash commented Jan 17, 2019

oli-obk Jan 18, 2019

Choose a reason for hiding this comment

oli-obk Jan 18, 2019

Choose a reason for hiding this comment

michaelwoerister commented Jan 18, 2019

oli-obk commented Jan 18, 2019

michaelwoerister commented Jan 18, 2019

oli-obk commented Jan 18, 2019

michaelwoerister commented Jan 18, 2019

dotdash commented Jan 19, 2019

dotdash commented Jan 20, 2019

dotdash commented Jan 20, 2019

bors commented Jan 20, 2019

bors commented Jan 20, 2019

oli-obk commented Jan 21, 2019

rust-timer commented Jan 21, 2019

rust-timer commented Jan 21, 2019

oli-obk commented Jan 21, 2019

dotdash commented Jan 21, 2019

michaelwoerister commented Jan 21, 2019

bors commented Jan 21, 2019

RalfJung Jan 30, 2019

Choose a reason for hiding this comment

oli-obk Jan 30, 2019

Choose a reason for hiding this comment

eddyb Jan 30, 2019

Choose a reason for hiding this comment

dotdash commented Jan 16, 2019 •

edited

Loading