
Improve LLVM optimization across function calls #21465

Closed
mahkoh opened this issue Jan 21, 2015 · 15 comments
Labels
A-codegen: Area: Code generation
A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
I-slow: Issue: Problems and improvements with respect to performance of generated code.

Comments

@mahkoh
Contributor

mahkoh commented Jan 21, 2015

Vec<T> has an extend function which is equivalent to push_all except that it accepts an iterator. push_all optimizes to a memcpy but extend does not. Consider the following unsafe iterator:

struct MyIntoIter<T> {
    ptr: *const T,
    end: *const T
}

impl<T> MyIntoIter<T> {
    fn from_vec(mut v: Vec<T>) -> MyIntoIter<T> {
        unsafe {
            let ptr = v.as_mut_ptr();
            let begin = ptr as *const T;
            let end = if mem::size_of::<T>() == 0 {
                (ptr as usize + v.len()) as *const T
            } else {
                ptr.offset(v.len() as isize) as *const T
            };
            mem::forget(v);
            MyIntoIter { ptr: begin, end: end }
        }
    }
}

trait RawIterator {
    type Item;

    fn next(&mut self) -> Self::Item;
    fn ptr(&self) -> *const Self::Item;
    fn len(&self) -> usize;
}

impl<T> RawIterator for MyIntoIter<T> {
    type Item = T;

    fn next(&mut self) -> T {
        unsafe {
            let old = self.ptr;
            self.ptr = self.ptr.offset(1);
            ptr::read(old)
        }
    }

    fn ptr(&self) -> *const T {
        self.ptr
    }

    fn len(&self) -> usize {
        (self.end as usize - self.ptr as usize) / mem::size_of::<T>()
    }
}

This has all safety features removed: the user is responsible for not calling next when there are no more elements. One can assume that a normal Iterator will never optimize better than this one.
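For illustration, a sketch of how a caller is expected to drive this (not part of the original benchmark; sum_bytes is just a made-up example):

// Hypothetical usage: the caller reads len() once and calls next() exactly
// that many times; nothing prevents it from reading past the end.
fn sum_bytes(v: Vec<u8>) -> u64 {
    let mut iter = MyIntoIter::from_vec(v);
    let mut total = 0u64;
    for _ in 0..iter.len() {
        total += iter.next() as u64;
    }
    // The Vec's buffer was handed to mem::forget in from_vec, so it is leaked
    // here, in keeping with "all safety features removed".
    total
}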

Consider the following variants of extend:

#![allow(unstable)]

use std::{ptr, mem};

#[inline(never)]
fn extend4<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, mut iter: I) {
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let end = ptr.offset(len as isize);
        while ptr != end {
            ptr::write(ptr, iter.next());
            ptr = ptr.offset(1);
        }
    }
}

#[inline(never)]
fn extend5<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, mut iter: I) {
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut dst_ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let dst_end = dst_ptr.offset(len as isize);
        let mut src_ptr = iter.ptr();
        while dst_ptr != dst_end {
            ptr::write(dst_ptr, ptr::read(src_ptr));
            dst_ptr = dst_ptr.offset(1);
            src_ptr = src_ptr.offset(1);
        }
    }
}

fn main() {
    let src: Vec<_> = (0..6400000us).map(|x| x as u8).collect();
    let mut x: Vec<u8> = vec!();

    //extend4(&mut x, MyIntoIter::from_vec(src));
    extend5(&mut x, MyIntoIter::from_vec(src));
}

You can see that extend5 is nothing but extend4 with next inlined manually. However, extend5 optimizes to a memcpy while extend4 does not.
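One way to check a claim like this is to inspect the optimized LLVM IR, e.g. with something along these lines (extend.rs is just a placeholder name for a file containing the code above):

rustc -O --emit=llvm-ir extend.rs
grep llvm.memcpy extend.ll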

From this we can see that adding an unsafe ExactSizeIterator trait will not improve the extend performance on its own.

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

On the other hand, extend4 does not optimize better than extend2:

#[inline(never)]
fn extend2<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, mut iter: I) {
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let end = ptr.offset(len as isize);
        while ptr != end {
            let el = iter.next();
            assume(el.is_some());
            ptr::write(ptr, el.unwrap());
            ptr = ptr.offset(1);
        }
    }
}

So one might assume that if one can get extend4 to optimize to a memcpy, then extend2 will also optimize to a memcpy.
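Note: assume above presumably refers to the unstable std::intrinsics::assume intrinsic; on a 2015-era nightly the snippet would bring it into scope with something like:

use std::intrinsics::assume;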

@Gankra
Contributor

Gankra commented Jan 21, 2015

CC @Aatch

Gankra added the I-slow and A-codegen labels on Jan 21, 2015
@Aatch
Contributor

Aatch commented Jan 21, 2015

It is possible to always inline methods with #[inline(always)]. While I generally discourage use of the attribute, it certainly has its uses when LLVM seems reluctant to inline something itself.

I might investigate what's going on here though. I assume this is all in the same crate?

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

@Aatch: Adding this attribute to the next function up there changes nothing.

I assume this is all in the same crate?

It's all one file.

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

Maybe related: #18363

@Aatch
Contributor

Aatch commented Jan 21, 2015

Ok, figured out the problem. It's one I already know about, which is nice I guess. The long and short of it though is that LLVM doesn't know about ownership. To demonstrate, this change to extend4 causes it to generate the same IR as extend5:

#[inline(never)]
fn extend4<T, I: RawIterator<Item=T>>(vec: &mut Vec<T>, iter: I) {
    let mut iter = iter; // Add this line
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let end = ptr.offset(len as isize);
        while ptr != end {
            ptr::write(ptr, iter.next());
            ptr = ptr.offset(1);
        }
    }
}

That is literally it. The problem is that LLVM doesn't realise that the changes will never be visible in the caller. The iterator is passed as a pointer, because it is too big to be an immediate, so we end up writing to memory that can be seen from outside the function.

Making a new variable copies the iterator into a local stack slot, and LLVM suddenly becomes much happier about optimising this into much faster code. The ideal fix would be to teach LLVM a bit about our ownership semantics.
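A minimal standalone sketch of the effect described here (not from the original thread; Big and sum are illustrative names, and the behaviour assumes the 2015-era codegen described above, where large by-value arguments are passed behind a pointer into the caller's frame):

struct Big {
    data: [u64; 8],
    idx: usize,
}

#[inline(never)]
fn sum(big: Big) -> u64 {
    // Rebinding copies the argument into a fresh local stack slot. Without it,
    // updates to `idx` go through the memory the caller passed in, which the
    // optimizer has to treat conservatively.
    let mut big = big;
    let mut total = 0u64;
    while big.idx < big.data.len() {
        total += big.data[big.idx];
        big.idx += 1;
    }
    total
}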

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

Ah, so the problem wasn't the function call but the place where the iterator is stored? Either way, with this change the extend2 function, which just uses the proposed ExactSizeIterator trait, does optimize to a memcpy:

#[inline(never)]
fn extend2<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
    let mut iter = iter;
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        let end = ptr.offset(len as isize);
        while ptr != end {
            let el = iter.next();
            assume(el.is_some());
            ptr::write(ptr, el.unwrap());
            ptr = ptr.offset(1);
        }
    }
}

@Gankra
Contributor

Gankra commented Jan 21, 2015

@mahkoh Is the while ptr != end style really necessary? Would a normal for loop not suffice? (Wouldn't be particularly surprised either way, just curious.)

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

Unfortunately

#[inline(never)]
fn extend1<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
    let mut iter = iter;
    let len = iter.len();
    vec.reserve(len);
    for _ in (0..len) {
        let el = iter.next();
        unsafe {
            assume(el.is_some());
            assume(vec.len() < vec.capacity());
            vec.push(el.unwrap())
        }
    }
}

doesn't work.

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

Maybe assume(vec.len() < vec.capacity()); is just too abstract. As a Vec method it might be possible to write this differently and then it might work.
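A hypothetical sketch of what such a method could look like (push_unchecked is an illustrative name, not an actual Vec API; written here as a free function over Vec's public raw-pointer interface):

// Illustrative only: the caller promises that vec.len() < vec.capacity(), so
// the growth check disappears entirely instead of being hinted at with assume().
unsafe fn push_unchecked<T>(vec: &mut Vec<T>, value: T) {
    let len = vec.len();
    ptr::write(vec.as_mut_ptr().offset(len as isize), value);
    vec.set_len(len + 1);
}

Whether LLVM would then turn the loop into a memcpy would still need to be checked against the generated IR.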

@Gankra
Contributor

Gankra commented Jan 21, 2015

This is how I would "ideally" write it (ignoring panic concerns, which I guess is why you went with push?):

fn extend1<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
    let mut iter = iter;
    let len = iter.len();
    vec.reserve(len);
    let mut ptr = vec.as_mut_ptr().offset(vec.len());
    for el in iter {
       ptr.write(el);
       ptr = ptr.offset(1);
    }
    vec.set_len(len);
}

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

That does indeed work:

#[inline(never)]
fn extend7<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, mut iter: I) {
    let mut iter = iter;
    let len = iter.len();
    vec.reserve(len);
    unsafe {
        let mut ptr = vec.as_mut_ptr().offset(vec.len() as isize);
        for el in iter {
           ptr::write(ptr, el);
           ptr = ptr.offset(1);
        }
        // note: assumes the vec started empty; in general this should be the old length plus len
        vec.set_len(len);
    }
}

@Gankra
Contributor

Gankra commented Jan 21, 2015

Okay, that's great. We just need to add an O(1) unwind guard to set the len and we're golden... maybe?

Ah, zero-sized types complicate all of this, but it's easy enough to add another branch to cover that.

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

#[inline(never)]
fn extend7<T, I: Iterator<Item=T>+ExactSizeIterator>(vec: &mut Vec<T>, iter: I) {
    // Unwind guard: if iter.next() panics, the Drop impl sets the vec's length
    // to the number of elements actually written so far.
    struct PanicGuard<'a, T: 'a> {
        vec: &'a mut Vec<T>,
        ptr: *const *mut T,
    }

    #[unsafe_destructor]
    impl<'a, T> Drop for PanicGuard<'a, T> {
        fn drop(&mut self) {
            unsafe {
                let diff = *self.ptr as usize - self.vec.as_ptr() as usize;
                let size = mem::size_of::<T>();
                let len = diff / if size == 0 { 1 } else { size };
                self.vec.set_len(len);
            }
        }
    }

    let mut iter = iter;
    let len = iter.len();
    vec.reserve(len);

    unsafe {
        {
            let mut ptr = if mem::size_of::<T>() == 0 {
                (vec.as_mut_ptr() as usize + vec.len()) as *mut T
            } else {
                vec.as_mut_ptr().offset(vec.len() as isize)
            };
            // Reborrow so that `vec` is still usable after the guard is forgotten.
            let guard = PanicGuard { vec: &mut *vec, ptr: &ptr };
            for el in iter {
               ptr::write(ptr, el);
               ptr = if mem::size_of::<T>() == 0 {
                   (ptr as usize + 1) as *mut T
               } else {
                   ptr.offset(1)
               };
            }
            mem::forget(guard);
        }
        vec.set_len(len);
    }
}

This works but only for primitive types.

@mahkoh
Contributor Author

mahkoh commented Jan 21, 2015

LLVM will not optimize this for structs even if the struct contains a single primitive field.
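For instance, a single-field newtype used as the element type in place of u8 is already enough to lose the memcpy; a minimal example of such a struct (the name is just illustrative):

struct Wrapped(u8);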

sanxiyn added the A-LLVM label on Jan 25, 2015
mahkoh closed this as completed on Apr 11, 2015
rust-lang locked and limited conversation to collaborators on Apr 11, 2015