Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add get and get_unchecked to *const [T] and *mut [T] #60639

Closed
oli-obk opened this issue May 8, 2019 · 23 comments · Fixed by #73986
Closed

Add get and get_unchecked to *const [T] and *mut [T] #60639

oli-obk opened this issue May 8, 2019 · 23 comments · Fixed by #73986
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented May 8, 2019

If one has a *const [T] and wants a pointer to a specific element of that slice, one needs to go through a reference and index, or cast the pointer to *const T and offset that by the number of elements (which doesn't do a length check, so you need that, too). I propose that we add two new methods to *const [T] and *mut [T]:

impl<T> *const [T] {
    fn get(self, idx: usize) -> Option<*const T>;
    unsafe fn get_unchecked(self, idx: usize) -> *const T;
} 

get_unchecked is unsafe, because if a large index is used, one may overflow the pointer, which is UB

To make the implementation of get simpler, I propose to additionally add a len method:

impl<T> *const [T] {
    fn len(self) -> usize;
}

cc @RalfJung @shepmaster

@oli-obk oli-obk added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. labels May 8, 2019
@shepmaster
Copy link
Member

shepmaster commented May 8, 2019

My specific case would have been for *mut [T] — is that deliberately omitted?

@oli-obk
Copy link
Contributor Author

oli-obk commented May 8, 2019

no, I'm just lazy and didn't want to type everything twice. There should be an equivalent of all the above for *mut [T]

@shepmaster shepmaster changed the title add get and get_unchecked to *const [T] Add get and get_unchecked to *const [T] and *mut [T] May 8, 2019
@Stargateur
Copy link
Contributor

Stargateur commented May 8, 2019

Pardon me if It's stupid but why not just let the user go from *const [T] to &[T] ?

fn main() {
    let x_ptr = &[1, 2, 4] as *const [_];
    
    unsafe {
        let x = &*x_ptr;
        println!("{:?}", x.get(1).map(|x| x as *const _));
        println!("{:?}", x.get_unchecked(1) as *const _);
    }
}

Is that really to long to write ? Also, want to ask, when do you need a *const [T] ?

@shepmaster
Copy link
Member

The original context was this code:

use std::collections::BTreeSet;

fn uniq_refs<'i, 'd: 'i, T>(
    data: &'d mut [T],
    indices: &'i BTreeSet<usize>,
) -> impl Iterator<Item = &'d mut T> + 'i {
    let in_bounds_indices = indices.range(0..data.len());

    in_bounds_indices.map(move |&i| {
        // I copied this from a Stack Overflow answer 
        // without reading the text that explains why this is safe
        unsafe {
            let r = data.get_unchecked_mut(i);
            let p: *mut T = r;
            &mut *p
        }
    })
}

Miri detects that this usage triggers undefined behavior:

use std::iter::FromIterator;

fn main() {
    let mut scores = vec![1, 2, 3];
    let indices = BTreeSet::from_iter(vec![0, 2]);

    let selected_scores: Vec<_> = uniq_refs(&mut scores, &indices).collect();
    for score in selected_scores {
        *score += 1;
    }
}

I wanted to make use of get_unchecked_mut, but that function requires unique access to data. That's invalid here because there's an outstanding mutable reference to a T from inside the slice. The non-UB form of this involves getting a raw pointer to the data and performing offsets:

fn uniq_refs<'i, 'd: 'i, T>(
    data: &'d mut [T],
    indices: &'i BTreeSet<usize>,
) -> impl Iterator<Item = &'d mut T> + 'i {
    let start = data.as_mut_ptr();
    let in_bounds_indices = indices.range(0..data.len());
    in_bounds_indices.map(move |&i| unsafe { &mut *start.add(i) })
}

With the proposed functions, that code could look like

fn uniq_refs<'i, 'd: 'i, T>(
    data: &'d mut [T],
    indices: &'i BTreeSet<usize>,
) -> impl Iterator<Item = &'d mut T> + 'i {
    let start = data as *mut [T];
    let in_bounds_indices = indices.range(0..data.len());
    in_bounds_indices.map(move |&i| unsafe { &mut *data.get_unchecked(i) })
}

@RalfJung
Copy link
Member

RalfJung commented May 9, 2019

Yeah, this seems reasonable. However, I think it should be seen in the context of our general need for a better story for raw pointers.

@scottmcm
Copy link
Member

impl<T> *const [T] {
    fn get(self, idx: usize) -> Option<*const T>;
}

Wouldn't that need to be unsafe for the same reason as offset?

@oli-obk
Copy link
Contributor Author

oli-obk commented May 12, 2019

Since it would be bounds-checked, you can't overflow, so get should be safe

@oli-obk
Copy link
Contributor Author

oli-obk commented May 12, 2019

Ah no, you're right. One can create a wrong slice that would overflow for some indices...

@RalfJung
Copy link
Member

It could use wrapping_offset, though. Then it could be safe.

@Stargateur
Copy link
Contributor

Stargateur commented May 12, 2019

just to note that in C, it's UB to create a out of bound pointer of more than len, for exemple:

char *ptr = malloc(42); // assert no null
char *ub = ptr + 43;
char *also_ub = ptr - 1;

Look like Rust allow this, but use raw pointer is often use with C binding, I just through this could be useful to know.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented May 12, 2019

It could use wrapping_offset, though. Then it could be safe.

But then people would have a performance incentive to not use the clearer and more convenient method. And what for? If one uses this method they'll almost always go on to access memory through the resulting pointer, and then they'd almost certainly have UB anyway if the index is too large or goes otherwise out of bounds. While wrapping_offset does allow "allocation crossing" with a suitably crafted offset, this is extremely rare and so I don't see why we should prioritize it in a convenience method.

@RalfJung
Copy link
Member

RalfJung commented May 12, 2019

But then people would have a performance incentive to not use the clearer and more convenient method.

If they want the performance, wouldn't they use get_unchecked anyway? Seems like if you care about whatever LLVM gets from knowing there is no overflow, you'll also want no bounds check.

Put differently, what is even the point of having get if it is not safe?

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented May 12, 2019

Put differently, what is even the point of having get if it is not safe?

This is a good question. I feel like the answer might well be "there is no point to having get". Is there anything useful one can do with get that doesn't already require knowing the preconditions of get_unchecked are met? If the resulting pointer is not good to dereference (e.g., because the *[T] was dangling) one could still compare the resulting pointer safely, but is that ever useful by itself? (The aforementioned jumping between allocations in ways that still result in dereferenceable pointer is too niche to justify the existence of the method.) (edit: I just realized: the pointer wouldn't be derefenceable when crossing allocations, so this isn't even a different case.)

I can see some potential value in how an unsafe get could simplify proof obligations, though. Assume get does bounds checking against the claimed length of the slice, but uses offset if the bounds check passes and hence has UB if the ptr/length pair is "wrong" (e.g., dangling pointer, excessive length). Using this method soundly only requires only knowing that the raw slice pointer you have is "correct", you don't have to worry at all about the values you use as indices because all your accesses are bounds-checked against the length of the slice. A complete characterization of when the slice is "correct" for this purpose could get complicated, but many simple common cases are obviously fine, e.g.:

  • if you got the pointer from a safe slice reference (that's not been invalidated since)
  • if you called alloc(Layout::array<T>(n)) and turned the resulting pointer into a slice of length n (and haven't deallocated the memory since then)

I believe this distinction between trusting the slice and trusting the indices also answers this question:

If they want the performance, wouldn't they use get_unchecked anyway?

They might need the bounds-check anyway (because the indices are untrusted) and then they might as well leave that to the standard library rather than open-coding it.

@gnzlbg
Copy link
Contributor

gnzlbg commented May 23, 2019

To make the implementation of get simpler, I propose to additionally add a len method:

For that len method to be safe, it would need to panic if *mut [T] is null, and it would either need to panic if the pointer is not properly aligned, or do an unaligned ptr read of the len.

@RalfJung
Copy link
Member

RalfJung commented May 23, 2019

@gnzlbg why that? A *mut [T] is a (*mut T, usize), and to read the usize we don't care if the pointer is NULL or unaligned. And we can assume that this pair itself is well-aligned (we are taking it by-value, anyway).

@gnzlbg
Copy link
Contributor

gnzlbg commented May 23, 2019

And we can assume that this pair itself is well-aligned

I was missing this, makes sense.

to read the usize we don't care if the pointer is NULL

So does this mean that if the pointer is NULL, the len must be zero, or else the pointer is invalid ?

@RalfJung
Copy link
Member

It's a raw pointer, I don't think we are making any such requirements. Both pointer and integer can be just about anything that a *mut T or usize can be.

@RalfJung
Copy link
Member

So, in terms of implementing this as an unstable feature... the main problem is that we want these to be inherent methods on *const [T], right? AFAIK this needs some rustc magic and lang items as those types are not defined in any crate, and thus cannot usually get inherent methods?

@oli-obk
Copy link
Contributor Author

oli-obk commented Mar 25, 2020

Yea, we need to add the same kind of magic that the integer types have in

rust/src/libcore/num/mod.rs

Lines 2380 to 2381 in 1572c43

#[lang = "i8"]
impl i8 {

Though in this case it may be more difficult due to the involved generics, so you should look at the implementation of the magic for slices for inspiration:

#[lang = "slice"]
#[cfg(not(test))]
impl<T> [T] {

@RalfJung
Copy link
Member

#71082 does all the hard parts here -- now adding more raw slice methods (like indexing) is a libcore-only patch, rustc does not need any further changes.

@RalfJung
Copy link
Member

RalfJung commented Jul 5, 2020

#73986 should resolve this.

Manishearth added a commit to Manishearth/rust that referenced this issue Jul 13, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 13, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 13, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 13, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 14, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 14, 2020
add (unchecked) indexing methods to raw (and NonNull) slices

This complements the existing (unstable) `len` method. Unfortunately, for non-null slices, we cannot call this method `as_ptr` as that overlaps with the existing method of the same name.

If this looks reasonable to accept, I propose to reuse the rust-lang#71146 tracking issue and rename the feature get to `slice_ptr_methods` or so.

Cc @SimonSapin
Fixes rust-lang#60639
@bors bors closed this as completed in 79894df Jul 14, 2020
@WaffleLapkin
Copy link
Member

The checked version,

impl<T> *const [T] {
    fn get(self, idx: usize) -> Option<*const T>;
}

wasn't implemented. Why?

@RalfJung
Copy link
Member

As far as I am concerned, mostly due to lack of imagination -- I figured if you'd want checks you'd use references, not raw pointers. The return type Option<*const T> is also strange, note that this one cannot be enum-niche-optimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants