Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New lint: transmute(Vec) is undefined behavior #4484

Closed
Shnatsel opened this issue Sep 1, 2019 · 23 comments
Closed

New lint: transmute(Vec) is undefined behavior #4484

Shnatsel opened this issue Sep 1, 2019 · 23 comments
Labels
A-lint Area: New lints L-correctness Lint: Belongs in the correctness lint group

Comments

@Shnatsel
Copy link
Member

Shnatsel commented Sep 1, 2019

Documentation on std::mem::transmute() specifically calls out that transmuting a Vec of any type is UB.

Code with UB from documentation:

let store = [0, 1, 2, 3];
let mut v_orig = store.iter().collect::<Vec<&i32>>();

// Using transmute: this is Undefined Behavior, and a bad idea.
let v_transmuted = unsafe {
    std::mem::transmute::<Vec<&i32>, Vec<Option<&i32>>>(v_orig)
};

Correct code from documentation (assuming interior types can be transmuted):

let store = [0, 1, 2, 3];
let mut v_orig = store.iter().collect::<Vec<&i32>>();
// Using from_raw_parts
let v_from_raw = unsafe {
    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())
};
std::mem::forget(v_orig);

What exactly is wrong with it is not currently documented (I've opened rust-lang/rust#64073 to clarify this) but my understanding is that the problem stems from the following:

Most fundamentally, Vec is and always will be a (pointer, capacity, length) triplet. No more, no less. The order of these fields is completely unspecified, and you should use the appropriate methods to modify these.

I.e. while it's safe to transmute the data on the heap assuming the types are compatible, it's never safe to transmute the (pointer, capacity, length) triplet on the stack.

I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

cc @RalfJung

@danielhenrymantilla
Copy link

danielhenrymantilla commented Sep 1, 2019

It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

Yes, the potential issue is that

    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())

asserts / requires that v_org.as_mut_ptr() not be aliased (due to Vec::from_raw_parts creating (unique) ownership out of that pointer)

However, the original pointer it comes from, the one from v_orig, is "used" right afterwards in mem::forget. Unless forget becomes a special lang construct, the current stacked borrows model fails with this pattern.

The solution is to "forget" in advance, thanks to ManuallyDrop:

let store = [0, 1, 2, 3];
let v_orig = store.iter().collect::<Vec<&i32>>();
// Using from_raw_parts
let v_from_raw = unsafe {
    // prepare for an auto-forget of the initial vec:
    let v_orig: &mut Vec<_> = &mut *ManuallyDrop::new(v_orig);
    Vec::from_raw_parts(v_orig.as_mut_ptr() as *mut Option<&i32>,
                        v_orig.len(),
                        v_orig.capacity())
    // v_orig is never used again, so no aliasing issue
};

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2019

That's a lot of code. Perhaps it might be a good idea to encapsulate it in a Vec::transmute::<Item>()-method?

@Lokathor
Copy link

Lokathor commented Sep 2, 2019

There is currently no trait for this in core or std, so crates have to make up their own.

@Shnatsel
Copy link
Member Author

Shnatsel commented Sep 2, 2019

@llogiq sounds like a good idea to me. Care to get the ball rolling with an issue against stdlib and/or an RFC?

@flip1995 flip1995 added L-correctness Lint: Belongs in the correctness lint group A-lint Area: New lints labels Sep 2, 2019
@Lokathor
Copy link

Lokathor commented Sep 2, 2019

This actually could potentially be a lot more broad and useful, though I don't know how much of this info that clippy really has access to when doing an analysis:

  • Transmuting any repr(Rust) type with more than one field is UB.

Related:

  • Transmuting any Box<H> to Box<K> is UB when you drop the new Box unless the new type's size and alignment are both exactly the same. This includes anything that transitively contains a Box

@llogiq
Copy link
Contributor

llogiq commented Sep 2, 2019

@Shnatsel Sure thing. I'll write up that RFC.

@llogiq
Copy link
Contributor

llogiq commented Sep 3, 2019

rust-lang/rfcs#2756

@CAD97
Copy link
Contributor

CAD97 commented Sep 8, 2019

As a follow-up to @Lokathor: currently, a lint would be always correct to trigger when both type arguments to transmute are repr(Rust), unless they are fully equivalent. Any defined transmutes must involve at least one repr(C) or repr(transparent) type (counting built-in types with defined layouts as repr(C)).

A lint specifically for Vec and/or other smart pointers is a great opportunity for teaching, but until we get more defined layouts, it's UB for any two distinct repr(Rust) types even when they're identical newtypes.

@RalfJung
Copy link
Member

a lint would be always correct to trigger when both type arguments to transmute are repr(Rust)

I think that's basically rust-lang/rust#50842.

I'm not 100% convinced that the "correct" code proposed in documentation correctly upholds aliasing invariants. It's possible that std::mem::forget() must be called before the new Vec is constructed, or that it's only safe to do with ManuallyDrop.

@llogiq has since corrected that in the docs.

@llogiq
Copy link
Contributor

llogiq commented Oct 20, 2019

#4592 should have closed this.

@llogiq llogiq closed this as completed Oct 20, 2019
@shepmaster

This comment has been minimized.

@Lokathor
Copy link

Correct.

Because Vec is not transparent, so it's UB.

If needed, you could convert a slice of Outer to slice of Inner via core::slice::from_raw_parts and pointer casting and all that, but not with Vec.

@RReverser
Copy link

While I understand why Vec<&T> -> Vec<Option<&T>> is unsafe and UB, it's unclear why this lint applies to any vector transformations.

For example, the most common usecase for transmute for me is when a newtype wraps another one with #[repr(transparent):

#[repr(transparent)]
struct NewType<T>(T);

Isn't in this case transmuting Vec<NewType<T>> <=> Vec<T> safe because their alignments, pointer representations and everything else are guaranteed to match?

@shepmaster
Copy link
Member

shepmaster commented Mar 17, 2020

No, because the implementation of Vec itself is non-transmutable because it's #[repr(Rust)].

Pretend that Vec<T> is effectively (*mut T, len, cap). There's nothing that prevents Vec<U> from being (cap, *mut U, len), or any other permutation.

Said another way:

Because Vec is not transparent, so it's UB.

See also #4484 (comment)

@Lokathor
Copy link

Nope.

Because Vec itself is repr(Rust), all transmutation between different monomorphizations is UB.

Now, sure, the data layout is likely to be the same and so it is likely to work despite being UB, but that's how a lot of UB is until one day it breaks.

@Lokathor
Copy link

sniped :(

@RReverser
Copy link

@shepmaster Hmm. Is this something specific to Vec because it's built-in, or to any repr(Rust)?

Generally I'd expect #[repr(transparent)] to guarantee that when it's put inside arbitrary structure or an enum, the layout of that structure or enum would be exactly the same as if the type was unwrapped.

If that's not the case, that significantly reduces usefulness of #[repr(transparent)] for FFI code, since it means only direct usage of the type is guaranteed to be transparent.

@RReverser
Copy link

RReverser commented Mar 17, 2020

UPD: Oh I see, it's worse than I thought - even structs without newtypes are not guaranteed to be deterministic :/ https://rust-lang.github.io/unsafe-code-guidelines/layout/structs-and-tuples.html#unresolved-question-guaranteeing-compatible-layouts

That's pretty bad... I don't want to switch all my types to repr(C) as repr(Rust) provides lots of valuable size optimisations, yet I do want to use #[repr(transparent)] nested inside other types without worrying about layout incompatibilities...

@RalfJung
Copy link
Member

provides lots of valuable size optimisations

These optimizations are possible only because the layout is not guaranteed. You have the choice between optimized layout and predictable layout. Having both at the same time is just impossible.

@RReverser
Copy link

RReverser commented Mar 17, 2020

These optimizations are possible only because the layout is not guaranteed.

Sure, but there is (usually) a difference between "layout is not guaranteed" as in "it's unspecified, so don't rely on exact representation" and "layout is not deterministic" as in "even structurally equivalent types are not guaranteed to have compatible layout".

Most of these optimisations require only the former, and as a user I'm perfectly fine with not relying on exact memory representation, but I'd still like to rely on compatibility between types, as that's pretty much the only way to write efficient code in many cases.

@Lokathor
Copy link

It could potentially be deterministic without being fixed for all time.

However, there's a lot of work to do there, so it's UB until that happens, if that happens at all.

@llogiq
Copy link
Contributor

llogiq commented Mar 18, 2020

People are experimenting with safe transmutation methods as we speak. In general, I believe a type should be able to state what types it can be transmuted into and how (though to do this ergonomically might require us to extend the type system, e.g. to have #[repr(transparent)] auto-implement some marker trait).

@Lokathor
Copy link

Indeed! Beyond the safe-transmute project there's also various crates that are feeding ideas into the project that are already all capable of being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: New lints L-correctness Lint: Belongs in the correctness lint group
Projects
None yet
Development

No branches or pull requests

9 participants