Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dereferencing raw pointer to invalid data? #216

Closed
RalfJung opened this issue Oct 21, 2019 · 17 comments
Closed

Dereferencing raw pointer to invalid data? #216

RalfJung opened this issue Oct 21, 2019 · 17 comments
Labels
A-validity Topic: Related to validity invariants

Comments

@RalfJung
Copy link
Member

RalfJung commented Oct 21, 2019

When I dereference a raw pointer but do not actually load any data, such as in &raw const *x, is there any validity requirement on the data it points to?

According to the reference, the answer is "no". And indeed if the answer is "yes", it becomes impossible I think to do field-by-field initialization of a struct even with raw pointers. So "no" likely is the only possible answer.

This would mean that given a well-aligned, non-dangling x: *const !, doing &raw const *x is okay. Also, given a well-aligned, non-dangling x: *const (i32, !), doing &raw const (*x).0 is okay. In the second case, whether the resulting ptr is actually usable relates to the layout of structs with uninhabited fields -- I recall a discussion with (I think) @rkruppe about this? There were some proposals on the table to make such structs zero-sized, but they caused problems and IIRC the conclusion was that that's probably not possible.

To be sure about all of this, we should find a way to get that officially blessed by the lang team. Is there some reasonable place in the rustc repo or so where we could document this, and then FCP? Or do we need an RFC?

@RalfJung RalfJung added C-active-discussion-topic A-validity Topic: Related to validity invariants labels Oct 21, 2019
@Centril
Copy link
Contributor

Centril commented Oct 21, 2019

In the second case, whether the resulting ptr is actually usable relates to the layout of structs with uninhabited fields -- I recall a discussion with (I think) @rkruppe about this? There were some proposals on the table to make such structs zero-sized, but they caused problems and IIRC the conclusion was that that's probably not possible.

But to be explicit here, if we declare that &raw const (*x).0 is OK, that does not have immediate impact on the zero-sizedness / uninhabitedness (from the POV of operational semantics, not type / match checking) of *const (i32, !), right?

Or do we need an RFC?

I would want a brief RFC mostly for its documentation and education value.

@RalfJung
Copy link
Member Author

But to be explicit here, if we declare that &raw const (*x).0 is OK, that does not have immediate impact on the zero-sizedness / uninhabitedness (from the POV of operational semantics, not type / match checking) of *const (i32, !), right?

Not immediately, no. Though it makes it strikingly clear that for unsafe code, making them zero-sized is a big hazard: given a well-aligned non-dangling *const (i32, T), we couldn't even reliably get a raw pointer to the first field and write to it to initialize it. There might be no such field if T is uninhabited. That was @rkruppe's (I think) point the last time we discussed making uninhabited structs zero-sized.

@hanna-kruppe
Copy link

See also discussion in #79

My conclusion from past discussions is that making aggregates ZSTs if any components is uninhabited is in conflict with permitting piecewise initialization of aggregates, no matter whether it occurs in:

  • safe Rust code (currently not allowed, but has been in the past and would be silly to disallow forever)
  • unsafe Rust code (using raw pointers, as @RalfJung described)
  • in MIR (after optimizations that remove temporaries in favor of direct moves to the eventual destination)

There have been a couple proposals to rescue at least the third use case (and possibly the first) but they significantly complicate the semantics of MIR both for miri and for lowering to LLVM IR, and they can't do anything for unsafe use case.

@RalfJung
Copy link
Member Author

safe Rust code (currently not allowed, but has been in the past and would be silly to disallow forever)

Actually even in safe Rust we can do things like

let x: (i32, !) = (13, return);

@RalfJung
Copy link
Member Author

IMO the best way forward for the struct case is to revive @Centril's RFC that fixes partial initialization in safe code, so that this works:

let x: (i32, i32);
x.0 = 2;
x.1 = 4;
foo(&x);

Given that making ZST-uninhabited-structs work makes unsafe code handling partial initialization messy, makes MIR semantics and optimizations messy, makes the data layout spec messy -- I think we should fully commit to not doing that and then ripe the benefits of that, which is exactly what this RFC does. Or am I hallucinating that @Centril wrote such an RFC once?

@Ixrec
Copy link
Contributor

Ixrec commented Oct 21, 2019

You are not hallucinating, although it's still got a few TODOs in it: Centril/rfcs#16

@Centril
Copy link
Contributor

Centril commented Oct 21, 2019

Or am I hallucinating that @Centril wrote such an RFC once?

Also see rust-lang/rust#54987 and rust-lang/rust#63230 -- cc @tmandry and @Aaron1011.

(Just a note re. process: reviving my RFC probably requires a bunch of work wrt. finishing / polishing the text, building the consensus, and doing the implementation -- I'm not sure we should open that avenue now since it's not a roadmap goal and there are other existing opened RFCs to consider.)

@RalfJung
Copy link
Member Author

@Centril sure. It doesn't specifically have to be that RFC, just any one that commits us to partial initialization. Either way we can then have a subsection in that RFC that says that by accepting the RFC we also commit to structs always being laid out as "all of their fields in some order with maybe some gaps between them".

@HeroicKatora
Copy link

HeroicKatora commented Dec 21, 2019

Stable Rust already permits something that is effectively dereferencing a raw pointer without reading nor creating a reference to the place behind it (in unsafe code of course). The RHS of pattern-let is a pattern, not a value, and the pattern will not read the unnamed binding. This allows the refer to name the place (containing a potentially invalid value) without creating any references or data reads, then have the LHS throw away the statement without doing either as well.

let _ = *ptr_to_invalid_data;

@moxian
Copy link

moxian commented Dec 28, 2019

(I've randomly stumbled across this issue, and am not well versed in rust internals to be confident that what I'm saying makes sense. Please ignore me if this comment is more distracting than useful)

This would mean that given a well-aligned, non-dangling x: *const !, doing &raw const *x is okay.
Though it makes it strikingly clear that for unsafe code, making them zero-sized is a big hazard: given a well-aligned non-dangling *const (i32, T), we couldn't even reliably get a raw pointer to the first field and write to it to initialize it.

It appears to me, that this can be trivially sidestepped by saying that (i32, !) (or any composite type over !) is never well-aligned (by giving ! impossible layout constraints.. somehow, I'm not sure of the details).
This effectively makes *const(i32, !) and *const ! uninhabited, though, and that ship has already sailed... I'd be curious to know why *const ! was not declared uninhabited from the start (I couldn't find any info on that), but that's severely offtopic, I admit.

And indeed if the answer is "yes", it becomes impossible I think to do field-by-field initialization of a struct even with raw pointers

The following is currently not a valid syntax, but I don't see any obvious issues with making it valid (with an extra RFC, of course). Am I missing something?

struct S { a: i32, b: u64 };
fn init(s: *mut S){
  unsafe{ 
    *s.a = 5;
    *s.b = 8;
  }
}

(This would also - accidentally - improve ergonomics a bit by eliminating braces) This wouldn't be safe, since it would drop the old uninitialized data, nevermind.

@Aaron1011
Copy link
Member

Aaron1011 commented Dec 28, 2019

I'd be curious to know why *const ! was not declared uninhabited from the start (I couldn't find any info on that), but that's severely offtopic, I admit.

Raw pointers don't impose any constraints whatsoever on their contents (ignoring uninitialized data for the moment). Their validity invariant is 'anything goes', if I understand the terminology correctly.

In particular, you can freely cast a raw pointer to and from a usize, without needing to care about whether or not the pointee is inhabited. If *const ! was uninhabited, casting a usize to a *const ! would no longer be possible.

@moxian
Copy link

moxian commented Dec 28, 2019

No, right, I understand that this is how it currently works.
My question is why is it like that (other than legacy reasons/historical accident)? I can't come up with a situation where casting usize (or any pointer) to *const ! would be useful, yet disallowing such casts would resolve the problem of "Is it always safe to write to first element of *mut (u32, T)?" (with the answer of "yes, it is always safe because *mut (u32, !) does not exist at runtime")

Edit: I think I get it now. Even though the following is not super useful, it also would be very weird to disallow:

fn null_t_address<T>() -> usize {
  let x: *const T = core::ptr::null();
  return x as usize;
}
fn get_zero_in_a_roundabout_way() -> usize{
  null_t_address::<!>()
}

One could consider introducing T: Inhabited bounds (with Inhabited being implied by-default for generic parameters a-la Sized and needing an explicit ?Inhabited opt-out?) that would be required for pointer casts like that, but that's probably way in the overengineering territory.

Anyhow, thanks, consider my question answered, and apologies for offtopic!

@RalfJung
Copy link
Member Author

RalfJung commented Jan 2, 2020

disallowing such casts would resolve the problem of "Is it always safe to write to first element of *mut (u32, T)?" (with the answer of "yes, it is always safe because *mut (u32, !) does not exist at runtime"

This could also be resolved by making (i32,!) have size 4, which it does currently. There's been discussion about this somewhere but I forgot where...

@RustyYato
Copy link

RustyYato commented Jan 2, 2020

@RalfJung Are you thinking about rust-lang/rust#49298, specifically this comment

@RalfJung
Copy link
Member Author

RalfJung commented Jan 2, 2020

Well that was the start of it, but we had a chat with @rkruppe somewhere about this... but that one has all the links, yeah. (But the comment link doesn't work.)

@RustyYato
Copy link

Ok, based on that I think it is from #37, (fixed the comment link). There is also this PR rust-lang/rust/pull/50622. Finally, there is #165 which is supposed to specify uninhabited types.

@RalfJung
Copy link
Member Author

RalfJung commented Apr 5, 2024

When I dereference a raw pointer but do not actually load any data, such as in &raw const *x, is there any validity requirement on the data it points to?

Meanwhile we have decided that the * projection itself is never UB, so in particular the validity of the data behind the pointer indirection is definitely irrelevant.

@RalfJung RalfJung closed this as completed Apr 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-validity Topic: Related to validity invariants
Projects
None yet
Development

No branches or pull requests

8 participants