-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[strict provenance] Fix System APIs That Are Liars #95496
Comments
Note: there's probably way more of these but I only audited x64 Windows and x64 gnu-linux. Once we have lints for this stuff we should encourage folks who Do That Platform to run them and find what's up with those platforms (and ideally fix them). |
Does provenance matter across the user/kernel boundary? (Although most calls will pass via libc before the real syscall.) These all seem to be ptr2int, and not int2ptr, which seems less bad, right? Then C can deal with provenance itself. If you then involve cross-lang LTO or something, I don't see how it makes a difference if we lie about the int/ptr API on our end. |
provenance matters up until that point. basically, we just need to make sure that the compiler understands that a "real pointer" is coming from FFI or a "real pointer" is leaking into FFI. That means ptr2int is also concerning because "under" strict-provenance the compiler wouldn't treat this as "exposing" the pointer and might think it's unaliased, resulting in miscompilation if the kernel then reads/writes through the pointer. (nowhere near actually doing such an optimization, but we want as much code as possible to conform to those kinds of semantics because then it's bulletproof.) |
My understanding at the current moment is that if we have our own house in order, according to the Rust model, then all Rust code could be compiled and have LTO done according to the rules of Rust, without looking at the C code, and be correct. Thus all cross-lang link-time optimizations could happen after and as long as they are done according to the understood rules the cross-lang optimizer must abide by, they would be correct. If we do not, then a cross-lang optimizer could not defer the work and thus the ordering of optimizations would matter for not just performance but soundness, and likely have to be pervasive. |
Another Provenance Crime API is the one for setting a thread name: rust-lang/miri#1717.
Basically, if for a given provenance P, we never give away a pointer with provenance P to anything else, then under strict provenance the compiler may assume that nothing reads or write with that provenance. Giving away merely the address does not "expose" the memory to third-party accesses, because an address is insufficient to read or write memory -- you need the right provenance, too. So, in the case of setting the thread name, the compiler may assume that nothing ever actually reads the name of the thread. Obviously that doesn't work. That's why having proper provenance all the way until the edge of the FFI boundary is important. (What matters on the other side is irrelevant, of course.) Another way to think about this is: the compiler does not know what the other side of the FFI boundary does, but whatever it does must be rationalizable inside the Rust Abstract Machine. No Rust function could possibly access private Rust memory of which it just learned the address (but not the provenance), and hence the compiler may assume that when calling a function and giving it just the address of that memory, this function cannot access that memory. |
It absolutely matters for CHERI-enabled architectures, any I know that many Linux APIs like to use long/u64 rather than the correct |
@RalfJung I take that this also applies to DMA configuration through MMIO (bare-metal rust), right ? Currently, we generate the MMIO interface through (mostly) vendor provided SVD files, where we usually get just the width of the registers, so we always use unsigned integers. It seems that we would need to manually patch the DMA address registers to have a pointer type. Should we use Here is an example: https://docs.rs/stm32f4/0.14.0/stm32f4/stm32f401/dma2/struct.ST.html#structfield.m0ar |
Uh, DMA/MMIO touch on many aspects that are quite hard to catch formally.^^ But, generally speaking: if the compiler just sees If you want to not use |
Yes. When you convert a reference to a raw ptr, the mutability of the ptr matters: |
Note that for DMA specifically, the llvm docs for volatile basically disallow you to use DMA with a volatile access if any non-volatile memory is involved in the DMA (in other words: if any rust memory is read/written). So for DMA you want to use inline asm. |
@RalfJung We define the MMIO "registers" as structs, and then we just cast a fixed usize to a ptr to the struct. Yes, this violates strict provenance, but it's already listed on things to consider and I don't think it will cause many problems, since the address is outside of the space managed by the compiler and we always use volatile operations. So we just need to mark the type in the struct as @Lokathor What do you mean by DMA with a volatile access ? We don't usually do volatile operations on the DMA buffer, just write to the register and add fences for synchronization, isn't it enough ? What's the difference between that and e.g. passing pointers to the kernel to do io_uring ? |
The write to the device register would have to be volatile. Otherwise I believe the compiler is allowed to move the write, load it again after the write or write twice. Volatile exists precisely for device register accesses (and signal handlers) |
So the pointer has type |
Well i'm just gonna quote part of the docs:
I read "A volatile operation may not modify any other memory accessible by the module being compiled." as being that volatile can't modify any normal memory your program can see, including between copying between buffers in your normal program. Maybe I'm being too paranoid. |
The DMA buffers wouldn't need volatile, just a synchronization fence or something similar. The device register write initiating the DMA transfer (and any other device register access) would need volatile. |
The volatile operation will not modify any other memory. It can of course modify the memory behind All this sentence does is let the compiler reorder volatile and non-volatile operations. |
Yes, the write to the register must be volatile, what I meant is that we will use non-volatile operations on the destination buffer for the DMA (after DMA is done).
I think that this is just limiting the side-effects of volatile operations, as in, a volatile write to memory address The problem is that what constitutes a pointer escape seems to be fuzzy. What I have seen is that it's usually done by writing/exposing the pointer to a place "outside of the compiled module", plus proper synchronization, e.g. fences. |
This function is available on Linux since glibc 2.12, musl 1.1.16, and uClibc 1.0.20. The main advantage over `prctl` is that it properly represents the pointer argument, rather than a multi-purpose `long`, so we're better representing strict provenance (rust-lang#95496).
…ulacrum linux: Use `pthread_setname_np` instead of `prctl` This function is available on Linux since glibc 2.12, musl 1.1.16, and uClibc 1.0.20. The main advantage over `prctl` is that it properly represents the pointer argument, rather than a multi-purpose `long`, so we're better representing strict provenance (rust-lang#95496).
@Gankra , I have good example of liar system API. It is https://docs.rs/libc/latest/libc/fn.readdir.html with https://docs.rs/libc/latest/libc/struct.dirent.html . Struct |
From https://man7.org/linux/man-pages/man3/readdir.3.html
(as an additional note: not all platforms enforce NAME_MAX, this field should not be treated as having any fixed size on any platform, even using NAME_MAX) |
Fixed vita libc definitions This fixes definitions that were incorrect in my initial PR (#3209). As with the previous one, this relies on open-source newlib implementation and doesn't use Sony internal API's or expose Sony internal data structures. This fixes: - fs API that uses `dirent` and stat - sockets - pthreads A couple of explanations - Unfortunately `dirent` in vita newlib is not very POSIX'y, before `d_name` it has a field with an internal data structure, which is of no use for std (no inodes, no file type inside), so I've just stubbed it as an `__offset`. Also, Vita doesn't expose inodes - `dirent` doesn't have it at all, and in `stat` it is always `0`. I am not sure what would be the best approach here. Maybe I should move the POSIX `dirent` to `generic.rs` and reexpose it in all targets, and redefine it in `vita/mod.rs`? - All pthread types on Vita are pointers, and the backing data structure is allocated by corresponding init functions, so their sizeof's are all 4. I also changed `pthread_attr_t` and `pthread_rwlockattr_t` to reflect their sizes and not be constant. May be in relation to rust-lang/rust#95496 it would be better to move existing thread definitions to generic, and for vita specifically make them pointer types instead of byte arrays. The fixes in std will be in rust-lang/rust#111819
This issue is part of the Strict Provenance Experiment - #95228
Some system APIs (and C FFI in general) like to lie about whether things are pointers, and this makes strict-provenance very sad.
prctl for PR_SET_NAME (says a pointer is unsigned long):
rust/library/std/src/sys/unix/thread.rs
Lines 120 to 127 in 44628f7
clone3(?) (says a pointer is u64):
rust/library/std/src/sys/unix/process/process_unix.rs
Lines 185 to 198 in 44628f7
sigaction (defines sighandler_t (a callback) to be size_t):
rust/library/std/src/sys/unix/stack_overflow.rs
Line 130 in 44628f7
The "level 1" fix for this is to just change our extern decls so that all of these APIs/types actually say "this is a pointer". In general it's ok for integers to pretend to be pointers "for fun", and if anything is ever
int | ptr
the valid union of these types isptr
.The "level 2" fix for this is to instead come up with some general mechanism for dealing with these kinds that folks can use in the larger ecosystem, like a "this is lies" annotation or uptr. This kind of thing is especially nasty for people who are "bindgen true believers" and don't want to ever hand-edit generated bindings (so, not us).
The text was updated successfully, but these errors were encountered: