Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple incompatible function declarations #198

Open
gnzlbg opened this issue Aug 27, 2019 · 27 comments
Open

Multiple incompatible function declarations #198

gnzlbg opened this issue Aug 27, 2019 · 27 comments
Labels
A-ffi Topic: Related to FFI C-open-question Category: An open question that we should revisit S-pending-design Status: Resolving this issue requires addressing some open design questions Triaged Visited during a backlog bonanza meeting

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 27, 2019

Does the following example Rust code has undefined behavior ?

#[no_mangle] extern "C" fn foo(x: *mut i32) { ... }
mod maybe_bad0 {
     extern "C" { fn foo(x: Option<&mut i32>); } // OK or UB ?
}
mod maybe_bad1 {
     extern "C" { fn foo(x: NonNull<i32>); } // OK or UB ?
}
mod maybe_bad2 {
     extern "C" { fn foo(x: *mut i32); } // OK or UB ?
}

Note: the #[no_mangle] definition is used for exposition, and we can replace it
with the following C function definition in a TU that's linked with that Rust program:

void foo(int32_t* x) { ... }

There is a lot of Rust code using Rust types to "enrich" C APIs, the unsafe keyword does not appear anywhere in the example, and the equivalent C code would have undefined behavior: it is instant undefined behavior in C to provide a function declaration whose types do not "properly" match the function definition - LLVM uses this information to optimize the LLVM-IR that we currently produces for this example, and ends up adding noalias, nonnull, dereferenceable, etc. to the maybe_bad2::foodeclaration whose type actually matches the definition. This is an instance of rust-lang/rust#46188 . It is unclear whether it would be a legal optimization on LLVM-IR to propagate such attributes to the function definition, which would result in severe mis-compilations. This interacts with LTO and therefore probably with xLTO as well (e.g. a Rust declaration can probably propagate attributes to a C declaration when xLTO is involved).


In C, declarations are not only unsafe to call, but also unsafe to declare. I don't think we can do that in Rust, since that would break pretty much all existing FFI code, and it would not allow users to use Rust types to better express the API of C function declarations.

So AFAICT, we have to rule the examples above as correct, and implement them in such a way that does not cause miscompilations - that is, we could close this and just handle this by fixing: rust-lang/rust#46188 , and maybe adding a PR to the reference explaining that this is explicitly ok.

@RalfJung RalfJung added C-open-question Category: An open question that we should revisit A-ffi Topic: Related to FFI labels Aug 27, 2019
@RalfJung
Copy link
Member

RalfJung commented Aug 27, 2019

Interaction with C quickly gets messy to consider (and we can probably put this under "FFI is unsafe"), but what seems worrisome to me is the safe pure Rust version of this:

#[no_mangle]
/* extern "Rust" */ // this is implicit
unsafe fn foo(x: *mut i32, y: *mut i32) -> i32 {
  *x = 42;
  *y = 13;
  return *x;
}
fn main() {
  let mut x = 0;
  let raw = &mut x as *mut _;
  let r = unsafe { foo(raw, raw); }
  println!("{}", r);
}

#[cfg(bad)]
mod bad_but_unused {
     extern "Rust" { fn foo(x: &mut i32, y: &mut i32); }
}

Without bad, I hope we agree that this code is actually safe and fine. But if I understand you correctly, with --cfg bad, this will actually make LLVM add noalias to foo, so the program might be miscompiled?

That seems like a problem.

@RalfJung
Copy link
Member

RalfJung commented Aug 27, 2019

Maybe we should just not add noalias attributes on declarations (nor any other attributes)? There seems to be little value in doing so. Either the definition already has these attributes and this is redundant, or it does not and this is dangerous.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 27, 2019

Without bad, I hope we agree that this code is actually safe and fine.

Yes, agreed.

But if I understand you correctly, with --cfg bad, this will actually make LLVM add noalias to foo, so the program might be miscompiled?

No, I wasn't able to come up with an example where that happened.

What LLVM currently does is merge the attributes of declarations "somehow". So here:

mod bad0 { extern "C" { fn foo(x: NonNull<i32>); } }
mod bad1 { extern "C" { fn foo(x: *mut i32); } }

it will make the argument of bad1::foo nonnull.

The moment you add a definition, a different kind of merging happens, but I wasn't able to get LLVM to propagate attributes from declarations to definitions, only from definitions to declarations. So here:

extern "C" fn foo(x: NonNull<i32>) {}
mod bad1 { extern "C" { fn foo(x: *mut i32); } }

bad1::foo's argument becomes nonnull, but I wasn't able to achieve the opposite here:

extern "C" fn foo(x: *mut i32) {}
mod bad1 { extern "C" { fn foo(x: NonNull<i32>); } }

That is, there, crate::foo's argument does not become nonnull, and nonnull is removed from bad1::foo's argument to match crate::foo.

Maybe we should just not add noalias attributes on declarations (nor any other attributes)? There seems to be little value in doing so. Either the definition already has these attributes and this is redundant, or it does not and this is dangerous.

The question that we should resolve is what transformations are valid for LLVM-IR ?

What you propose might work as long as it is invalid for LLVM to propagate attributes from a declaration to a definition. It wouldn't be great if adding a declaration without nonnull ends up removing nonnull from the definition.

Alternatively, maybe there is a pass that merges function declarations, and we can just tune that? (e.g. not run the pass).

@RalfJung
Copy link
Member

I see. So what we really need is a clear statement from the LLVM devs about propagation of attributes between declaration(s) and definition of a function.

What you propose might work as long as it is invalid for LLVM to propagate attributes from a declaration to a definition. It wouldn't be great if adding a declaration without nonnull ends up removing nonnull from the definition.

It wouldn't be great but it also wouldn't be UB.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 27, 2019

Obvious in hindsight, but this also applies to function attributes like nounwind, e.g., https://rust.godbolt.org/z/_hrdg9:

#![feature(unwind_attributes)]
pub mod bad0 {
    extern "Rust" { /* #[unwind(allow)] */ fn foo(); }
    pub unsafe fn call() -> usize { foo as usize }
}

pub mod bad1 {
    extern "Rust" { #[unwind(allow)]  fn foo(); }
    pub unsafe fn call() { foo() }
}

where because bad0::foo is nounwind, bad1::foo becomes nounwind as well even though it is explicitly #[unwind(allow)]. The fix for this would be to never emit nounwind for extern declarations..

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 7, 2019

The fix for this would be to never emit nounwind for extern declarations..

An alternative here might be to have slightly more complex rules. For example, providing "incorrect" declarations is ok, as long as no declaration to the same symbol is called in the whole program. That is, safe Rust code that only provides declarations would be UB free.

The moment one declaration is called, the unsafe { } needs to assert that no "incorrect" declarations exist.

We would then need to define what "incorrect" is, potentially on an attribute-by-attribute basis. For example, if a function can unwind, adding a declaration that's nounwind would be incorrect. This would allow us to keep on adding nounwind everywhere. For other attributes we might need different rules. OTOH, if a function takes a raw pointer, it would be ok to use a &mut T instead of a *mut T on some declarations but not others, as long as we never emit noalias for any of them, etc.

This would potentially result in action as a distance. For example, a crate like this:

// crate A
extern "C" { fn foo(); }

contains no unsafe code and has no UB. This other crate contains no UB either:

// crate B
extern "C" { #[unwind(allows)] fn foo(); } 
pub unsafe fn bar() { foo() }

fn main() { unsafe { bar() } }

However, if foo can actually unwind, adding crate A as a dependency of B would be UB.

Maybe we could track all declarations in the dependency graph, metadata of rlibs, static libs, etc. and make sure that compilation fails if two mismatching declarations exist anywhere ? This doesn't solve the problem if xLTO with C though.

@RalfJung
Copy link
Member

RalfJung commented Sep 16, 2019

The moment one declaration is called, the unsafe { } needs to assert that no "incorrect" declarations exist.

Yes, that's kind of how I think about it: when you call the function, you are actually asserting that all the declarations that exist are valid, not just the one this gets resolved to.

Not pretty. We can maybe help with a lint as long as everything is Rust code, but it'd be a cross-crate lint..

We would then need to define what "incorrect" is, potentially on an attribute-by-attribute basis. For example, if a function can unwind, adding a declaration that's nounwind would be incorrect. This would allow us to keep on adding nounwind everywhere. For other attributes we might need different rules.

Are these attributes sufficiently well-behaved that we can take their "conjunction" and then say that that defines the actual effective attributes that the call must abide to?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 16, 2019

Are these attributes sufficiently well-behaved that we can take their "conjunction" and then say that that defines the actual effective attributes that the call must abide to?

LLVM doesn't say, and I don't think we can say anything for them. We probably would need to open an LLVM bug report, and ask them to document the behavior or explicitly say that it is not allowed.

Whether they provide some rule for all attributes, or whether they do this on an attribute-by-attribute basis, is kind of up to them.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Sep 16, 2019

when you call the function, you are actually asserting that all the declarations that exist are valid, not just the one this gets resolved to.

This would be really hard to check. For example:

// Rust:
extern "C" {
    fn foo();  // nounwind
    #[unwinds] fn bar();
}

pub fn api() { unsafe { bar() } } 
// c++ (in C++ `extern "C" == #[no_mangle]` only)
extern "C" void foo() { throw; } // can unwind
extern "C" void bar() { foo() } // can unwind

Here the Rust api calls bar, and the bar declaration is correct - foo is not called in Rust. That calls into C++, where bar calls foo - all looks correct here as well. But the Rust code contains a nounwind declaration of foo, which could make the C++ foo definition nounwind, making the C++ code exhibit undefined behavior, and that would cause the Rust api function to exhibit UB as well.

@RalfJung
Copy link
Member

This would be really hard to check.

True. But given the behavior at hand, I don't see a better way.

Maybe we can get an (optional) pass into LLVM that makes it a hard error when attributes conflict. Or something that uses the least restrictive set of attributes when multiple declarations coexist. But with the current behavior, I don't see a better spec that we could write.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Nov 10, 2019

This safe Rust program came up on Zulip:

extern "C" { static FOO: &'static u8; }
fn main() { }

the value of FOO is written at some point during initialization, and the program probably has UB if FOO happens to be null due to a violation of the validity invariant for references.

I think this is at least tangentially related to this issue, because for FOO to be null, that means that there is a FOO definition somewhere, which is incompatible with this declaration, and that might already be UB (e.g. if this program is linked with a Rust library that defines: FOO: *const u8).

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jan 12, 2020

Re-reading my last comment, I think the most appealing solution for me would be to make extern blocks unsafe in a future edition. I really have no idea how that example could work in safe code.

@thomcc
Copy link
Member

thomcc commented Nov 11, 2021

If this is UB, it's a bit of a problem for rust/objc FFI, since even the stdlib uses this to call objc_msgSend: https://github.com/rust-lang/rust/blob/8e0293137f895a417fa043b9817c455150769406/library/std/src/sys/unix/args.rs#L204-L219

The alternative is explicitly transmuting the function, which I had always assumed was equivalent.

@briansmith
Copy link

LLVM to my knowledge still assumes the one-definition-rule.

I don't think the One Definition Rule applies here. The One Definition Rule is a C++ rule that forbids multiple definitions, not declarations. I don't think LLVM is applying C++ rules to non-C++ programs that are incompatible with C rules.

ISO C says about declarations "All declarations that refer to the same object or function shall have compatible type; otherwise, he behavior is undefined." The ISO C spec emphasizes that "compatible" doesn't mean "identical" and gives rules for when certain different type declarations must be considered identical. I think the discussion above, and common sense, indicates that LLVM tries to implement rules that are at least close to ISO C's compatibility rules.

For example, "For two array types to be compatible, both shall have compatible element types, and if both size specifiers are present, and are integer constant expressions, then both size specifiers shall have the same constant value." This means declarations int x[] and int x[5] are compatible, but int x[3] and int x[5] are not compatible. This allows C programmers to do the same kind of "enrichment" that Rust programmers are trying to do, as described in the initial comment above. I recommend that Rust allow at least as much type enrichment in declarations as C allows. However, that's a very low bar. ISO C only sets the minimum bar; implementations are free to make things more compatible if they wish.

Note that MSVC has a quite powerful declaration enrichment called SAL; see https://learn.microsoft.com/en-us/cpp/code-quality/annotating-function-parameters-and-return-values?view=msvc-170 to see the extensive set of enrichment features it provides for function declarations. We should be striving to have the same--better--in Rust. And to the extent that multiple incompatible declarations would cause UB, it's the Rust compiler's job to reject at compile-time programs that have incompatible declarations.

In other words, we shouldn't settle for whatever we think LLVM might currently implement. If LLVM needs to be improved in this area then we should do that, to get safe semantics.

@RalfJung
Copy link
Member

I doubt LLVM will consider int* and int* restrict compatible, so for us this means we cannot consider &mut T and *mut T compatible.

In other words, we shouldn't settle for whatever we think LLVM might currently implement. If LLVM needs to be improved in this area then we should do that, to get safe semantics.

Fully agreed. Someone just needs to figure out what semantics we want, and do the work of implementing that. :)

@briansmith
Copy link

briansmith commented Oct 21, 2022

ISO C says restrict is a "type qualifier" (along with const, volatile, and _Atomic) and says "For two qualified types to be compatible, both shall have the identically qualified version of a compatible type; the order of type qualifiers within a list of specifiers or qualifiers does not affect the specified type."

However, there's in general no way to write a Rust function declaration that has the same qualifiers as a C function. For example:

"
Here's the ISO C declaration of memcpy:

void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

and the musl libc declaration:

void *memcpy (void *__restrict, const void *__restrict, size_t);

Here's the Rust libc declaration:

pub unsafe extern "C" fn memcpy(
    dest: *mut c_void, 
    src: *const c_void, 
    n: size_t
) -> *mut c_void

So we are relying on LLVM (and really all toolchains) considering non-restrict-qualified declarations to be compatible with restrict-qualified declarations and definitions, right?

@briansmith
Copy link

ISO C also has the concept of "composite type," which is basically the union of all the restrictions imposed by all compatible declarations. It gives an example:

Given the following two file scope declarations:

int f(int (*)(), double (*)[3]);
int f(int (*)(char *), double (*)[]);

The resulting composite type for the function is:

int f(int (*)(char *), double (*)[3]);

This concept of composite type seems useful here.

@briansmith
Copy link

mod maybe_bad1 {
     extern "C" { fn foo(x: NonNull<i32>); } // OK or UB ?
}
mod maybe_bad2 {
     extern "C" { fn foo(x: *mut i32); } // OK or UB ?
}

Here is one way to express this in C:

void foo(int32_t x[static 1]); // `x` guaranteed non-null and point one value.
void foo(int32_t *x); // `x` not declared to be guaranteed non-null

ISO C also says "A declaration of a parameter as 'array of type' shall be adjusted to 'qualified pointer to type', and static is not a qualifier, so these two declarations are compatible in (ISO) C, AFAICT. The composite type of the parameter is int32_t x[static 1], i.e. a non-null pointer to (at least) one element.

Note that totally different language features (Rust: NonNull, C: [static] and pointer/array type punning) are used to achieve the same effect. This and the restrict example show that in general there is no way to have "the same" declarations across Rust and C. Since we cannot map Rust and C types 1:1 to each other, it seems like we have to modify (extend) the C compatibility and composition rules with Rust types in order for extern "C" to really work. The compiler should be able to verify compatibility of at least the declarations made in Rust, failing with an error when the rules are violated.

For extern "Rust", maybe the better solution is to require much more strict compatibility rules for types.

@RalfJung
Copy link
Member

RalfJung commented Oct 22, 2022

So we are relying on LLVM (and really all toolchains) considering non-restrict-qualified declarations to be compatible with restrict-qualified declarations and definitions, right?

We are relying on the fact that as long as every caller satisfies the preconditions implied by every declaration, LLVM considers everything to be fine.

The situation that must not happen is that memcpy is declared without noalias and then called, through that declaration, with aliasing pointers despite the fact that elsewhere there exists a declaration with noalias. (For memcpy that would already be library UB anyway.)

This does sound a lot like that notion of "composite type" indeed!

@briansmith
Copy link

We are relying on the fact that as long as every caller satisfies the preconditions implied by every declaration, LLVM considers everything to be fine.

rust-lang/rust#46188 seems to be saying that just the presence of the declaration, which is never used, affected the compilation.

@RalfJung
Copy link
Member

Yes, that is implied by what I said. Every declaration must be satisfied, whether it is used or not.

@briansmith
Copy link

Yes. Note though that every caller can satisfy the preconditions of all the declarations, but things can still go wrong, because LLVM might assume stronger postconditions than what the function actually provides; e.g. the non-null constraint on the result of malloc.

@RalfJung
Copy link
Member

Oh sure, but that is a separate issue -- all these declarations refer to the same implementation after all, so as long as each declaration individually can ensure that the linked implementation satisfies the given postconditions, we are fine on the postcondition side.

Though I guess this could become interesting when a postcondition holds for all the ways that one declaration is called, but not for others... silly example: consider a function fn id(x: i32) -> i32 that just returns its argument. Another crate might import this as fn id(x: NonZeroI32) -> NonZeroI32, and as long as that crate only calls the function with non-0 arguments, the postcondition holds. But now if a third crate imports this as fn id(x: i32) -> i32, and calls id(0), then we have a conflict again.

@saethlin saethlin added S-pending-design Status: Resolving this issue requires addressing some open design questions Triaged Visited during a backlog bonanza meeting labels Aug 29, 2023
@RalfJung
Copy link
Member

RalfJung commented Dec 15, 2023

This entire thread was based on the assumption that there's UB in C here and LLVM is in its right to inherit that UB. (Though AFAIK we never tried to convince them to not do that.)

However, turns out that this actually is not UB in C. So we should possibly go back to square one here. Can we convince LLVM to not do this kind of attribute merging? Can we make it use the weakest attributes from all declarations, rather than the strongest? That would nicely resolve this entire mess without having to burden our users with new hard-to-check safety requirements.

Cc @nikic for some LLVM expertise. The issue is about what happens when there are multiple function declarations with different attributes in their signatures. It should be enough to read the first 2 messages in this issue and this one.

@nikic
Copy link

nikic commented Dec 15, 2023

@RalfJung Can you share an example of C code that gets miscompiled by Clang? It's been a while since I last looked at this topic, but it was my understanding that this is more about rustc behavior than LLVM behavior.

I believe that Clang does not convert restrict in declarations into noalias, only in definitions, so there should be no issue with compatibility of non-qualified types.

Within a single module, there is no such thing as "multiple declarations" from an LLVM perspective. How multiple source language declarations/definitions are mapped onto a single LLVM declaration/definition is entirely up to the frontend. There may be some LTO considerations here, but as far as I know all the examples discussed are within a single module.

I believe the solution we have discussed in the past (in one of these threads...) is to place the attributes only on calls and definitions, rather than declarations.

@RalfJung
Copy link
Member

I don't think we have a miscompilation example, maybe @gnzlbg has one.

To reconstruct this in C or C++ we need some functions with interesting attributes. In C I only know of restrict. One could declare a function with restrict and define it without restrict and LLVM might compile the function with noalias and that would arguably be wrong. But restrict does not have any caller-side effects, only callee-side effects, and within a single TU I am not sure if LLVM will ever overwrite the attributes of the definition with those form the declaration? It would be good if someone could figure out more precisely what LLVM will and won't do in terms of attribute merging.

In C you can't even have multiple declarations of the same function and then "call one of them" for lack of namespaces.

With C++, I think C++ references may generate nonnull dereferenceable? So then if we have the same function declared as both int square(int &num) and int square(int *num), and one calls the int * declaration, LLVM might apply a nonnull and use that for caller-side optimizations. But in C++ I assume mixing references and pointers is already UB.

Are there clang C++ extensions for attributes on pointers, so that we could declare a function with and without nonnull attribute in two different namespaces and try to reproduce the bug this way?

@RalfJung
Copy link
Member

Based on further discussion in rust-lang/rust#46188, it turns out this is actually more of a rustc issue than an LLVM issue. Also see the discussion here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ffi Topic: Related to FFI C-open-question Category: An open question that we should revisit S-pending-design Status: Resolving this issue requires addressing some open design questions Triaged Visited during a backlog bonanza meeting
Projects
None yet
Development

No branches or pull requests

6 participants