Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[used] and symbol visibility is unclear #54135

Closed
alexcrichton opened this issue Sep 11, 2018 · 15 comments · Fixed by #54451
Closed

#[used] and symbol visibility is unclear #54135

alexcrichton opened this issue Sep 11, 2018 · 15 comments · Fixed by #54451
Labels
A-linkage Area: linking into static, shared libraries and binaries T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

alexcrichton commented Sep 11, 2018

I was excited to see #[used] stabilized (yay!) as one of the issues we suffer from in wasm-bindgen is related to symbols being removed. Unfortunately though #[used] doesn't solve our use case!

First I'll try to explain our issue a bit. The #[wasm_bindgen] attribute allows you to import JS functionality into a Rust program. This doesn't work, however, when you import JS functions into a private Rust submodule. (aka mod foo { ... }). When importing a function we also generate an internal exported function which the CLI wasm-bindgen tool uses (and then removes), but it suffices to say that we're generating code that looks like:

mod private {
    #[no_mangle]
    pub extern fn foo() { /* ... */ }
}

Today the symbol foo is not considered alive by rustc itself as it's not reachable. As a result, it's not even translated into the object file. If we instead change this though:

#![feature(used)]

mod private {
    #[no_mangle]
    pub extern fn foo() {}
    
    #[used]
    static F: extern fn() = foo;
}

This still doesn't work! Unfortunately for us the #[used] works as intended but doesn't affect the symbol visibility. The above program generates this IR:

; ModuleID = 'playground.7pbp0xok-cgu.0'
source_filename = "playground.7pbp0xok-cgu.0"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@_ZN10playground7private1F17hb0dc3802d85fadd7E = internal constant <{ i8*, [0 x i8] }> <{ i8* bitcast (void ()* @foo to i8*), [0 x i8] zeroinitializer }>, align 8
@llvm.used = appending global [1 x i8*] [i8* bitcast (<{ i8*, [0 x i8] }>* @_ZN10playground7private1F17hb0dc3802d85fadd7E to i8*)], section "llvm.metadata"

; Function Attrs: norecurse nounwind readnone uwtable
define internal void @foo() unnamed_addr #0 {
start:
  ret void
}

attributes #0 = { norecurse nounwind readnone uwtable "probe-stack"="__rust_probestack" }

the problem here is that the symbol foo, while not mangled, is still marked as internal. This in turns means that it does indeed reach the linker, but for our purposes in wasm-bindgen we need it to survive the linker, not just reach the linker.


Ok so that's the problem statement for wasm-bindgen, but you can generalize it today for rustc by asking: what does #[used] do to symbol visibility? The overall story for symbol visibility in rustc is a little muddied and not always great (especially on ABI-particulars like #[no_mangle] things).

What should the symbol visibility of foo be here?

mod private {
    #[no_mangle]
    pub extern fn foo() {}
    
    #[used]
    static F: extern fn() = foo;
}

We've always had a basic rule of thumb in Rust that "reachable symbols" have non-internal visibility, but it's not clear what to do here. foo is indeed a reachable symbol because of #[used], but it's in a private module. Does that mean because of pub and #[no_mangle] it shouldn't have internal visibility? Should only #[no_mangle] imply that? It's unclear to me!

I'd naively like to send a patch that makes foo not-internal because it has #[no_mangle] and pub (not that it's "publicly reachable"). I think though that this may be deeper in the compiler. I just took a look at how #[used] works, and it's actually a little suprising!

In src/librustc_mir/monomorphize/collector.rs we attempt to not translate anything not reachable in a crate as a form of DCE. I didn't find any handling of #[used], though, and it turns out we unconditionally translate all statics all the time! Then becuase we put it in llvm.used it ends up not getting gc'd by LLVM.

I think that we may want to future-proof this by updating the src/librustc/middle/reachable.rs collection step to basically push #[used] statics onto the worklist to process. The initial worklist is seeded with all items that are public by visibility, and I think we could change it to also be seeded with any #[used] statics. This means that anything referenced by a #[used] static will be pulled in as a result.

Do others think this is a reasonable strategy for having #[used] affect symbol visibility?

cc @michaelwoerister
cc @fitzgen
cc @japaric

@alexcrichton
Copy link
Member Author

I should also note, I'm not even sure that such a tweak here would actually fix our use case in wasm-bindgen. I would want to actually test a compiler first to make sure it works before actually submitting a change like this.

@japaric
Copy link
Member

japaric commented Sep 17, 2018

In my experience, which pretty much only involves ELF objects, whenever I use
#[no_mangle] or #[export_name] I always want global / external
visibility.


Some info about my use case:

I always pair EXTERN with #[export_name] / #[no_mangle]. I use EXTERN to
force the linker to exhaustively search for a symbol in the input object files.
EXTERN also forces the linker to keep the symbol in the output binary.

#[no_mangle]
pub fn main() { .. }
/* exhaustively search for `main` in the input object files */
EXTERN(main);

I sometimes pair #[link_section] with KEEP to place symbols in specific
memory locations, but when I do I always need EXTERN to work around the linker
laziness. Without EXTERN the linker would stop looking through the input
object files once it has resolved all pending undefined references and would
miss the link_section-ed symbols that appear later in the list.

#[link_section = ".vector_table.reset_vector"]
#[no_mangle]
pub static RESET_VECTOR: fn() = reset;

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [fn(); 14] = [..];
EXTERN(RESET_VECTOR);
EXTERN(EXCEPTIONS);

SECTIONS
{
  .vector_table :
  {
    KEEP(*(.vector_table.reset_vector));
    KEEP(*(.vector_table.exceptions));
  }
}

EXTERN skips internal symbols (a) which means that I have to be careful to
make my #[no_mangle] / #[export_name] functions and static variables both
public and reachable (i.e. no private module between them and the root of the
crate).

I don't have much problem with this issue but we have embedded crates that
provide attributes that expand into #[export_name]. This means that embedded
end users need to become aware of this "reachability" property, which is not
ideal. There's no way to enforce at compile time that an item marked with a
custom attribute is reachable so the end users can run into linker errors or,
worst, end up with a program that ignores their attribute.


With that background info out of the way,

My proposal would be to instead have #[no_mangle] / #[export_name] imply
global visibility, regardless of whether the symbol is pub or reachable.

  • Would that solve your use case? I think that if #[no_mangle] implies
    global visibility then you won't need the #[used] static because the
    compiler always codegens global symbols (?).

  • Is there an scenario where someone would want both #[no_mangle] /
    #[export_name] and internal visibility? I can't think of any such case
    (b).

There's also the third option of adding a new attribute to control visibility:
#[visibility(internal)], #[visibility(external)], etc.

Do others think this is a reasonable strategy for having #[used] affect symbol
visibility?

Some people (myself included) would find that surprising as #[used] was
designed to have the same semantics as __attribute__((used)) and that one
doesn't affect symbol visibility.


(a) LD's -u flag has the same semantics as EXTERN. Here you can see that it
doesn't work on internal symbols.

// src/main.rs
fn main() {}

// internal visibility
#[no_mangle]
#[used]
static FOO: u32 = 1;
$ cargo rustc --release -- -C link-arg=-Wl,-uFOO
$ nm target/release/hello | grep FOO
                 U FOO
// src/main.rs
fn main() {}

// external visibility
#[no_mangle]
pub static FOO: u32 = 1;
$ cargo rustc --release -- -C link-arg=-Wl,-uFOO
$ nm target/release/hello | grep FOO
000000000004e000 R FOO

(b) Internal symbols do not work for intra-Rust FFI which is the only case I
can think of.

// bar/src/lib.rs
#[no_mangle]
fn foo() {}

#[used]
static KEEP: fn() = foo;
// src/main.rs
extern crate bar;

fn main() {
    extern "Rust" {
        fn foo();
    }

    unsafe {
        foo();
    }
}

This fails to link:

$ cargo run
(..)
          /home/japaric/tmp/hello/src/main.rs:9: undefined reference to `foo'

If you tweak bar like this

// bar/src/lib.rs
#[no_mangle]
pub fn foo() {}

Then the program links.

@parched
Copy link
Contributor

parched commented Sep 17, 2018

Is there an scenario where someone would want both #[no_mangle] /
#[export_name] and internal visibility?

One reason would be when using global_asm!. However, I think the better way to solve that use case is a way to pass rust mangled names to global_asm!.

@japaric
Copy link
Member

japaric commented Sep 17, 2018

One reason would be when using global_asm!

Due to parallel codegen I don't think it's guaranteed that the #[no_mangle]-ed symbol and the global_asm! symbol will end in the same object file even if they are defined in the same module / crate. If that's not the case the linker could pick up the global_asm! symbol from object file but miss the #[no_mangled] symbol leading to an "undefined reference" error.

I have had plenty of trouble with multiple codegen units and I have found that the only thing that reliably works is EXTERN and external visibility.

@alexcrichton
Copy link
Member Author

Thank for writing that up @japaric! I think I agree with you that #[used] shouldn't be used to affect symbol visibility, drawing parallels with C is a compelling use case.

I hadn't really considered it before but forcing all #[no_mangle] items to an extern linkage visibility doesn't seem like it's such a bad idea! The only other use case I can think of to add to @parched's use case is the old rust_begin_unwind symbol (or maybe rust_panic nowadays, I forget) which was intended for adding breakpoints to panics, but IIRC this doesn't really work reliably anyway.

We do have an unstable #[linkage] attribute which can explicitly control linkage (although it would likely require updates to the MIR item collector to ensure it always visits items with #[linkage]), and I think that we may just want to go that route. I like the idea of always making #[no_mangle] items have external linkage unless there's some attribute saying it should be internal (like #[visibility(private)] or #[linkage = "internal"].

@japaric
Copy link
Member

japaric commented Sep 17, 2018

I like the idea of always making #[no_mangle] items have external linkage unless there's some attribute saying it should be internal (like #[visibility(private)] or #[linkage = "internal"].

👍. Just want to add: let's apply the same logic to #[export_name]; #[no_mangle] is just sugar for #[export_name = "<item-name>"], after all.

@alexcrichton
Copy link
Member Author

Ah sorry yeah, #[export_name] I consider to be the same as #[no_mangle] as well.

@retep998
Copy link
Member

Coming from the perspective of the Windows linker model, I am absolutely in agreement that #[no_mangle] and #[export_name] should always imply external linkage.

@pnkfelix pnkfelix added A-linkage Area: linking into static, shared libraries and binaries T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 20, 2018
@japaric
Copy link
Member

japaric commented Sep 21, 2018

I don't know if this kind of change needs a RFC or FCP but I have put up a PR that implements this in #54414.

@alexcrichton
Copy link
Member Author

Thanks @japaric! I extended that a bit to get #54451 as well!

@pnkfelix
Copy link
Member

@alexcrichton you are right that setting breakpoints on rust_panic does not work reliably; see in particular #49013

@nagisa
Copy link
Member

nagisa commented Sep 22, 2018

From the PR:

This new definition is that #[no_mangle] is always reachable, no matter where it is in a crate or whether it has pub or not.

How would one make sure, that privacy of #[no_mangle]d items would still be maintained properly? I definitely do not want people to be able to circumvent privacy by doing an extern { fn that_unmangled_function(…)… }.


Also, such change would make all no_mangle items less optimisable.

@alexcrichton
Copy link
Member Author

@nagisa the PR's changes propose doing that, yes, circumventing privacy. That seems to me, though, simply a fact of how #[no_mangle] works, it's explicitly used to circumvent privacy and that's why it's not on-by-default (like in C)

alexcrichton added a commit to alexcrichton/rust that referenced this issue Oct 6, 2018
This commit updates the compiler to allow the `#[no_mangle]` (and
`#[export_name]` attributes) to be located anywhere within a crate.
These attributes are unconditionally processed, causing the compiler to
always generate an exported symbol with the appropriate name.

After some discussion on rust-lang#54135 it was found that not a great reason
this hasn't been allowed already, and it seems to match the behavior
that many expect! Previously the compiler would only export a
`#[no_mangle]` symbol if it were *publicly reachable*, meaning that it
itself is `pub` and it's otherwise publicly reachable from the root of
the crate. This new definition is that `#[no_mangle]` *is always
reachable*, no matter where it is in a crate or whether it has `pub` or
not.

This should make it much easier to declare an exported symbol with a
known and unique name, even when it's an internal implementation detail
of the crate itself. Note that these symbols will persist beyond LTO as
well, always making their way to the linker.

Along the way this commit removes the `private_no_mangle_functions` lint
(also for statics) as there's no longer any need to lint these
situations. Furthermore a good number of tests were updated now that
symbol visibility has been changed.

Closes rust-lang#54135
bors added a commit that referenced this issue Oct 7, 2018
…elwoerister

rustc: Allow `#[no_mangle]` anywhere in a crate

This commit updates the compiler to allow the `#[no_mangle]` (and
`#[export_name]` attributes) to be located anywhere within a crate.
These attributes are unconditionally processed, causing the compiler to
always generate an exported symbol with the appropriate name.

After some discussion on #54135 it was found that not a great reason
this hasn't been allowed already, and it seems to match the behavior
that many expect! Previously the compiler would only export a
`#[no_mangle]` symbol if it were *publicly reachable*, meaning that it
itself is `pub` and it's otherwise publicly reachable from the root of
the crate. This new definition is that `#[no_mangle]` *is always
reachable*, no matter where it is in a crate or whether it has `pub` or
not.

This should make it much easier to declare an exported symbol with a
known and unique name, even when it's an internal implementation detail
of the crate itself. Note that these symbols will persist beyond LTO as
well, always making their way to the linker.

Along the way this commit removes the `private_no_mangle_functions` lint
(also for statics) as there's no longer any need to lint these
situations. Furthermore a good number of tests were updated now that
symbol visibility has been changed.

Closes #54135
ischeinkman pushed a commit to ischeinkman/libnx-rs-std that referenced this issue Dec 20, 2018
This commit updates the compiler to allow the `#[no_mangle]` (and
`#[export_name]` attributes) to be located anywhere within a crate.
These attributes are unconditionally processed, causing the compiler to
always generate an exported symbol with the appropriate name.

After some discussion on rust-lang#54135 it was found that not a great reason
this hasn't been allowed already, and it seems to match the behavior
that many expect! Previously the compiler would only export a
`#[no_mangle]` symbol if it were *publicly reachable*, meaning that it
itself is `pub` and it's otherwise publicly reachable from the root of
the crate. This new definition is that `#[no_mangle]` *is always
reachable*, no matter where it is in a crate or whether it has `pub` or
not.

This should make it much easier to declare an exported symbol with a
known and unique name, even when it's an internal implementation detail
of the crate itself. Note that these symbols will persist beyond LTO as
well, always making their way to the linker.

Along the way this commit removes the `private_no_mangle_functions` lint
(also for statics) as there's no longer any need to lint these
situations. Furthermore a good number of tests were updated now that
symbol visibility has been changed.

Closes rust-lang#54135
@RReverser
Copy link
Contributor

I came across this issue when searching for ways to do the opposite.

I'm linking with C code via cc-rs, and I want to expose some functions to it from the Rust part, so I'm using #[no_mangle] to have a predictable unmangled name that C code to refer to.

The problem is, now the functions that are referred to by #[no_mangle] survive all the way to the final linkage and become publicly exposed from the resulting cdylib, even though I explicitly put them in a private module.

Is there any way to circumvent that and make sure that symbols remain visible only internally between translation units, but not exposed from the final artifact?

@hsivonen
Copy link
Member

hsivonen commented Jul 2, 2020

I came across this issue when searching for ways to do the opposite.

I filed a new issue for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-linkage Area: linking into static, shared libraries and binaries T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants