Enable customizing the linkage of a platform's C runtime #1721

Merged
merged 7 commits into from
Oct 25, 2016
371 changes: 371 additions & 0 deletions text/0000-crt-static.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,371 @@
- Feature Name: `crt_link`
- Start Date: 2016-08-18
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Enable the compiler to select whether a target dynamically or statically links
to a platform's standard C runtime through the introduction of three orthogonal
and otherwise general purpose features, one of which will likely never become
stable and can be considered an implementation detail of std. These features do
not require the compiler or language to have intrinsic knowledge of the
existence of C runtimes.

The end result is that rustc will be able to reuse its existing standard library
binaries for the MSVC and musl targets to build code that links either
statically or dynamically to libc.

The design herein additionally paves the way for improved support for
dllimport/dllexport, and cpu-specific features, particularly when
combined with a [std-aware cargo].

[std-aware cargo]: https://github.com/rust-lang/rfcs/pull/1133

# Motivation
[motivation]: #motivation

Today all targets of rustc hard-code how they link to the native C runtime. For
example the `x86_64-unknown-linux-gnu` target links to glibc dynamically,
`x86_64-unknown-linux-musl` links statically to musl, and
`x86_64-pc-windows-msvc` links dynamically to MSVCRT. There are many use cases,
however, where these decisions are not suitable. For example binaries on Alpine
Linux want to link dynamically to musl and creating portable binaries on Windows
is most easily done by linking statically to MSVCRT.

Today rustc has no mechanism for accomplishing this besides defining an entirely
new target specification and distributing a build of the standard library for
it. Because target specifications must be described by a target triple, and
target triples have preexisting conventions into which such a scheme does not
fit, we have resisted doing so.

# Detailed design
[design]: #detailed-design

This RFC introduces three separate features to the compiler and Cargo. When
combined they will enable the compiler to change whether the C standard library
is linked dynamically or statically. In isolation each feature is a natural
extension of existing features, and each should be useful on its own.

A key insight is that, for practical purposes, the object code _for the standard
library_ does not need to change based on how the C runtime is being linked;
though it is true that on Windows, it is _generally_ important to properly
manage the use of dllimport/dllexport attributes based on the linkage type, and
C code does need to be compiled with specific options based on the linkage type.
So it is technically possible to produce Rust executables and dynamic libraries
that link to libc either statically or dynamically from a single std binary by
correctly manipulating the arguments to the linker.

A second insight is that there are multiple existing, unserved use cases for
configuring features of the hardware architecture, underlying platform, or
runtime [1], which require the entire 'world', possibly including std, to be
compiled a certain way. C runtime linkage is another example of this
requirement.

[1]: https://internals.rust-lang.org/t/pre-rfc-a-vision-for-platform-architecture-configuration-specific-apis/3502

From these observations we can design a cross-platform solution spanning both
Cargo and the compiler by which Rust programs may link to either a dynamic or
static C library, using only a single std binary. As future work this RFC
discusses how the proposed scheme can be extended to rebuild std
specifically for a particular C-linkage scenario, which may have minor
advantages on Windows due to issues around dllimport and dllexport; and how this
scheme naturally extends to recompiling std in the presence of modified CPU
features.

This RFC does *not* propose unifying how the C runtime is linked across
platforms (e.g. always dynamically or always statically) but instead leaves that
decision to each target, and to future work.

In summary the new mechanics are:

- Specifying C runtime linkage via `-C target-feature=+crt-static` or `-C
target-feature=-crt-static`. This extends `-C target-feature` to mean not just
"CPU feature" à la LLVM, but "feature of the Rust target". Several existing
properties of this flag, the ability to add, with `+`, _or remove_, with `-`,
the feature, as well as the automatic lowering to `cfg` values, are crucial to
later aspects of the design. This target feature will be added to targets via
a small extension to the compiler's target specification.
- Lowering `cfg` values to Cargo build script environment variables. This will
enable build scripts to understand all enabled features of a target (like
`crt-static` above) to, for example, compile C code correctly on MSVC.
- Lazy link attributes. This feature is only required by std's own copy of the
libc crate, and only because std is distributed in binary form and it may yet
be a long time before Cargo itself can rebuild std.

### Specifying dynamic/static C runtime linkage

A new `target-feature` flag will now be supported by the compiler for relevant
targets: `crt-static`. This can be enabled and disabled in the compiler via:

```
rustc -C target-feature=+crt-static ...
rustc -C target-feature=-crt-static ...
```

Currently all `target-feature` flags are passed through straight to LLVM, but
this proposes extending the meaning of `target-feature` to Rust-target-specific
features as well. Target specifications will be able to indicate what custom
target-features can be defined, and most existing targets will define a new
`crt-static` feature which is turned off by default (except for musl).
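Because target features are lowered to `cfg` values, Rust code could in principle branch on the new feature the same way it branches on CPU features. A minimal sketch, assuming the feature is exposed as `target_feature = "crt-static"` as proposed here; the helper name `crt_linkage` is hypothetical:

```rust
/// Reports how the C runtime is expected to be linked for the current
/// compilation, based on the `crt-static` target feature proposed here.
/// `crt_linkage` is a hypothetical helper, not part of any existing API.
pub fn crt_linkage() -> &'static str {
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}
```

Which branch is taken depends on the target's default and on any `-C target-feature=+crt-static` or `-crt-static` flag passed to the compiler.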

The default of `crt-static` will be different depending on the target. For
example `x86_64-unknown-linux-musl` will have it on by default, whereas
`arm-unknown-linux-musleabi` will have it turned off by default.
Member

Is there any particular reason why some musl targets will be statically linked by default and some not? If it's just historical then I think we should change it for consistency.

Member Author

Yes, as we've added them over time they've addressed different use cases. The plan is to not change any of them yet, as that would be a breaking change. Each target (even with the same C library) can choose whether it's static or dynamic by default.

If the crt-static option becomes more ergonomic and ubiquitous we can consider changing defaults in the future, but at this time the fact that it's a breaking change prevents us from doing so.

Member

Is there any particular reason why some musl targets will be statically linked by default and some not?

The mips(el)-musl targets, for instance, can't produce statically linked binaries because static linking depends on libunwind and libunwind doesn't support mips (yet).

Member

Right, that's why it would make sense to use dynamic linking by default so all musl targets can be consistent.

Member Author

To reiterate, we can't do that because it's a breaking change. There are many users relying on the fact that musl is a static target today. We can consider changing this all in the far future, but it is an explicitly stated non-goal of this RFC to attempt to perform any kind of unification of how the CRT is linked on various platforms.


### Lowering `cfg` values to Cargo build script environment variables
Member

(I'd like this to land in Cargo regardless of the outcome of this RFC. It helps with stuff like rust-lang/rust#35474)

Member Author

agreed!

@eternaleye eternaleye Aug 19, 2016

Maybe split the "lowering to build script environment variables" out to a separate RFC? I don't think there'd be much of any dissent on it, and it'd help address quite a few use cases (including conditionally linking to built files depending on test/non-test).

That could allow it to go on a relative fast-track, while keeping this RFC more focused.

Member Author

I would personally rather not. I doubt fast-tracking the RFC would actually fast-track the implementation. If it's uncontroversial then we'll just have comments elsewhere :)


Cargo will begin to forward `cfg` values from the compiler into build
scripts. Currently the compiler supports `--print cfg` as a flag to print out
internal cfg directives, which Cargo uses to implement platform-specific
dependencies.

When Cargo runs a build script it already sets a [number of environment
variables][cargo-build-env], and it will now set a family of `CARGO_CFG_*`
environment variables as well. For each key printed out from `rustc --print
cfg`, Cargo will set an environment variable for the build script to learn
about.

[cargo-build-env]: http://doc.crates.io/environment-variables.html#environment-variables-cargo-sets-for-build-scripts

For example, locally `rustc --print cfg` prints:

```
target_os="linux"
target_family="unix"
target_arch="x86_64"
target_endian="little"
target_pointer_width="64"
target_env="gnu"
unix
debug_assertions
```

And with this Cargo would set the following environment variables for build
script invocations for this target.

```
export CARGO_CFG_TARGET_OS=linux
Member

Will these also be set when one calls e.g. cargo rustc -- -C target-feature=+avx? That would make these variables available for the target crate but not for its dependencies which ... might be surprising. But I guess the same already happens with cargo rustc and cfg(target_feature)?

Contributor

Yeah I think cargo rustc is the unsafe escape hatch for those sorts of things. Anything after -- is passed verbatim to rustc right?

Member Author

Will these also be set when one calls e.g. cargo rustc -- -C target-feature=+avx?

No, currently only RUSTFLAGS affects how Cargo runs rustc initially.

export CARGO_CFG_TARGET_FAMILY=unix
export CARGO_CFG_TARGET_ARCH=x86_64
export CARGO_CFG_TARGET_ENDIAN=little
export CARGO_CFG_TARGET_POINTER_WIDTH=64
export CARGO_CFG_TARGET_ENV=gnu
export CARGO_CFG_UNIX
export CARGO_CFG_DEBUG_ASSERTIONS
```

As mentioned in the previous section, the linkage of the C standard library will
be specified as a target feature, which is lowered to a `cfg` value, thus giving
build scripts the ability to modify compilation options based on C standard
library linkage. One important complication here is that `cfg` values in Rust
may be defined multiple times, and this is the case with target features. When a
`cfg` value is defined multiple times, Cargo will create a single environment
variable with a comma-separated list of values.

So for a target with the following features enabled

```
target_feature="sse"
Member

Will Cargo set the CARGO_CFG_TARGET_FEATURE variable for CPU features that are implied by the
"llvm-target" field of a target specification? For example:

Will this

cargo build --target x86_64-unknown-linux-gnu

produce this

CARGO_CFG_TARGET_FEATURE="mmx,sse,sse2,."

?

And this

RUSTFLAGS="-mmx,-sse,-sse2" cargo build --target x86_64-unknown-linux-gnu

produce this

CARGO_CFG_TARGET_FEATURE=""

?

Because, otherwise, this scheme would leave the CARGO_CFG_TARGET_FEATURE empty for both cargo build cases and that's not an accurate representation of the target CPU features.

And if Cargo will, indeed, set these implicit CPU features based on the "llvm-target", how will that
be implemented? Will Cargo call ("shell out to") a new rustc --print llvm-target-features command
created for this purpose?

Member

Will Cargo call ("shell out to") a new rustc --print llvm-target-features command
created for this purpose?

Oh, I just saw the output of rustc --print cfg and seems like it contains the target features I mentioned above, at least sse and sse2 appear for x86_64-unknown-linux-gnu, so I guess this is not really an issue -- Cargo can just use that command.

Member Author

Will Cargo set the CARGO_CFG_TARGET_FEATURE variable for CPU features that are implied by the "llvm-target" field of a target specification?

Yes

And if Cargo will, indeed, set these implicit CPU features based on the "llvm-target", how will that be implemented?

This'll be done through --print cfg. Right now the target_feature cfg is unstable so this is only available on nightly, but for this RFC we'd want to stabilize that to have everything move forward at once. If you take a look at rustup run nightly rustc --print cfg, though, you'll see target_feature in there.

Cargo already runs the compiler to learn about various things like filenames, so this'd just be another thing it'd learn about during that process.

target_feature="crt-static"
```

Cargo would convert it to the following environment variable:

```
export CARGO_CFG_TARGET_FEATURE=sse,crt-static
```
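The conversion described above can be sketched as a small function. This is an illustration only, not Cargo's implementation: the name `cfg_to_env` is hypothetical, and it assumes that keys are upper-cased and that bare names such as `unix` map to a variable with an empty value.

```rust
use std::collections::BTreeMap;

/// Converts the lines printed by `rustc --print cfg` into the
/// `CARGO_CFG_*` variables described above. Illustrative only:
/// `cfg_to_env` is a hypothetical name, not Cargo's implementation.
pub fn cfg_to_env(cfg_output: &str) -> BTreeMap<String, String> {
    let mut env: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in cfg_output.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // `key="value"` pairs carry a value; bare names such as `unix` do not.
        let (key, value) = match line.split_once('=') {
            Some((k, v)) => (k, Some(v.trim_matches('"'))),
            None => (line, None),
        };
        let name = format!("CARGO_CFG_{}", key.to_uppercase());
        let values = env.entry(name).or_default();
        if let Some(v) = value {
            values.push(v.to_string());
        }
    }
    // A key defined several times becomes one comma-separated variable.
    // Assumption: a bare name ends up with an empty value.
    env.into_iter().map(|(k, v)| (k, v.join(","))).collect()
}
```

Feeding it the two `target_feature` lines from the example yields a single `CARGO_CFG_TARGET_FEATURE` entry whose value is `sse,crt-static`.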

Through this method build scripts will be able to learn how the C standard
library is being linked. This is crucially important for the MSVC target where
code needs to be compiled differently depending on how the C library is linked.

This feature ends up having the added benefit of informing build scripts about
selected CPU features as well. For example once the `target_feature` `#[cfg]`
is stabilized build scripts will know whether SSE/AVX/etc are enabled features
for the C code they might be compiling.

After this change, the gcc-rs crate will be modified to check for the
`CARGO_CFG_TARGET_FEATURE` directive, and parse it into a list of enabled
features. If the `crt-static` feature is not enabled it will compile C code on
the MSVC target with `/MD`, indicating dynamic linkage. Otherwise, if the
feature is enabled, it will compile code with `/MT`, indicating static linkage.
Because today the MSVC targets use dynamic linkage and gcc-rs compiles C code
with `/MD`, gcc-rs will remain forward and backwards compatible with existing
and future Rust MSVC toolchains until such time as the decision is made to
change the MSVC toolchain to `+crt-static` by default.
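The flag selection described above might look roughly like the following. This is a sketch of the stated behaviour, not gcc-rs's actual code; `msvc_crt_flag` is a hypothetical helper.

```rust
/// Picks the MSVC C runtime flag from the value of
/// `CARGO_CFG_TARGET_FEATURE`. `msvc_crt_flag` is a hypothetical helper;
/// it sketches the behaviour described above, not gcc-rs's actual code.
pub fn msvc_crt_flag(target_features: Option<&str>) -> &'static str {
    let is_static = target_features
        .unwrap_or("")
        .split(',')
        .any(|feature| feature.trim() == "crt-static");
    // An unset variable (older toolchains) falls back to `/MD`, which matches
    // today's dynamically linked default.
    if is_static {
        "/MT"
    } else {
        "/MD"
    }
}
```

In a build script this could be called as `msvc_crt_flag(std::env::var("CARGO_CFG_TARGET_FEATURE").ok().as_deref())`. Treating a missing variable as dynamic linkage is what keeps the scheme compatible with toolchains that predate this RFC.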

### Lazy link attributes

The final feature that will be added to the compiler is the ability to "lazily"
interpret the linkage requirements of a native library depending on values of
`cfg` at compile time of downstream crates, not of the crate with the `#[link]`
directives. This feature is never intended to be stabilized, and is instead
targeted at being an unstable implementation detail of the `libc` crate linked
to `std` (but _not_ the stable `libc` crate deployed to crates.io).

Specifically, the `#[link]` attribute will be extended with a new argument
that it accepts, `cfg(..)`, such as:

```rust
#[link(name = "foo", cfg(bar))]
```

This `cfg` indicates to the compiler that the `#[link]` annotation only applies
if the `bar` directive is matched. This interpretation is done not during
compilation of the crate in which the `#[link]` directive appears, but during
compilation of the crate in which linking is finally performed. The compiler
Member

How would this work for musl? Currently libstd.rlib contains a copy of musl's libc.a. If you choose dynamic linking, the linker will receive libstd.rlib, which contains musl symbols, and also a "dynamic" -lc as arguments, won't that cause problems due to duplicate symbols? (I don't know linkers well enough to determine if this is really a problem or not)

If that is, indeed, a problem, we could have libstd.rlib never include a copy of musl's libc.a and when rustc is asked to build a statically linked binary it would have to statically link to musl's libc.a (pass it to the linker) at that time. This implies musl's libc.a would have to be shipped with x86_64-musl's rust-std component so it's available at link time of statically linked binaries.

Member

I blame this on kind=static causing the library to be bundled immediately into the current crate by rustc. We could just say kind=static isn't allowed and you have to use kind=static-nobundle instead. Then Rust would be in a similar situation to where it is on MinGW where it ships with a few libraries like libmsvcrt.a.

Member Author

@japaric yes the solution for musl here will be a tricky one, but it's intended to just be an implementation detail. I'm thinking that we'll detect that the statically included library does not need to be linked (due to how #[link(..., cfg(...))] was evaluated) which will cause the compiler to create a new temporary rlib. This temporary rlib (like those we create for LTO) won't have all the original contents, just the Rust object file.

It won't be perfect, but it'll solve the precisely one use case we have for this attribute today (and is another reason why the attribute will remain unstable).

will then use this knowledge in two ways:

* When `dllimport` or `dllexport` needs to be applied, it will evaluate the
final compilation unit's `#[cfg]` directives and see if upstream `#[link]`
directives apply or not.

* When deciding what native libraries should be linked, the compiler will
evaluate whether they should be linked or not depending on the final
compilation's `#[cfg]` directives and the upstream `#[link]` directives.
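The second use, deciding which native libraries to link, can be modelled in a few lines. The sketch below is a deliberately simplified model under stated assumptions: each directive is conditional on a single target feature, and every name (`LazyLink`, `libraries_to_link`) is hypothetical. The real compiler metadata is not specified by this RFC.

```rust
use std::collections::HashSet;

/// A simplified model of an upstream `#[link(name = "...", cfg(...))]`
/// directive. All names here are hypothetical; the real compiler metadata
/// is not specified by this RFC.
pub struct LazyLink {
    pub name: &'static str,
    /// The `target_feature` the directive is conditional on.
    pub feature: &'static str,
    /// `true` for `cfg(target_feature = "..")`, `false` for `cfg(not(..))`.
    pub when_enabled: bool,
}

/// Evaluates the recorded directives against the target features of the
/// final compilation and returns the native libraries to link.
pub fn libraries_to_link(
    links: &[LazyLink],
    final_features: &HashSet<&str>,
) -> Vec<&'static str> {
    links
        .iter()
        .filter(|l| final_features.contains(l.feature) == l.when_enabled)
        .map(|l| l.name)
        .collect()
}
```

The point of the model is that the directives are stored unevaluated in the upstream crate and filtered only once the final crate's features are known.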

### Customizing linkage to the C runtime

With the above features, the following changes will be made to select the
linkage of the C runtime at compile time for downstream crates.

First, the `libc` crate will be modified to contain blocks along the lines of:

```rust
cfg_if! {
if #[cfg(target_env = "musl")] {

Just to confirm: this shouldn't be limited to musl. glibc (and its variants) have (some) support for static linking too.

Member Author

Yes if support is added to get that working, we can encode that here. The specifics of that, however, are out of scope of this RFC. It's intended though that there's enough plumbing here that one could imagine:

RUSTFLAGS='-C target-feature=+crt-static' cargo build --target x86_64-unknown-linux-gnu

to one day work!

#[link(name = "c", cfg(target_feature = "crt-static"), kind = "static")]
#[link(name = "c", cfg(not(target_feature = "crt-static")))]
extern {}
} else if #[cfg(target_env = "msvc")] {
#[link(name = "msvcrt", cfg(not(target_feature = "crt-static")))]
#[link(name = "libcmt", cfg(target_feature = "crt-static"))]
extern {}
} else {
// ...
}
}
```

This informs the compiler that, for the musl target, if the CRT is statically
linked then the library named `c` is included statically in libc.rlib. If the
CRT is linked dynamically, however, then the library named `c` will be linked
dynamically. Similarly for MSVC, a static CRT implies linking to `libcmt` and a
dynamic CRT implies linking to `msvcrt` (as we do today).

Finally, an example of compiling for MSVC and linking statically to the C
runtime would look like:

```
RUSTFLAGS='-C target-feature=+crt-static' cargo build --target x86_64-pc-windows-msvc
```

and similarly, compiling for musl but linking dynamically to the C runtime would
look like:

```
RUSTFLAGS='-C target-feature=-crt-static' cargo build --target x86_64-unknown-linux-musl
```

### Future work

The features proposed here are intended to be the absolute bare bones of support
needed to configure how the C runtime is linked. A primary drawback, however, is
that it's somewhat cumbersome to select the non-default linkage of the CRT.
Similarly, however, it's cumbersome to select target CPU features which are not
the default, and these two situations are very similar. The eventual intent is
to offer a more ergonomic way than today's `RUSTFLAGS` of informing the compiler
and Cargo of all "compilation codegen options".

Furthermore, it would have arguably been a "more correct" choice for Rust to by
default statically link to the CRT on MSVC rather than dynamically. While this
would be a breaking change today due to how C components are compiled, if this
RFC is implemented it should not be a breaking change to switch the defaults in
the future, after a reasonable transition period.

The support in this RFC implies that the exact artifacts that we're shipping
will be usable for both dynamically and statically linking the CRT.
Unfortunately, however, on MSVC code is compiled differently if it's linking to
a dynamic library or not. The standard library uses very little of the MSVCRT,
so this won't be a problem in practice for now, but runs the risk of binding our
hands in the future. It's intended, though, that Cargo [will eventually support
custom-compiling the standard library][std-aware cargo]. The `crt-static`
feature would simply be another input to this logic, so Cargo would
custom-compile the standard library if it differed from the upstream artifacts,
solving this problem.

### References

- [Issue about MSVCRT static linking](https://github.com/rust-lang/libc/issues/290)
- [Issue about musl dynamic linking](https://github.com/rust-lang/rust/issues/34987)
- [Discussion on issues around global codegen configuration](https://internals.rust-lang.org/t/pre-rfc-a-vision-for-platform-architecture-configuration-specific-apis/3502)
- [std-aware Cargo RFC](https://github.com/rust-lang/rfcs/pull/1133).
  A proposal to teach Cargo to build the standard library. Rebuilding of std will
  likely in the future be influenced by `-C target-feature`.
- [Cargo's documentation on build-script environment variables](http://doc.crates.io/environment-variables.html#environment-variables-cargo-sets-for-build-scripts)

# Drawbacks
[drawbacks]: #drawbacks

* Working with `RUSTFLAGS` can be cumbersome, but as explained above it's
planned that eventually there's a much more ergonomic configuration method for
other codegen options like `target-cpu` which would also encompass the linkage
of the CRT.

* Adding a feature which is intended to never be stable (`#[link(.., cfg(..))]`)
is somewhat unfortunate but allows sidestepping some of the more thorny
questions with how this works. The stable *semantics* will be that for some
targets the `-C target-feature=+crt-static` flag affects the linkage of the CRT,
which seems like a worthy goal regardless.

* The lazy semantics of `#[link(cfg(..))]` are not so obvious from the name (no
other `cfg` attribute is treated this way). But this seems a minor issue since
the feature serves one implementation-specific purpose and isn't intended for
stabilization.

# Alternatives
[alternatives]: #alternatives

* One alternative is to add entirely new targets, for example
`x86_64-pc-windows-msvc-static`. Unfortunately though we don't have a great
naming convention for this, and it also isn't extensible to other codegen
options like `target-cpu`. Additionally, adding a new target is a pretty
heavyweight solution as we'd have to start distributing new artifacts and
such.

* Another possibility would be to start storing metadata in the "target name"
along the lines of `x86_64-pc-windows-msvc+static`. This is a pretty big
design space, though, which may not play well with Cargo and build scripts, so
for now it's preferred to avoid this rabbit hole of design if possible.
Member

🐰


* Finally, the compiler could simply have an environment variable which
indicates the CRT linkage. This would then be read by the compiler and by
build scripts, and the compiler would have its own back channel for changing
the linkage of the C library along the lines of `#[link(.., cfg(..))]` above.

* Another approach has [been proposed recently][rfc-1684] that has
rustc define an environment variable to specify the C runtime kind.

[rfc-1684]: https://github.com/rust-lang/rfcs/pull/1684

* Instead of extending the semantics of `-C target-feature` beyond "CPU
Member

@japaric japaric Aug 19, 2016

👍 I would prefer to have two namespaces for this. CARGO_CFG_TARGET_FEATURE for the existing LLVM codegen features and another one for Rust specific stuff like crt-static. It eliminates the possibility of a new (LLVM) target-feature that collides, in name, with a Rust specific feature like crt-static (assuming we'll grow more of the latter than just crt-static).

Member Author

Note that we don't forward everything vanilla to LLVM, but rather we pass it through our own whitelist. That is, we're always in complete control of all names going through to LLVM. This means that if we were to have a conflict we'd just redirect the name.

I personally like target-feature as it unifies the concept and then Cargo would only have to think about "target-feature triggers recompilation of std". Does the "we whitelist what we pass to LLVM" assuage your namespacing concern though?

Member

I was concerned about this scenario: Today, crt-static is a Rust thing and we don't pass it to LLVM. Tomorrow, LLVM grows a CPU feature called crt-static and then it becomes ambiguous what -C target-feature=+crt-static refers to. But as you mention we could disambiguate with a redirection: -C target-feature=+llvm-crt-static is for LLVM. Not as nice as having two "namespaces" with no redirections but it works!

I personally like target-feature as it unifies the concept and then Cargo would only have to think about "target-feature triggers recompilation of std"

I haven't thought of it that way. That's a nice property of the proposed approach!

Member Author

Yeah I would imagine that if LLVM did indeed pick the name "crt-static" for a CPU feature we could then figure out another (perhaps more descriptive) name for the same feature.

features", we could instead add a new flag for the purpose, e.g. `-C
custom-feature`.

# Unresolved questions
[unresolved]: #unresolved-questions

* What happens during the `cfg` to environment variable conversion for values
that contain commas? It's an unusual corner case, and build scripts should not
depend on such values, but it needs to be handled sanely.

* Is it really true that lazy linking is only needed by std's libc? What about
in a world where we distribute more precompiled binaries than just std?
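To make the first question concrete, here is a small sketch of why commas are a problem for the comma-separated scheme. The helpers `join_values` and `split_values` are hypothetical and only mimic the joining that Cargo would do and the splitting that a build script would do.

```rust
/// Joins `cfg` values the way the comma-separated scheme would.
/// Hypothetical helper for illustration.
pub fn join_values(values: &[&str]) -> String {
    values.join(",")
}

/// Splits a `CARGO_CFG_*` value back into individual `cfg` values,
/// as a build script would. Hypothetical helper for illustration.
pub fn split_values(joined: &str) -> Vec<&str> {
    joined.split(',').collect()
}
```

Two values `a,b` and `c` join to `a,b,c`, which splits back into three values rather than two, so the round trip is lossy unless commas are escaped or rejected.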