Add bitfields support #3113
Let's also take a look at other designs from other languages, like Ada, etc. |
I think we should instead do something like the following:
#[repr(C)]
struct MyStruct {
#[repr(bitfield(u32))]
a: uint<5>,
#[repr(bitfield(u32))]
b: uint<3>,
#[repr(bitfield(new, u32))]
c: uint<12>,
#[repr(bitfield(u16, new))]
d: u16,
e: bool,
#[repr(bitfield(bool))]
f: bool,
#[repr(bitfield(bool))]
g: bool,
}
which is equivalent to the C struct:
struct MyStruct {
uint32_t a:5;
uint32_t b:3;
uint32_t :0, c:12;
uint16_t d:16, :0;
_Bool e, f:1, g:1;
}
This representation gives the advantage that Rust fields have the actual type (e.g. I'd expect Rust to get generic integers ( |
"needed" is perhaps an overstatement of the SIMD situation. As to the RFC:
|
Bitfields are interesting on their own, even without honoring the C layout. They can be used to pack information more densely: for instance, the bitflags crate replacing a struct of
|
If you don't need to honor the C layout rules you can do the same effect as bitfields very simply with just a macro_rules or two, and you get a lot more control over it than any set of language rules that would have to fit all situations for all people across the entire language. So I honestly don't think we need non-repr-C bitfields. |
I would say this is more likely for a non-safe language to say. If we actually add support for this then we will need to add syntax like
What about transparent tuple structs? We could restrict transparent structs to non-bitfield structs to make this feature happen. |
This is a fairly simple static verification step. The compiler can trivially determine if a given enum has all possible bit patterns of a given bit mask inhabited, and then either allow without bounds check or compile error.
Why would this be treated any differently at all from curly brace structs? |
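The static check on enums mentioned above can be illustrated with a small sketch. The `Mode` enum and its `from_bits` helper are hypothetical names, not part of the RFC. A 2-bit field has four bit patterns, and all four map to a variant here, so the conversion cannot fail and needs no bounds check at runtime.

```rust
/// Hypothetical 2-bit enum: every pattern 0b00..=0b11 is inhabited.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum Mode {
    Off = 0,
    Low = 1,
    High = 2,
    Auto = 3,
}

impl Mode {
    /// Converts the low two bits of `bits` into a `Mode`.
    /// Because all four patterns are covered, this is infallible.
    fn from_bits(bits: u8) -> Mode {
        match bits & 0b11 {
            0 => Mode::Off,
            1 => Mode::Low,
            2 => Mode::High,
            // After masking, the only remaining value is 3.
            _ => Mode::Auto,
        }
    }
}
```

If `Mode` had only three variants, a compiler performing the check described above would have to reject the 2-bit field or require a fallible conversion.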
For some prior art, I've always loved how C# does this:
[StructLayout(LayoutKind.Explicit, Size = 4)]
struct Foo {
[FieldOffset(0)]
public byte bar;
[FieldOffset(0)]
public int baz;
}
To guess how we might adapt that to Rust:
#[repr(explicit(size = 4))]
struct Foo {
#[repr(offset = 0)]
bar: u8,
#[repr(offset = 0)]
baz: u32,
}
#[repr(explicit(size = 1))]
struct Flags {
#[repr(offset = 0, size = 1)]
one: bool,
#[repr(offset = 1, size = 1)]
two: bool,
#[repr(offset = 5, size = 1)]
three: bool,
}
To summarize, Add I do think bit-n integers would fit nicely here but I think that's orthogonal and shouldn't be part of this RFC. |
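For comparison, the byte-level `Foo` case above (two fields overlapping at offset 0) can already be approximated in today's Rust with a `#[repr(C)]` union. This is only a sketch of that one case: it does not cover the bit-level offsets of the `Flags` example, and the `first_byte` helper is a hypothetical name added for illustration.

```rust
/// Both fields start at offset 0, as in the C# `FieldOffset(0)` example.
#[repr(C)]
#[derive(Clone, Copy)]
union Foo {
    bar: u8,
    baz: u32,
}

/// Reads the byte that overlaps the start of `baz`.
fn first_byte(foo: Foo) -> u8 {
    // SAFETY: every bit pattern is a valid `u8`, and either field
    // initializes the first byte of the union.
    unsafe { foo.bar }
}
```

Unlike the proposed `#[repr(offset = ...)]` attribute, reading a union field requires `unsafe`, and which byte of `baz` ends up in `bar` depends on the target's endianness.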
When a field annotated with `bits(N)` is read, the value has the type
of the field and the behavior is as follows:

- The `N` bits of storage occupied by the bit-field are read.
I don't think this is correct as written. If N is 7, then we must read at least 8 bits.
I think it would be better to speak about what won't be read when reading from a bit field.
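To make the point concrete, here is a minimal sketch of a bit-field read. It assumes LSB-first allocation and a field that fits within a single byte; `read_field` is a hypothetical helper, not something the RFC defines. The whole byte is loaded and the neighbouring bits are then discarded by shifting and masking.

```rust
/// Reads an `n`-bit unsigned field that starts `offset` bits into `storage`.
/// Assumption: LSB-first allocation, field contained in one byte.
fn read_field(storage: u8, offset: u32, n: u32) -> u8 {
    assert!(n >= 1 && offset + n <= 8, "field must fit in the byte");
    // All 8 bits of `storage` have already been loaded at this point;
    // only the shift and mask narrow the result down to `n` bits.
    let mask = ((1u16 << n) - 1) as u8;
    (storage >> offset) & mask
}
```

For a 7-bit field at offset 0, the eighth bit is read from memory as well and simply masked away.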
- If overflow checks are enabled and the value is outside the range of values
that can be read from the field, the overflow check fails.
- The bitmask `(1 << N) - 1` is applied to the value and the remaining `N`
significant bits are written to the storage of the bit-field.
This might also include a read. That should be mentioned.
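A minimal sketch of why a write can imply a read, under the same assumptions as before (LSB-first allocation, field contained in one byte). `write_field` is a hypothetical helper, and `debug_assert!` stands in here for the overflow check the RFC text describes.

```rust
/// Writes the low `n` bits of `value` into the field at `offset`.
/// Assumption: LSB-first allocation, field contained in one byte.
fn write_field(storage: &mut u8, offset: u32, n: u32, value: u8) {
    assert!(n >= 1 && offset + n <= 8, "field must fit in the byte");
    let mask = ((1u16 << n) - 1) as u8;
    // Stand-in for the overflow check: only active in debug builds.
    debug_assert!(value <= mask, "value does not fit in the bit-field");
    // The existing byte is read first so that neighbouring fields survive.
    let old = *storage;
    *storage = (old & !(mask << offset)) | ((value & mask) << offset);
}
```

The load of `old` is the read in question: without it, the bits belonging to adjacent fields in the same storage unit would be overwritten.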
Should the prior art section also cover the bitfield crate? It is quite nice in my experience. I think it would probably also be good for the RFC to explain why it needs to be in the language vs in a crate. |
I think the bitfield crate, and the fact that you can even make your own alternative if you don't like how it handles things, is proof enough that we don't need language support for repr(rust) bitfields. However, lang support for repr(C) bitfields brings in a high level of confidence that the compiler will correctly match the layout of the local C ABI when compiling for a target. So to me that's the valuable thing to focus on here. |
Macro-generated field emulation (via explicit getter/setter methods) is clunky in comparison to an actual field. Nothing in Stable and Nightly will fix this: there is no DerefMove or DerefSet to simulate properties; there is no replacement for struct construction syntax and patterns. Property-like const fns would be a great addition to the language, for sure. For layout concerns C bitfields are a nightmare. They're tacky and platform dependent. This RFC shouldn't spend time making them ultra-ergonomic to write. Instead, I think this RFC should call out that:
|
Well, that's normal, as it's C xD. Here's a list of issues the Linux kernel experienced from bitfields betrayed by GCC: https://lwn.net/Articles/478657/. We do not want that to happen in the Rust world :D That's why this ultra-ergonomic write is useful. |
Why is this need best solved by this instead of DerefMove or DerefSet, then? |
The Linux kernel user-space API contains over 400 bit-fields. Writing the
corresponding types in Rust poses significant problems because the layout of
structs that contain bit-fields varies between architectures and is hard to
calculate manually. Consider the following examples which were compiled with
Clang for the `X-unknown-linux-gnu` target:
This brings up a question I have been meaning to ask: aren't C bitfields highly compiler-defined? What parts of bitfield allocation are defined by the implementation of the compiler, and why can we trust the bitfields designed for C interop actually match the implementation for that compiler? I do not believe this is a trivial implementation question we can shrug off for later.
Off the top of my head I have seen compilers change the following between implementations (at least where important for interoperability). Some implementations even allow these to be configured either through compiler flags or `#pragma`s.
- default bit order (MSB vs LSB)
- what integer types are allowed (by default only `int` and `unsigned int` are allowed)
- bit packing (how the fields as defined actually get squished into the bytes)
- if fields can straddle storage-unit boundaries
  - e.g. with the following bitfield, will the size of the structure be two bytes or three? Assuming LSB allocation order, will `b` include bits `6..=9` or `8..=11`?
    struct example_t { unsigned char a: 6; unsigned char b: 4; unsigned char c: 6; }
Based on all of this I think if Rust is going to support bitfields natively, we need to select one set of features and stick to it.
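The straddling question for `example_t` can be worked through with a small layout calculator. This is a simplified sketch: the `layout` function is hypothetical, it assumes LSB-first allocation, a single storage-unit size for all fields, and field widths no larger than the unit. It does not model any particular compiler or ABI.

```rust
/// Returns the inclusive (first_bit, last_bit) span of each field and the
/// total size in bytes. Assumes every width is at most `unit_bits`.
fn layout(widths: &[u32], unit_bits: u32, may_straddle: bool) -> (Vec<(u32, u32)>, u32) {
    let mut next = 0u32; // next free bit, counted from the start of the struct
    let mut spans = Vec::new();
    for &w in widths {
        let used = next % unit_bits;
        let room = unit_bits - used;
        // If the field would cross a unit boundary and that is not allowed,
        // skip the rest of the current unit.
        if !may_straddle && used != 0 && w > room {
            next += room;
        }
        spans.push((next, next + w - 1));
        next += w;
    }
    // Round the size up to a whole number of storage units.
    let units = (next + unit_bits - 1) / unit_bits;
    (spans, units * (unit_bits / 8))
}
```

With widths 6, 4, 6 and 8-bit units, allowing straddling places `b` at bits `6..=9` in a two-byte struct, while forbidding it places `b` at bits `8..=11` in a three-byte struct.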
Well, that's... somewhat stressful to think about. It sounds like we may have trouble actually finding a rule to adhere to for the Rust compiler that produces a conformant implementation. I found the following in C11, 6.7.2.1:
- An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
- A bit-field declaration with no declarator, but only a colon and a width, indicates an unnamed bit-field. As a special case, a bit-field structure member with a width of 0 indicates that no further bit-field is to be packed into the unit in which the previous bit-field, if any, was placed.
The System V AMD64 ABI adds:
- bit-fields are allocated from right to left
- bit-fields may share a storage unit with other struct / union member
- Unnamed bit-fields’ types do not affect the alignment of a structure or union.
And I have taken a selection of things from the Arm64 AAPCS here (it includes a reasonably complete layout calculation algorithm, which I have not quoted in full):
- For each bit-field, the type of its container is:
- Its declared type if its size is no larger than the size of its declared type.
- The largest integral type no larger than its size if its size is larger than the size of its declared type (see Over-sized bit-fields).
- The container type contributes to the alignment of the containing aggregate in the same way a plain (not bit-field) member of that type would, without exception for zero-sized or anonymous bit-fields.
- The content of each bit-field is contained by exactly one instance of its container type.
- For big-endian data types K(F) is the offset from the most significant bit of the container to the most significant bit of the bit-field.
- For little-endian data types K(F) is the offset from the least significant bit of the container to the least significant bit of the bit-field.
- The AAPCS does not allow exported interfaces to contain packed structures or bit-fields.
The last one is kind of funny because it makes me wonder if that means we should compile error if someone tries to yield a struct with a bitfield to `extern "C"` on Arm?
Apparently the issue, as explained to me on Zulip, is actually that "packed structures" or "packed bit-fields" may not be used through "exported interfaces". Here an "exported interface" is anything that gets exposed to other programs via symbols, etc. in the usual manner and thus may have a call site that is not directly controlled by the compiler in question. Importantly, the bit-fields in question for ordinary `#[repr(C)]` data should be fine.
For those procedures exclusively governed by a single compiler, of course, the compiler can ignore the entire AAPCS anyways, since it is the boundaries between programs (or libraries, etc.) that are what the AAPCS is designed to govern.
I have been evaluating the landscape of C compilers and have become much more familiar with the standard. The C23 standard is going to land without enormous improvements to the handling of bitfields per se. There will be some improvements to the ability to specify some of various sizes, which, if accepted, will give programmers more control in new declarations, but it will not change existing bitfields in the world of C code. That means that bitfields will often remain, essentially, implementation-defined. So it is important for this RFC to reflect on what, exactly, it means to be "C-compatible", when "C" does not have one definition, nor even 5 (C89, C99, C11, C17, and C23), but one for every single C compiler and for every single target, combinatorially. Even when we factor in such things as processor-specific ABIs, the layout of bitfields is often ambiguous at best. That creates a unique problem when this much is left open for implementation definition: the implementer can change their mind. And that could undermine the kind of stability guarantees that programmers expect from Rust. In the past, vendor actions have significantly impacted Rust's platform support. However, because they affected OS-specific interfaces, or whether a target existed, or something wrapped in abstraction barriers, this hasn't mattered much to the core of the Rust language. But this RFC, if accepted, could make those changes cut much deeper. The details of how In a future where Standard C does specify, exactly, how bitfields should be handled, even just enough that we could believe that compilers would at least reliably come to similar conclusions, this RFC would seem useful. Until then, it seems more appropriate to address the C bitfields problem by providing tools that make it easier to solve this in libraries. |
Given that context, I think it’s important that features can be (and are) explained in a way that doesn’t require deep familiarity with other programming languages / with C. I know some basics of C, but I don’t know anything about “bit-fields” in particular. I am deeply familiar with Rust. With this background, this RFC reads very weird for me; basically I understand almost nothing from the RFC text alone as long as I haven’t read through the complete “reference-level explanation” in detail yet. But IMO, it certainly shouldn’t be the case that someone deeply familiar with Rust will understand nothing at all from the motivation and guide-level explanations of an RFC alone. Here’s the limited information that I personally got as an understanding / takeaway from those sections, so you know what you can improve upon.
From the motivation: The C language has something called “bit-fields”. The Linux kernel uses them. They are hard to understand/calculate/whatever. They have a peculiar syntax using a colon that I’ve never seen before, and they have surprising/weird platform-dependent effects that I cannot even begin to understand without the slightest hint of where this is coming from. So far this reads like a horribly confusing feature that I wouldn’t want to have in Rust at all if there’s any chance to avoid it and get the necessary interop in a different way. If you want this motivation to give off a different vibe than “wtf is this weirdness I don’t understand it and don’t want it”, then perhaps the motivation should not only motivate “bitfields exist and are hard” but also give some indication why they’re a useful (and thus used) feature of the C language, what kind of feature they are, the most basic intuition what a “bit-field” even is.
From the guide-level explanation: The RFC proposes some attribute-based syntax that’s supposed to be an equivalent to the C syntax. As to what the syntax means, I shall better learn some C, I guess?
From skimming through the reference-level explanation: There’s syntax of course, good, I can skip that, since the fact that there’s a new syntax is the only thing that I did understand in the RFC. There’s some nomenclature and restrictions... alright, restrictions don’t give me much in terms of what a bit-field is in the first place. Writing this reply as I’m reading more of that section... finally, this is the first time I come across the most crucial piece of information. This should be among the first sentences of the RFC, but instead it’s well hidden in the middle of the “reference-level” section.
On that note, the reference-level explanation should probably get some structure that separates the different sections about syntax, restrictions of what types and values can be used, semantics of interoperating with the fields, layout, and possibly more. Finally, a single note from me about the contents, not the presentation, and I suppose this has been mentioned in the discussion above already, too. I’m perplexed by the premise that
As I mentioned in the beginning of this post, the dual-role of The fact that the bitfields feature introduces a full language feature that's usable without |
The current bitfield situation is a potential blocker to using Rust in embedded applications at some organizations cough my employer cough as it makes writing safe, standard, error-free code difficult. Nearly all C compilers (GCC, Clang, IAR, etc.) support marking bitfields as "Packed" via a Obviously, we can write macros or functions and a pile of setters and getters and bit math headaches to accomplish this anyway - it's all just bytes at the end of the day. But saying we only need the As a bonus, supporting only the not-platform-dependent use case would make it easier for something like bindgen to make something more idiomatic. See https://rust-lang.github.io/rust-bindgen/using-bitfields.html - if you actually run this you'll see it makes quite the mess. - EDIT: It's also already a platform-dependent mess, as bindgen needs to know the target to get the padding right: https://rust-lang.github.io/rust-bindgen/faq.html#how-to-generate-bindings-for-a-custom-target As is, every option leaves a bit to be desired.
So while I'm in support of taking time to get this right, saying "We can leave it up to crates" doesn't seem good enough. |
from looking at the cargo features and the (edit: nevermind, it documents |
I am distinctly not an experienced Rust dev, but if I had to throw my hat in the ring to recommend syntax, it would probably be something like this:
#[endian(little)]
struct Foo {
a : bool,
b : u7,
#[endian(big)]
d : u24,
e : [i16; 4],
f : [u24; 3],
}
Making this dependent on RFC: Generic integers #2581 Where I think things get a bit gross with this is generics and enums. It might be the case that this makes it hard to verify that something is actually a multiple of 8 bits in size, which should probably be enforced (though that's a tradeoff with composability of structs). Some way to specify how many bits an enum should take would be logical, along with a reasonable solution for dealing with being OOB of that enum. There's also the fun case of how to handle |
As the owner of the |
For I/O I think it's not unreasonable to do it via bit shifts and what not. When you've got 100+ different packet types for shooting over a network each of different sizes (which may be many, many bytes large) needing to think to construct and destruct them can get quite tedious, and I don't know of a better way to handle it than C bitfields. Again, I'm far from a Rust pro, but when a not insignificant amount of the application logic is processing and handling these packets, it needs to be ergonomic to do work with their data. |
If you make accessor methods for each bitpacked "field" you want to simulate, then there's a pretty clear conversion that's easy to remember:
Since bitpacked values aren't really held inside other bitpacked values, this simple rule is enough to handle almost any situation. Even if the overall struct for a situation contains two different bitpacked values, just treat each bitpacked value individually and the problem generally remains manageable. |
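As a rough illustration of the accessor-method approach, here is a minimal sketch. The `Packed` type, its `kind` and `length` fields, and their widths are all hypothetical and chosen only for the example; the comment above does not specify them.

```rust
/// A 16-bit packed value: bits 0..=3 hold `kind`, bits 4..=15 hold `length`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Packed(u16);

impl Packed {
    const KIND_SHIFT: u16 = 0;
    const KIND_MASK: u16 = 0b1111; // 4 bits
    const LENGTH_SHIFT: u16 = 4;
    const LENGTH_MASK: u16 = 0x0FFF; // 12 bits

    /// Getter: shift the field down, then mask off the other fields.
    fn kind(self) -> u16 {
        (self.0 >> Self::KIND_SHIFT) & Self::KIND_MASK
    }

    /// Setter: clear the field's bits, then OR in the new (masked) value.
    fn with_kind(self, v: u16) -> Self {
        let cleared = self.0 & !(Self::KIND_MASK << Self::KIND_SHIFT);
        Packed(cleared | ((v & Self::KIND_MASK) << Self::KIND_SHIFT))
    }

    fn length(self) -> u16 {
        (self.0 >> Self::LENGTH_SHIFT) & Self::LENGTH_MASK
    }

    fn with_length(self, v: u16) -> Self {
        let cleared = self.0 & !(Self::LENGTH_MASK << Self::LENGTH_SHIFT);
        Packed(cleared | ((v & Self::LENGTH_MASK) << Self::LENGTH_SHIFT))
    }
}
```

Each simulated field needs only a shift and a mask, which is the kind of code a `macro_rules!` macro could generate per field. The layout is fixed by the constants, so it is the same on every target.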
This replaces the "arcane C rules for formatting bit fields" with entirely compiler-specific, non-standard formatting, with no compatibility guarantees. Some platform ABIs go so far as to note that this is hypothetically possible, but it should never be exposed in a public header, ever, and that any such code that does so is nonconforming... right after noting an implementation-defined difference in generated layouts between two C compilers when you do this. So I disagree with your conclusion:
...because at least if you use a crate, you have an actual guarantee that you have the same thing on both ends of the wire, as both compilers have to compile Rust correctly. This is the same, basically, as using a C library that does bit-munging exclusively with Anyways, if deku using alloc is bad, consider using its dependencies like bitvec more directly (specifically, BitSlice). It is very common in |
...Now, aside from the note that I really hope you aren't trading data between any copies of @VegaDeftwing In general, because Rust crates have access to procedural macros, which allow for writing significant syntax extension for the language, when we say "we should let crates handle this", it does not necessarily mean modifying the language is inconceivable. It means that it's currently believed that a library can provide a better API, even a better syntax, without having to PR their changes to the compiler, which allows them to iterate independently. This is not true for all libraries, as not all code can be generated by simply having rustc If your corporation needs a better bit-munging library than currently exists, an obvious route suggests itself: contracting a Rust pro for such and worrying about whether the library is suitable for PRing to rustc later. |
The lack of bitfield compatibility of Rust The paper claims this pattern is common in the linux kernel and the lack of a real ABI/FFI compatible solution leads to some non-trivial & measurable overhead when integrating Rust into the Linux Kernel. There are other challenges, this is just 1 of the key 3 the authors highlighted. |
This RFC adds support for bit-fields in `repr(C)` structs by means of a `bits(N)` attribute that can be applied to integer fields.
Example:
This pull request reopens pull request #3064, as mahkoh is no longer participating in the Rust community.
Issue: #314
Rendered preview