Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specify that bool is compatible with the _Bool C type. #954

Closed
wants to merge 2 commits into from
Closed

Specify that bool is compatible with the _Bool C type. #954

wants to merge 2 commits into from

Conversation

mahkoh
Copy link
Contributor

@mahkoh mahkoh commented Mar 8, 2015

Specify that bool is compatible with the _Bool C type.

Rendered

@reem
Copy link

reem commented Mar 8, 2015

+1 from me, seems like a simple and straightforward thing for us to guarantee.


# Drawbacks

None right now. This should be a no-op on all supported platforms.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about arbitrary unsupported platforms? This proposal has the ability to box us into a suboptimal corner, by being beholden to whatever quirks C might have.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported platforms are not supported, hence we're not beholden to do anything on those platforms, unless I misunderstand your concern.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are only unsupported now. If/when we gain support for them we will have to guarantee that bool acts like _Bool on those platforms too, so we need to plan ahead.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. It seems unlikely to me that there will be any platforms where _Bool is different than our bool (especially considering that we will only ever target platforms LLVM supports, which means there is some degree of sanity), but I guess it is possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for platforms that don't use two's complement? If there is no such plan, then why would you worry more about the representation of bool than the representation of u8?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also, I have no idea if _Bool is one byte on all supported platforms, so bool still may not match _Bool even with that property.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The premise of this argument is pretty ridiculous: "We know better than the hardware developers how to represent a boolean value on their hardware."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not ridiculous: the C _Bool type on some platforms may have historical constraints that don't apply to Rust. In any case, the precise representation of C types is usually chosen by the compiler, not the hardware.

A minimum improvement that would help assuage my concerns would be quoting the C standard's definition of _Bool in the RFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, the precise representation of C types is usually chosen by the compiler, not the hardware.

That's wrong. The representation of _Bool is part of the ABI and should be (for all hardware developed after C99) defined by the hardware vendor. E.g.

http://www.x86-64.org/documentation/abi.pdf page 12
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf page 26

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That information should be in the RFC.

@retep998
Copy link
Member

retep998 commented Mar 9, 2015

Personally I think we should have a libc definition for _Bool, aliased to bool. If we encourage people to use bool directly then they have a chance of using it incorrectly such as using bool for an function that really uses BOOL which could actually be c_int.

@petrochenkov
Copy link
Contributor

A comprehensive review of sizes/alignments of C _Bool and C++ bool on various non-ancients platforms (including currently unsupported) would be helpful here.

@mahkoh
Copy link
Contributor Author

mahkoh commented Mar 9, 2015

Personally I think we should have a libc definition for _Bool, aliased to bool.

That would not be helpful. In C, bool is an alias for _Bool and _Bool is almost never used directly.

@Diggsey
Copy link
Contributor

Diggsey commented Mar 9, 2015

That would not be helpful. In C, bool is an alias for _Bool and _Bool is almost never used directly.

So name it c_bool. Seems like a perfect fit to go in libc::types::os::arch::c99.

@mahkoh
Copy link
Contributor Author

mahkoh commented Mar 9, 2015

Sure, as long as it's typedef'd to bool and not an integer type there is no problem.

@adfernandes
Copy link

👍 for adding c_bool to libc::types::os::arch::c99

FYI, the embedded world is chock-full of conflicting bool definitions since toolchains are updated at a glacial pace in industry. The specific way that bool was defined to be a "weak alias" for _Bool really solved the problem.

@petrochenkov
Copy link
Contributor

I don't see much sense in libc::c_bool. It always must alias Rust bool to be usable due to the lack of conversions, so why is it needed at all?
As a side note, libc::intN_t/libc::uintN_t have the same problem - they are absolutely unnecessary, because they always map to the same Rust types by definition.

@adfernandes
Copy link

Actually, I agree with @petrochenkov with the libc::int32_t and friends stuff since they by definition be simple aliases.

However, although rare, the int_least32_t and int_fast32_t and friends are very useful when you don't know your target native architecture width. And not just 32/64 bits either; there are a lot of 8 bit MCUs today that show no hint of dying, and the 8051 is, unfortunately, still coming up in new silicon.

Back to topic: by the c99 standard, it is not clear what underlying type maps to _Bool. See
http://stackoverflow.com/a/10630231 for more details.

@alexcrichton
Copy link
Member

We discussed this at triage today, and the general feeling was that we probably don't want to commit to the representation of Rust's bool type. We may want to apply future optimizations to the representation or semantics, and this sort of restriction may prevent us from doing so.

As you've pointed out, just using a type alias probably won't work so well here, so we will likely need a new primitive type for this purpose if we are to add it. There are a number of possible routes here (such as a newtype scalar which canonicalizes on casts), but at this time though this is believe to be a backwards compatible change and we've decided to close this as postponed for now. Thanks for the RFC regardless, though!

Postponement issue

@alexcrichton alexcrichton added the postponed RFCs that have been postponed and may be revisited at a later time. label Mar 19, 2015
@jimblandy
Copy link

Setting aside the question of whether we should have libc::c_bool, there's a separate question of fact that I've gone and done the research for:

I and others have claimed that C bool is a byte holding zero or one in all the ABIs Rust works with. @petrochenkov quite reasonably observed that it would be nice to have actual citations for this, so here is a collection of a few.

In summary, Rust and C have identical representations for bool on x86 and x86_64 on Linux, OSX, and Windows.

Again, this only bears on whether it would be correct to equate Rust's and C's bool types, not whether libc::c_bool itself should be bool, or is desirable at all.

Linux

The x86 and x86-64 ABIs that GCC uses for Linux are community-maintained forks of the original System V Application Binary Interface: Intel386 Architecture Processor Supplement, Fourth Edition. They both specify that C's _Bool is a one-byte value aligned on a one-byte boundary, holding either 0 or 1:

Booleans, when stored in a memory object, are stored as single byte objects the
value of which is always 0 (false) or 1 (true). When stored in integer registers
(except for passing as arguments), all 8 bytes of the register are significant; any
nonzero value is considered true.

The Linux Standard Base had intended to be the authority that specifies the ABI to use on Linux, but it seems to be waning; Debian dropped LSB support in September 2015. For what it's worth, the LSB fails to specify the representation of bool on x86 anyway. The LSB x86 document's “Machine Interface” section cites the SysV ABI Intel386 supplement, 4th ed for the representation of C types. However, that supplement predates C99, so it doesn't mention bool or _Bool. The LSB itself extends the supplement with descriptions of the long long type, but not bool.

Mac OSX

For the x86-64 architecture, Apple's ABI page cites what appears to be a copy of the GCC x86-64 ABI. This document contains the same language as quoted above for Linux: a _Bool is a one-byte value, aligned on a one-byte boundary, holding either 0 or 1.

Apple's page on its x86 ABI says that _Bool is a one-byte value, aligned on a one-byte boundary, but it doesn't specify any particular representation for true and false. However, if I compile the following C function using clang -O3 on OSX:

#include "stdbool.h"

int f(bool* p) {
    return *p;
}

Then I get the following x86 machine code:

_f:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %eax
    movzbl  (%eax), %eax
    popl    %ebp
    retl

This loads the pointer p into register %eax, fetches and zero-extends the byte to which %eax points, and then returns that value.

However, note that the C language specification requires f to return either zero or one. Since the compiler believes no steps were needed to rectify values of *p other than 0 or 1 into proper boolean values as converted to integers, it must be expecting *p to be only 0 or 1 already.

So although it's not written out, the OSX compiler behaves as if the x86 ABI places the same restrictions on the representation of bool values that x86_64 does.

Windows

The MSDN page on Visual Studio's C++ ABI says that bool is a single byte. It does not specify exactly how true and false are represented.

I asked a friend to compile the following function with MSVC, without optimization:

int thing(bool* arg) {
    return *arg;
}

This produced the following machine code:

# Function prologue.
00CF1700  push        ebp
00CF1701  mov         ebp,esp
00CF1703  sub         esp,0C0h
00CF1709  push        ebx
00CF170A  push        esi
00CF170B  push        edi
00CF170C  lea         edi,[ebp-0C0h]
00CF1712  mov         ecx,30h
00CF1717  mov         eax,0CCCCCCCCh
00CF171C  rep stos    dword ptr es:[edi]

# Code for source: *return arg
00CF171E  mov         eax,dword ptr [arg]
00CF1721  movzx       eax,byte ptr [eax]

# Function epilogue.
00CF1724  pop         edi
00CF1725  pop         esi
00CF1726  pop         ebx
00CF1727  mov         esp,ebp
00CF1729  pop         ebp
00CF172A  ret

The code for x86_64 is essentially the same.

As is the case with OSX above, the MSVC compiler feels free to assume that a value stored in a bool about which it has no other information is either zero or one. We can conclude that the values 0 and 1 are the only permitted representations for false and true.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 7, 2016

There is no need to search for the ABIs. Simply look at the llvm source and you will find that i1 is one byte everywhere but on one odd platform where it is 4 bytes.

@jimblandy
Copy link

@mahkoh: That's not the question I was trying to answer. I wanted to answer the question:

"Is it true that Rust's in-memory representation for bool is the same as that of the C and C++ compilers we want our FFI to communicate with?" LLVM has no bearing on GCC's code generation.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 7, 2016

Both rust and clang use i1 to represent bools in scalar context and whatever LLVM uses as the memory representation (mostly i8) for the memory representation. Hence the question is reduced to whether clang is compatible with other C compilers. This is the case.

@mahkoh
Copy link
Contributor Author

mahkoh commented Jan 7, 2016

It appears that rust unconditionally uses i8 for the memory representation of bool. Nevertheless, since it's i8 everywhere but on one unsupported platform, the result is the same.

@petrochenkov
Copy link
Contributor

There is no need to search for the ABIs. Simply look at the llvm source

+1, If GCC/Clang sources show that "1 byte/0/1" bool is emitted for all supported targets, then it's more than enough.
Manually gathering relevant pieces of ABI specs for all those ARMs/MIPSes/PowerPCs/AVRs/etc. would be an overkill.

one odd platform where it is 4 bytes

I'm curious what platforms supported by GCC or Clang don't use "1 byte/0/1" bools?
(I have limited internet acces right now and can't investigate this in detail myself.)

est31 added a commit to est31/rust that referenced this pull request Nov 22, 2017
The ABI of bool is reserved, see these links:

* rust-lang/rfcs#954 (comment)
* rust-lang#46156 (comment)
* rust-lang/rfcs#992

Currently the improper_ctypes lint is inconsistent
with that position by treating bools as if they had
a specified ABI.

To fix this inconsistency, this changes treatment of
bools by the lint.

This might break some code, so possibly it has to
be phased in slowly...
@petrochenkov petrochenkov removed the postponed RFCs that have been postponed and may be revisited at a later time. label Feb 24, 2018
@SimonSapin SimonSapin mentioned this pull request Apr 23, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants