Tracking issue for integer types conversion to and from byte arrays #52963

SimonSapin · 2018-08-01T22:42:03Z

A previous incarnation of this API that went though FCP in #49792, but stabilization was reverted while still in Nightly in favor of this API.

Implemented in #51919, conversions between all integer types and appropriately-sized arrays of u8, using the specified endianness:

impl $Int {
    pub fn to_ne_bytes(self) -> [u8; mem::size_of::<Self>()] {…}
    pub fn to_le_bytes(self) -> [u8; mem::size_of::<Self>()] {…}
    pub fn to_be_bytes(self) -> [u8; mem::size_of::<Self>()] {…}
    pub fn from_ne_bytes(bytes: [u8; mem::size_of::<Self>()]) -> Self {…}
    pub fn from_le_bytes(bytes: [u8; mem::size_of::<Self>()]) -> Self {…}
    pub fn from_be_bytes(bytes: [u8; mem::size_of::<Self>()]) -> Self {…}
}

ne, le, and be mean native-endian, little-endian, and big-endian respectively. Native-endian means the target platform’s endianness. Those conversions are implemented as literally transmute and nothing else (but safe). The other conversions use swap or do not swap the byte order depending on the target’s endianness.

The text was updated successfully, but these errors were encountered:

SimonSapin · 2018-08-07T18:42:10Z

I think this feature is ready for stabilization:

@rfcbot fcp merge

This is very soon after landing in Nightly in its current form, but the same functionality already went through FCP for stabilization in #49792 after a couple months in Nightly. The new API contains methods identical to the previous API (only renamed), plus new methods that implement what was previously documented as a typical usage of the previous API.

So there isn’t really anything new here, I expect this final period to be a formality except for possible last-minute naming bikeshed. (The methods landed with consistent abbreviation, another option that was considered was fully expansion with for example to_native_endian_bytes and from_big_endian_bytes.)

rfcbot · 2018-08-07T18:42:11Z

Team member @SimonSapin has proposed to merge this. The next step is review by the rest of the tagged teams:

Concerns:

~~missing_impls~~ resolved by Tracking issue for integer types conversion to and from byte arrays #52963 (comment)

Once a majority of reviewers approve (and none object), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot · 2018-08-09T17:13:56Z

🔔 This is now entering its final comment period, as per the review above. 🔔

Kimundi · 2018-08-10T23:51:53Z

@rfcbot concern missing_impls

I just noticed that we are still missing half the implementation for this.

#51919 did just add the changes for signed integers, but not for unsigned.

Compare:

https://doc.rust-lang.org/nightly/std/primitive.u8.html#method.to_bytes
https://doc.rust-lang.org/nightly/std/primitive.i8.html#method.to_ne_bytes

cc @tbu-

Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

SimonSapin · 2018-08-14T16:50:11Z

@Kimundi I belive #53358 fixes your concern. (Could you review it?)

shepmaster · 2018-08-16T13:20:58Z

ne [...] means native-endian

I keep reading it as "not equal".

SimonSapin · 2018-08-17T12:00:38Z

@shepmaster I agree that’s a real negative point. The other naming pattern that was considered (consistent full expansion) is rather verbose: from_native_endian_bytes, to_little_endian_bytes. Do you think it would still be preferable? Or perhaps some other set of names?

`{to,from}_{ne,le,be}_bytes` for unsigned integer types Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

Kimundi · 2018-08-18T11:17:27Z

@SimonSapin: Yeah, seems to be all in order now.

@rfcbot resolved missing_impls

rfcbot · 2018-08-18T11:17:31Z

🔔 This is now entering its final comment period, as per the review above. 🔔

SimonSapin · 2018-08-31T13:55:19Z

Ok, this is the second (different) misinterpretation of ne (the other being "not equal") reported in this thread, so maybe we should do something about it? As mentioned previously I think that this meaning of "host" only makes sense in the context of networking, but this APIs are not necessarily networking-related. But other than that, maybe he is not too bad.

@rust-lang/libs, any thoughts?

kjetilkjeka · 2018-08-31T15:26:03Z

Host (he) will give the wrong association if interpreted in the cross compilation context (host vs target). Making it he might therefore also be confusing. I think this was mentioned in some previous discussion?

If the name is being changed it should preferably be to something that will resolve all ambigouties, not change some ambigouties for different ones. Due to the abbriviation it is already expected that users will have to look up the function in the docs the first time they're using it. Because of this, less idiomatic alternatives might be OK. In these situations there are also arguments for the idiosyncratic alternative to be better, as it makes it difficult to assume things, forcing users to look it up in the docs.

What about:

me - machine endianness (might look to similar to ne?)
ae - architecture endianness
te - target endianness (not common to speak of target unless cross compiling as target==host otherwise)
pe - processor/platform endianness
cpue - quite descriptive but doesn't visualy look like be or ne. It might make sense that the longest alternative is the one you're least likely to use (most prone to bugs due to platform specific behavior)

SimonSapin · 2018-09-03T13:24:35Z

"Host" makes the wrong kind of sense in cross-compilation context, since this API is using the target machine’s endianness with #[cfg(not(target_endian = "…"))]

kjetilkjeka · 2018-09-03T13:56:54Z

"Host" makes the wrong kind of sense in cross-compilation context, since this API is using the target machine’s endianness with #[cfg(not(target_endian = "…"))]

Yes this was my exact concern. That's why i suggested to change it to something other than he if a change was going to happen. I see now that my comment might be interpreted the other way as well, I'll do a minor edit to fix that.

alexcrichton · 2018-09-03T17:53:24Z

If there's confusion about the short names then I'd personally err on the side of more wordy names and expand out he to host_endian or (and similar for other names)

kallisti5 · 2018-09-18T16:44:46Z

aww yis. Excited for this one. It makes some clean looking code.

    let magic_bytes = header.magic.to_ne_bytes();
    if magic_bytes != ['h', 'p', 'k', 'r'] {

CasualX · 2018-09-18T19:51:34Z

@kallisti5 Did you mean to write byte character literals instead of chars? Assuming they're meant to be byte literal characters you can go even further and do it like this ;)

    let magic_bytes = header.magic.to_ne_bytes();
    if magic_bytes != *b"hpkr" {

This is behind the `i128_blob` feature. Blobs are stored as 16 byte big-endian values, with their most significant bit flipped. This is so that sorting, comparison, etc all work properly, even with negative numbers. This also allows the representation to be stable across different computers. It's possible that the `FromSql` implementation should handle the case that the real value is stored in an integer. I didn't do this, but would be willing to make the change. I don't think we should store them this way though, since I don't think users would be able to sort/compare them sanely. Support for `u128` is not implemented, as comparison with i128 values would work strangely. This also is consistent with `u64` not being allowed, not that I think that would be reason enough on it's own. The `byteorder` crate is used if this feature is flipped, as it's quite small and implements things more or less optimally. If/when `i128::{to,from}_be_bytes` gets stabilized (rust-lang/rust#52963), we should probably use that instead.

gamozolabs · 2018-10-09T01:24:54Z

Not sure if it's too late, but is there a reason the from routines take in an array rather than a slice? Taking a random chunk of out a slice and converting it to a number would be rather clunky as it would have to be copied to an intermediate fixed size array for holding.

In terms of performance a slice version would only have a runtime cost if used with an actual slice, since it's inline the bounds check on a fixed size array would be omit.

I ran into this today and it was a bit annoying.

shepmaster · 2018-10-09T01:37:49Z

take in an array rather than a slice

By taking in an array, you are statically guaranteed that there's enough data to make the transformation — you don't have to even allow for the error case.

Taking a random chunk of out a slice and converting it to a number would be rather clunky as it would have to be copied to an intermediate fixed size array for holding.

With TryFrom / TryInto, it's not that bad:

#![feature(int_to_from_bytes, try_from)]

use std::convert::TryInto;

fn example(bytes: &[u8]) -> i32 {
    let bytes: &[_; 4] = bytes.try_into().expect("Wrong number of bytes");
    i32::from_be_bytes(*bytes)
}

gamozolabs · 2018-10-09T02:00:57Z

Yeah, I agree, but it seems that a slice is just a superset of the abilities of an array.

While I agree it's cleaner and makes more sense in a standalone implementation, I don't know of many situations where I'm dealing with something I need to convert into a number did not come from a bigger vector or slice. At least for me I don't ever see using this functionality without using a wrapper of the sorts of the snippit you posted.

I guess I'm not sure how often you are dealing with packed data and are working with fixed sized arrays as the source of this data is almost always network or file streams. The only place I work with fixed sized arrays when working with files/network is when I'm casting to a structure, and at that point I'm just going to directly cast it to the u64 and skip the middle step.

The TryInto/TryFrom implentation you mention now is exactly what I expose as a basic interface in most of my tiny applications that don't warrant my safecast library. In this case the new introduction of the from_*_bytes() interfaces provide no improvement to usability compared to what I already have as I would have to write the example wrapper you posted above, which is of the same complexity and lesser portability than my existing version (lesser portability due to relying on another experimental feature).

I have a library https://github.com/gamozolabs/safecast which allows me to safely cast or copy things with runtime checks if needed. I pretty much must use this library in all of my Rust projects as almost everything I do is working with network/file/system level packed data.

To me this just happens to be a very sensitive issue as operating on binary data from safe Rust is incredibly clunky, and is the last major issue I have personally with the language. While crates like byteorder exist I do not believe a systems language should require a third party piece of code to be able to manipulate binary data in a safe way.

No hard feelings, just trying to chime in about the usability of this API from my use cases (kernel development / fuzzer development). Many of my peers (other security researchers) who also work on developing code to parse/mutate/extract information from binary packed data have had similar gripes and ultimately a few have refused to use the language for these purposes as it's not feasible for codebases that largely operate on raw binary data.

I likely will propose an RFC along the lines of my safecast library which allows for safe copying and dynamically-checked casting of PoD types that would just be exposed as another derive for structures/enums.

-B

SimonSapin · 2018-10-09T08:35:13Z

As shepmaster suggested, what does an API taking a slice do if the input is too short? Too long? Does it panic, or ignore extra items, or do we change the return type to a Result? Taking an array avoids this question, and the concerns of how to obtain that array can be separated and left to other APIs like TryFrom.

In my opinion the main goal of this API is to replace some unsafe calls to transmute with safe method calls. It doesn’t by itself need to replace byteorder or safecast entirely, and I think it’s fine to both have this and later add some higher-level convenience APIs to the standard library.

oconnor663 · 2018-10-13T05:51:14Z

I'm not familiar with exactly what #[inline] means in the compiler, but I notice that byteorder also uses it, and yet I think I've seen cases where I still ended up paying the cost of bounds checks.

SimonSapin · 2018-10-13T07:27:31Z

#[inline] is documented at https://doc.rust-lang.org/reference/attributes.html#inline-attribute

oconnor663 · 2018-10-14T03:30:59Z

Right, I meant to say, I'm not sure what makes the compiler decide when to follow the hint and when not to. I thought I'd seen an issue with byteorder::LittleEndian::read_u64 not eliding the bounds checks when it could've, but now that I look more closely at my own code, it looks like it's actually read_u64_into (which copies a bunch of ints at once) that's somehow failing to elide. So maybe not relevant to our discussion here.

This is behind the `i128_blob` feature. Blobs are stored as 16 byte big-endian values, with their most significant bit flipped. This is so that sorting, comparison, etc all work properly, even with negative numbers. This also allows the representation to be stable across different computers. It's possible that the `FromSql` implementation should handle the case that the real value is stored in an integer. I didn't do this, but would be willing to make the change. I don't think we should store them this way though, since I don't think users would be able to sort/compare them sanely. Support for `u128` is not implemented, as comparison with i128 values would work strangely. This also is consistent with `u64` not being allowed, not that I think that would be reason enough on it's own. The `byteorder` crate is used if this feature is flipped, as it's quite small and implements things more or less optimally. If/when `i128::{to,from}_be_bytes` gets stabilized (rust-lang/rust#52963), we should probably use that instead.

Fixes rust-lang#52963

SimonSapin · 2018-11-25T07:32:19Z

FCP to merge has completed a while ago, but there has been a number of additional comments since. However I believe not enough reasons to change the API. If you disagree, now is (past) the time to say so. Here is a stabilization PR: #56207

nagisa · 2018-11-25T14:43:40Z

As shepmaster suggested, what does an API taking a slice do if the input is too short? Too long? Does it panic, or ignore extra items, or do we change the return type to a Result? Taking an array avoids this question, and the concerns of how to obtain that array can be separated and left to other APIs like TryFrom.

Note, that this was the same with the API for char::encode_utf8 (which was char::encode_utf8(&mut [u8; 4]) -> usize, which was later changed to its current char::encode_utf8(&mut [u8]) -> &str (panicking if the buffer is not large enough) for convenience reasons.

Not that I’ve have any preference either way.

Stabilize the int_to_from_bytes feature Fixes rust-lang#52963 FCP to merge completed: rust-lang#52963 (comment)

SimonSapin · 2018-11-25T16:37:27Z

#56216 will hopefully make it easier to use these APIs with slices.

SimonSapin added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. labels Aug 1, 2018

This was referenced Aug 1, 2018

Tracking issue for the to_bytes and from_bytes methods of integers #49792

Closed

Provide {to,from}_{ne,le,be}_bytes functions on integers #51919

Merged

SimonSapin added the B-unstable Blocker: Implemented in the nightly compiler and unstable. label Aug 7, 2018

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Aug 7, 2018

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Aug 9, 2018

SimonSapin added a commit to SimonSapin/rust that referenced this issue Aug 14, 2018

{to,from}_{ne,le,be}_bytes for unsigned integer types

8defd15

Same as rust-lang#51919 did for signed integers. Tracking issue: rust-lang#52963

SimonSapin mentioned this issue Aug 14, 2018

{to,from}_{ne,le,be}_bytes for unsigned integer types #53358

Merged

rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Aug 18, 2018

rsdy pushed a commit to redsift/kdb-rs-hash that referenced this issue Aug 20, 2018

Track latest nightly (rust-lang/rust#52963)

778db1b

yrp604 mentioned this issue Sep 6, 2018

Integer types ‘to_bytes’ and ‘from_bytes’ docs outdated #54001

Closed

willscott mentioned this issue Sep 23, 2018

Updates for new rustc oasislabs/oasis-ethwasi-runtime#198

Closed

oconnor663 mentioned this issue Oct 1, 2018

remove the dependency on byteorder oconnor663/blake2_simd#5

Closed

thomcc mentioned this issue Oct 8, 2018

Add a feature for storing i128 as blobs. rusqlite/rusqlite#411

Merged

SimonSapin added a commit to SimonSapin/rust that referenced this issue Nov 25, 2018

Stabilize the int_to_from_bytes feature

68a26ec

Fixes rust-lang#52963

SimonSapin mentioned this issue Nov 25, 2018

Stabilize the int_to_from_bytes feature #56207

Merged

pietroalbini added a commit to pietroalbini/rust that referenced this issue Nov 25, 2018

Rollup merge of rust-lang#56207 - SimonSapin:int_to_from_bytes, r=nagisa

d21d510

Stabilize the int_to_from_bytes feature Fixes rust-lang#52963 FCP to merge completed: rust-lang#52963 (comment)

bors closed this as completed in #56207 Nov 25, 2018

alexcrichton mentioned this issue Dec 19, 2018

mismatched types with from_le_bytes #56902

Closed

jplatte mentioned this issue May 28, 2020

Try to reduce the dependency graph Smithay/wayland-rs#321

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tracking issue for integer types conversion to and from byte arrays #52963

Tracking issue for integer types conversion to and from byte arrays #52963

SimonSapin commented Aug 1, 2018

SimonSapin commented Aug 7, 2018

rfcbot commented Aug 7, 2018 •

edited

Loading

rfcbot commented Aug 9, 2018

Kimundi commented Aug 10, 2018

SimonSapin commented Aug 14, 2018

shepmaster commented Aug 16, 2018

SimonSapin commented Aug 17, 2018

Kimundi commented Aug 18, 2018

rfcbot commented Aug 18, 2018

SimonSapin commented Aug 31, 2018

kjetilkjeka commented Aug 31, 2018 •

edited

Loading

SimonSapin commented Sep 3, 2018

kjetilkjeka commented Sep 3, 2018

alexcrichton commented Sep 3, 2018

kallisti5 commented Sep 18, 2018

CasualX commented Sep 18, 2018

gamozolabs commented Oct 9, 2018

shepmaster commented Oct 9, 2018 •

edited

Loading

gamozolabs commented Oct 9, 2018

SimonSapin commented Oct 9, 2018

oconnor663 commented Oct 13, 2018

SimonSapin commented Oct 13, 2018

oconnor663 commented Oct 14, 2018

SimonSapin commented Nov 25, 2018

nagisa commented Nov 25, 2018

SimonSapin commented Nov 25, 2018

Tracking issue for integer types conversion to and from byte arrays #52963

Tracking issue for integer types conversion to and from byte arrays #52963

Comments

SimonSapin commented Aug 1, 2018

SimonSapin commented Aug 7, 2018

rfcbot commented Aug 7, 2018 • edited Loading

rfcbot commented Aug 9, 2018

Kimundi commented Aug 10, 2018

SimonSapin commented Aug 14, 2018

shepmaster commented Aug 16, 2018

SimonSapin commented Aug 17, 2018

Kimundi commented Aug 18, 2018

rfcbot commented Aug 18, 2018

SimonSapin commented Aug 31, 2018

kjetilkjeka commented Aug 31, 2018 • edited Loading

SimonSapin commented Sep 3, 2018

kjetilkjeka commented Sep 3, 2018

alexcrichton commented Sep 3, 2018

kallisti5 commented Sep 18, 2018

CasualX commented Sep 18, 2018

gamozolabs commented Oct 9, 2018

shepmaster commented Oct 9, 2018 • edited Loading

gamozolabs commented Oct 9, 2018

SimonSapin commented Oct 9, 2018

oconnor663 commented Oct 13, 2018

SimonSapin commented Oct 13, 2018

oconnor663 commented Oct 14, 2018

SimonSapin commented Nov 25, 2018

nagisa commented Nov 25, 2018

SimonSapin commented Nov 25, 2018

rfcbot commented Aug 7, 2018 •

edited

Loading

kjetilkjeka commented Aug 31, 2018 •

edited

Loading

shepmaster commented Oct 9, 2018 •

edited

Loading