Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

From<u32> for Ipv4Addr is not performed in network byte order #48819

Closed
VilleHallivuori opened this issue Mar 7, 2018 · 10 comments
Closed

From<u32> for Ipv4Addr is not performed in network byte order #48819

VilleHallivuori opened this issue Mar 7, 2018 · 10 comments
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@VilleHallivuori
Copy link

VilleHallivuori commented Mar 7, 2018

Documentation for u32 -> Ipv4Addr conversion (https://doc.rust-lang.org/1.21.0/std/net/struct.Ipv4Addr.html) and recent bug discussion (#40118) claim that conversion is performed in network byte order. This is clearly not the cases.

For example IP address 10.11.12.13 in IP header would be hex bytes 0x0a 0x0b 0x0c 0x0d and Inter CPU (little endian) would interpret this value as u32 0x0d0c0b0a. Conversion to Ipv4Addr results with current library implementation IP address 13.12.11.10.

The problem is clear from the fact that implementation (https://doc.rust-lang.org/1.21.0/src/std/net/ip.rs.html#745-747) does not contain conditional code based on endianness and operates using shift operations. Hence it can only work on a processor that is natively in network byte order. Current implementation is:

#[stable(feature = "ip_u32", since = "1.1.0")]
impl From<u32> for Ipv4Addr {
    /// It performs the conversion in network order (big-endian).
    fn from(ip: u32) -> Ipv4Addr {
        Ipv4Addr::new((ip >> 24) as u8, (ip >> 16) as u8, (ip >> 8) as u8, ip as u8)
    }
}

Conversion without conditional code would only be possible if code would cast u32 to u8[4] (not sure if that is possible in Rust... on C that would work), not using arithmetic operations on u32 where host byte order comes into play.

A suitable fix might be along the lines of (I am assuming Rust compiler optimises htonl away.. if not, more optimal approach would just be to make conversion trait itself conditional on host byte order):

#[cfg(target_endian = "little")]
fn htonl(val : u32) -> u32 {
    let o3 = (val >> 24) as u8;
    let o2 = (val >> 16) as u8;
    let o1 = (val >> 8)  as u8;
    let o0 =  val        as u8;
    ((o0 as u32) << 24 | (o1 as u32) << 16 | (o2 as u32) << 8 | (o3 as u32))
}

#[cfg(target_endian = "big")]
fn htonl(val : u32) -> u32 {
    val
}

#[cfg(target_endian = "little")]
#[test]
fn test_htonl() {
    assert_eq!(0x12345678, htonl(0x78563412));
}

impl From<u32> for Ipv4Addr {
    /// It performs the conversion in network order (big-endian).
    fn from(ip_hb: u32) -> Ipv4Addr {
        let ip = htonl(ip_hb);
        Ipv4Addr::new((ip >> 24) as u8, (ip >> 16) as u8, (ip >> 8) as u8, ip as u8)
    }
}

Fixing the issue in code (instead of just fixing the documentation) does have down side. If any code writes IP addresses in hex in host byte order (that is a convenient way to do it), fixing this breaks such code.

@Centril Centril added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-bug Category: This is a bug. labels Mar 7, 2018
@CodesInChaos
Copy link

CodesInChaos commented Mar 8, 2018

IMO 0x0d0c0b0a should map to 13.12.11.10 regardless of host endiannes, i.e. I consider the current implementation correct.

My understanding of network order / big endian conversion is that it behaves exactly like that, while what you propose would be called host/native order. But perhaps the documentation could be written more clearly so it's easier to understand that its the representation of Ipv4Addr that's network-endian and not the u32 you convert from.

@VilleHallivuori
Copy link
Author

Perhaps it is best to see how Linux C libraries interpret network byte order. Typically IPv4 addresses are stored as u32 and are always in network byte order. Same is true for UDP and TCP port numbers. Lets experiment with a simple C program:
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>

int main(void) {
struct in_addr ia;

inet_aton("1.2.3.4", &ia);
printf("ia=0x%x\n", ia.s_addr);

ia.s_addr=0x0d0c0b0a;
printf("str for 0x%x is %s\n",
ia.s_addr, inet_ntoa(ia));

return 0;
}

Unsurprisingly this works per network byte order interpretation I have described:
~/tmp:$ ./test_ipa
ia=0x4030201
str for 0xd0c0b0a is 10.11.12.13

Notice how on Intel based server bytes are swaped. Let's look manual page to see what these functions are supposed to do:
man inet_aton
int inet_aton(const char *cp, struct in_addr *inp);

   inet_aton() converts the Internet host address cp from the IPv4 numbers-and-dots notation into  binary  form
   (in  network byte order) and stores it in the structure that inp points to.  inet_aton() returns non-zero if
   the address is valid, zero if not.  The address supplied in cp can have one of the following forms:


The structure in_addr as used in inet_ntoa(), inet_makeaddr(), inet_lnaof() and inet_netof() is defined in
<netinet/in.h> as:

       typedef uint32_t in_addr_t;

       struct in_addr {
           in_addr_t s_addr;
       };

Notice manual page specifying that network byte order is used.

If you use host byte order for IP addresses (or port numbers) on Intel based Linux machine, it will fail miserably.

I encountered this problem in Rust library while trying to integrate a Rust based software to software that stores network byte order IP addresses in u32s. So I am quite convinced that conversion is incorrect and the evidence the above test program shows clearly indicates that at minimum your interpretation of network byte order for u32 is different from what is commonly accepted.

You may also want to check how IP packets looks when captured with Wireshark. IP addresses shown in hex dump are in network byte order (if you interpret four octets as u32). Notice that bytes are not in the order Intel processor would store them in host byte order.

@CodesInChaos
Copy link

CodesInChaos commented Mar 8, 2018

The mindset to use is that u32 is just an abstract integer between 0 and 2^32 with no specific in-memory representation (we all know that it has one, but as much as possible the code should be oblivious to it). When you want to write it to the wire, you convert it to a [u8; 4], which is best done using platform independent code, i.e. bitshifts. For IP addresses this [u8; 4] representation must be big-endian/network order, since that's what the IP spec says.

This means that the same integer always maps to the same bytes, regardless of your host endianness, making it easy to write platform independent code (e.g. Ipv4Addr::from(1).octets() is always [0x00, 0x00, 0x00, 0x01].)

Unfortunately the C programmers of old did not follow that mindset, re-interpret casting between structs and on-wire representations with abandon, producing some ugly APIs in the process. So if you have to interact with such a badly designed API, you need to produce integers that can be re-interpreted to a specific representation. This can be done with 123.to_be() and 123.from_be(). This should be done as close as possible to the C API, so as little code as possible depends on those platform dependent integers.


One interesting question is if Rust guarantees that an Ipv4Addr is ABI compatible with a network order 32-bit integer. It uses that representation internally, but I couldn't find any documentation guaranteeing that. If it does, you could simply use Ipv4Addr in the functions used for interop, avoiding manual calls to to_be() altogether.

@VilleHallivuori
Copy link
Author

It is perfectly good specification to do the conversion from host byte order. I would however suggest fixing the documentation to indicate that the conversion is performed in host byte order. Current documentation is highly misleading, especially in light of established practices (i.e. sockets) on what u32 network byte order presentation of IPv4 address is.

Use of [u8; 4] is indeed unambiguous, but much of the C-based IPv4 handling code does rely on u32:s with network byte order (probably due to fact that that is how kernel and C library handles them).

I do agree that on Rust code proper typing should be done, and indeed the conversion to Ipv4Addr is the first thing Rust code does for inputs. Hence this bug report. I don't see how I could pass Ipv4Addr conveniently in FFI however.

@CodesInChaos
Copy link

CodesInChaos commented Mar 8, 2018

The specification is written with the intended meaning of "This converts an integer to its network representation", but it's a bit ambiguous since it can also be interpreted as "This interprets the integer as being in network representation".


I don't see how I could pass Ipv4Addr conveniently in FFI however.

If Ipv4Addr is guaranteed ABI compatible, you could simply use Ipv4Addr in the FFI definition, where the C side uses u32. But I'm not sure if it actually is a guarantee or if the current implementation merely happens to use a network order u32 as its internal representation. (The upcoming repr(transparent) feature might be required for that)

As long as there is no guarantee, you'll have to resort to to_be()/from_be().


In How to teach endian Robert Graham describes pretty much the same approach.

The part most relevant to our discussion here is:

The API for network programming is "sockets". In some cases, you have to use the ntohs() family of macros. For example, when binding to a port, you execute code like the following:

sin.sin_port = htons(port);

You do this not because the API defines it this way, not because you are parsing data.

Some programmers make the mistake of keeping the byte-swapped versions of IP addresses and port numbers throughout their code. This is wrong. Their code should keep these in the correct format, and only passed through these byte-swapping macros on the Interface to the sockets layer.

@kennytm kennytm added the A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools label Mar 10, 2018
@frewsxcv
Copy link
Member

So seems we can't change the underling behavior here since that could be a very subtle breaking change. Regarding changing the documentation here, what would be the best way to word this? Something like this?

before: It performs the conversion in network order (big-endian).

after: Convert an Ipv4Addr into a u32 in host byte order.

note to self, here's a WIP branch: https://github.com/rust-lang/rust/compare/master...frewsxcv:frewsxcv-network-order?expand=1

@VilleHallivuori
Copy link
Author

Documentation change seems the easiest avenue. For u32->Ipv4Addr conversion, documentation would of course have to be:
Convert a host byte order u32 into Ipv4Addr.
Your proposal "Convert an Ipv4Addr into a u32 in host byte order." does cover the reverse conversion.

It might be a good idea to add a warning note along the lines of: As conversion is performed in host byte order, this primitive must not be used to convert in_addr (in_addr.s_addr) to IP address.

A suitable doc-test might be a good clarification also, although I am not sure how to come up with a short elegant test. A long version would be:
#[cfg(target_endian = "little")]
#[test]
fn test_ipv4_addr_u32_conversions_le() {
let host_byte_order_u32 : u32 = 0x01020304;
let network_byte_order_u32 = host_byte_order_u32.to_be();
assert_eq!(std::net::Ipv4Addr::new(1, 2, 3, 4), std::net::Ipv4Addr::from(host_byte_order_u32));
assert_eq!(std::net::Ipv4Addr::new(4, 3, 2, 1), std::net::Ipv4Addr::from(network_byte_order_u32));
}

#[cfg(target_endian = "big")]
#[test]
fn test_ipv4_addr_u32_conversions_be() {
let host_byte_order_u32 : u32 = 0x01020304;
let network_byte_order_u32 = host_byte_order_u32.to_be();
assert_eq!(host_byte_order_u32, network_byte_order_u32);
assert_eq!(std::net::Ipv4Addr::new(1, 2, 3, 4), std::net::Ipv4Addr::from(host_byte_order_u32));
assert_eq!(std::net::Ipv4Addr::new(1, 2, 3, 4), std::net::Ipv4Addr::from(network_byte_order_u32));
}

@frewsxcv
Copy link
Member

opened a pr #49418

frewsxcv added a commit to frewsxcv/rust that referenced this issue Mar 28, 2018
@CodesInChaos
Copy link

I'm not too happy with the tests. The test should make it clear that any code which simply uses Ipv4Addr and does no interop and doesn't use obviously platform dependent functions like to_be/from_be will function the same, regardless of host endianness.

Something like:

#![feature(ip_constructors)]
use std::net::Ipv4Addr;

fn test_ipv4_addr_portable_conversions() {
    assert_eq!(Ipv4Addr::localhost(), Ipv4Addr::new(127, 0, 0, 1));
    assert_eq!(Ipv4Addr::localhost(), Ipv4Addr::from(0x7F000001));
    assert_eq!(Ipv4Addr::localhost(), Ipv4Addr::from([127u8, 0u8, 0u8, 1u8]));
    assert_eq!(Ipv4Addr::localhost(), "127.0.0.1".parse::<Ipv4Addr>().unwrap());

    assert_eq!(u32::from(Ipv4Addr::localhost()), 0x7F000001);
    assert_eq!(Ipv4Addr::localhost().octets(), [127u8, 0u8, 0u8, 1u8]);
    assert_eq!(Ipv4Addr::localhost().to_string(), "127.0.0.1");
}

fn test_ipv4_network_endian_transmutate_to_bytes() {
    let host_byte_order_u32 : u32 = u32::from(Ipv4Addr::localhost());
    assert_eq!(host_byte_order_u32, 0x7F000001);
    let network_byte_order_u32: u32 = host_byte_order_u32.to_be();
    assert_eq!(network_byte_order_u32, if cfg!(target_endian = "big") { 0x7F000001 } else { 0x0100007F });
    let bytes : [u8; 4] = unsafe { std::mem::transmute(network_byte_order_u32) };
    assert_eq!(bytes, [127u8, 0u8, 0u8, 1u8]);
}

fn test_ipv4_network_endian_transmutate_from_bytes() {
    let bytes : [u8; 4] = [127u8, 0u8, 0u8, 1u8];
    let network_byte_order_u32 : u32 = unsafe { std::mem::transmute(bytes) };
    assert_eq!(network_byte_order_u32, if cfg!(target_endian = "big") { 0x7F000001 } else { 0x0100007F });
    let host_byte_order_u32 : u32 = u32::from_be(network_byte_order_u32);
    assert_eq!(0x7F000001, host_byte_order_u32);
    assert_eq!(Ipv4Addr::localhost(), Ipv4Addr::from(host_byte_order_u32));
}

I also want to note that std::net::Ipv4Addr::from(network_byte_order_u32) is wrong, since Ipv4Addr::from must always take a host order integer. So I'd be hesitant to put it in a test, even for demonstration purposes.

bors added a commit that referenced this issue Apr 1, 2018
Clarify network byte order conversions for integer / IP address conversions.

Opened primarily to address #48819.

Also added a few other conversion docs/examples.
@frewsxcv
Copy link
Member

frewsxcv commented Apr 1, 2018

the root issue here seems to have been addressed by #49418

if you have ideas test improvements, might be worth opening a pull request or separate issue for that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-docs Area: documentation for any part of the project, including the compiler, standard library, and tools C-bug Category: This is a bug. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants