Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VRL] wrong values returned from ip_aton() and ip_ntoa() functions #9182

Closed
hhromic opened this issue Sep 16, 2021 · 4 comments · Fixed by #9208
Closed

[VRL] wrong values returned from ip_aton() and ip_ntoa() functions #9182

hhromic opened this issue Sep 16, 2021 · 4 comments · Fixed by #9208
Labels
domain: vrl Anything related to the Vector Remap Language type: bug A code related bug.

Comments

@hhromic
Copy link
Contributor

hhromic commented Sep 16, 2021

Vector Version

vector 0.16.1 (x86_64-unknown-linux-gnu 739e878 2021-08-26)

Description

The ip_aton() and ip_ntoa() functions are wrongly byte-swapping integers for their conversions, thus giving wrong results. This is because I wrongly advised to use from_be() and to_be() to account for the host endianness in the PR:

However, it turns out that the functions used for the conversions:

  • u32::from(Ipv4Addr)
  • Ipv4Addr::from(u32)

work with abstract u32 integers which move away endianness from the programmer. This means that these conversions are actually portable and do not need any endianness conversion.

There was a long discussion about the documentation being unclear in the rust-lang repository, however I now found this comment that clarifies pretty well that the above functions are portable indeed:

The big question remains: how these functions in Vector are passing their tests currently?
I just realised that the test cases (and example values in the documentation) are using wrong numbers!

The numeric representation of 1.2.3.4 is not 67305985 but is 16909060.

Many apologies for the bad suggestions in the original PR! :(

The fix should be simple, eliminate to_be() and from_be() conversions and update the tests/docs with the real numeric value for the example IP 1.2.3.4. Let me know if you don't have much time and I can take a stab on fixing this.

@hhromic hhromic added the type: bug A code related bug. label Sep 16, 2021
@jszwedko jszwedko added the domain: vrl Anything related to the Vector Remap Language label Sep 16, 2021
@jszwedko
Copy link
Member

Thanks for this @hhromic ! I think I might be misunderstanding the desired behavior here though. What we currently have seems to match inet_aton:

#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>

int main(void) {
        struct in_addr ia;

        inet_aton("1.2.3.4", &ia);
        printf("ia=0x%x\n", ia.s_addr);
        printf("ia=%d\n", ia.s_addr);

        return 0;
}
ia=0x4030201
ia=67305985

Which should be in network byte order.

I'm, admittedly, getting very turned around by byte ordering 😄

@hhromic
Copy link
Contributor Author

hhromic commented Sep 17, 2021

Hi @jszwedko, I'm very sorry for all the confusion caused around this topic :(

The C functions inet_aton() and inet_ntoa() do store/read the bytes from memory in network order/big endian. However, what is happening in your example is that the printf() function reads memory in the host byte order, which in your case is little endian. Therefore big endian number in little endian machine = swapped printing.

The correct way to print an (struct in_addr).s_addr struct member is using the ntohl() function which will convert network byte order to host byte order (i.e. swap in little endian machines and no-op in big endian ones):

printf("ia=0x%08x\n", ntohl(ia.s_addr));
printf("ia=%u\n", ntohl(ia.s_addr));
ia=0x01020304
ia=16909060

The Python equivalents for inet_aton() and inet_ntoa() work with bytes which show the right order:

import socket
import struct

socket.inet_aton("1.2.3.4")
# b'\x01\x02\x03\x04'
# (0x01 << 24) + (0x02 << 16) + (0x03 << 8) + 0x04 = 16909060

socket.inet_ntoa(b'\x01\x02\x03\x04')
# '1.2.3.4'

And the higher-level ipaddress module computes the numeric conversion correctly too:

import ipaddress

ipaddress.IPv4Address(16909060)
# IPv4Address('1.2.3.4')

int(ipaddress.IPv4Address("1.2.3.4"))
# 16909060

@jszwedko
Copy link
Member

jszwedko commented Sep 20, 2021

@hhromic thanks for the thorough explanation! The annotated Python example helps illuminate.

I can see I misunderstood the Rust stdlib. It is converting the host-endian bytes to big-endian before doing the conversion in both cases so us doing that beforehand was causing it to reverse it again.

https://doc.rust-lang.org/stable/src/std/net/ip.rs.html#1006-1022
https://doc.rust-lang.org/stable/src/std/net/ip.rs.html#1025-1040

@hhromic
Copy link
Contributor Author

hhromic commented Sep 20, 2021

I can see I misunderstood the Rust stdlib.

Yes I misunderstood the documentation as well, therefore giving bad feedback during the PR.
But nice to see all is sorted now! Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language type: bug A code related bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants