Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve WriteBytesExt API #27

Closed
nwin opened this issue Apr 22, 2015 · 30 comments
Closed

Improve WriteBytesExt API #27

nwin opened this issue Apr 22, 2015 · 30 comments

Comments

@nwin
Copy link

nwin commented Apr 22, 2015

At the moment the byteorder crate uses a enum to generalize over the endianness and the method name to select the type. Both is not very ergonomic to use. I propose the following interface:

trait WriteBytesExt<T> {
    fn write_le(&mut self, n: T) -> io::Result<()>;
    fn write_be(&mut self, n: T) -> io::Result<()>;
}

impl<W> WriteBytesExt<u8> for W where W: Write {
    fn write_le(&mut self, n: u8) -> io::Result<()> {
        ....
    }
    ....
}

First of all it gets rid of the enum. Since the enum is purely a compile time parameter it cannot be used for dynamic dispatch. This is as good or bad as having it directly the method name. Thus I do not see the point of having it. Secondly it gets rid of the redundant type name in the signature.

This shortens the method call significantly

w.write_u16::<LittleEndian>(42u16)

becomes

w.write_le(42u16)

My two points are:

  1. The type in the method signature carries redundant information.
  2. The enum type parameter does not provide any benefit for the user.
  3. Enums are most useful for runtime polymorphism. Thus, as the enum variants are no types these *BytesExt traits cannot be use to write generic code that abstracts over endianness. Again no benefit for the user.
@nwin
Copy link
Author

nwin commented Apr 22, 2015

Alternatively one could elevate the enum variants to real types and use them as parameters which allows a more ergonomic use (as it should make it easier to write generic code for it) at the cost of being as long as before.

trait ByteOrder<T> {}

struct LittleEndian<T>(pub T);
struct BigEndian<T>(pub T);

impl<T> ByteOrder<T> for LittleEndian<T> {}
impl<T> ByteOrder<T> for BigEndian<T> {}

trait WriteBytesExt<T, B> where B: ByteOrder<T> {
    fn write_bytes(&mut self, n: B) -> io::Result<usize>;
}

impl<W> WriteBytesExt<u8, LittleEndian<u8>> for W where W: Write {
    fn write_bytes(&mut self, n: LittleEndian<u8>) -> io::Result<usize> {
        self.write(&[n.0])
    }
}

Then

w.write_u16::<LittleEndian>(42u16)

would become

w.write_bytes(LittleEndian(42u16))

@nwin nwin changed the title WriteBytesExt API is not ergonomic to use Improve WriteBytesExt API Apr 22, 2015
@BurntSushi
Copy link
Owner

I've heard a few people moan about the current API, but this is the best alternative I've seen so far. A few comments:

  • You will need one more method, probably write_native to replicate existing functionality. Currently, a type alias is used instead.
  • If WriteBytesExt changes, then so to must ReadBytesExt. I assume we'll need to rely on return value polymorphism? So e.g., instead of rdr.read_u32::<BigEngian>() you'd have rdr.read_be::<u32>() or preferably, let n: u32 = rdr.read_be().

I want to noodle on this some more. I'd also love to get feedback from others. I don't have a good handle on who is using byteorder, but here are some names I've picked off from: https://crates.io/crates/byteorder/reverse_dependencies

cc @TyOverby @jamesrhurst @blackbeam @sfackler @mitsuhiko @cyderize

@TyOverby
Copy link

I use byteorder in bincode and while I don't use the functionality yet, once default-types for functions land I plan on using the current implementation to allow my users to write code like this:

bincode::encode(&my_struct); // default to network byte order
bincode::encode::<LittleEndian>(&my_struct); // specify endianness
// Use other endianness like above

The benefit to the way that @BurntSushi currently implements it is that having these functions called with different endianness imposes no runtime lookup, which is very important to bincode.

@BurntSushi
Copy link
Owner

@TyOverby This is the first I'm hearing about default types for functions. Cool! Is there an RFC?

@sfackler
Copy link
Contributor

My use case is pretty simple - I never need to generically vary over either endianness or the type. Either the fn <T> write_be(t: T) design or the old school write_be_i32(t: i32) design would be a bit less verbose than the current setup for me.

@TyOverby
Copy link

@BurntSushi Yeah, the main default types RFC.

Their example is

pub fn range<A:Enumerable=uint>(start: A, stop: A) -> Range<A> {
    Range{state: start, stop: stop, one: One::one()}
}

@nwin
Copy link
Author

nwin commented Apr 23, 2015

I must redact my criticism a little bit. :( I do not know how I missed that but BigEndian and LittleEndian are of course not enum variants.

That means my strong arguments shrink to the return type polymorphism. Still I think a shortcut like write_le would be nice.

@nwin
Copy link
Author

nwin commented Jun 8, 2015

Have you spend some more thought about it? If one would use a generic solution, third party types would be able to hook into the same infrastructure by simply implementing ByteOrder.

Something like this:

#![feature(associated_consts)]
use std::io;

trait ByteOrder<T> {
    const Size: usize;
    fn read_bytes(buf: &[u8]) -> T;
}

enum LittleEndian {}
enum BigEndian {}

impl ByteOrder<u8> for LittleEndian {
    const Size: usize = 1;
    fn read_bytes(buf: &[u8]) -> u8 {
        buf[0]
    }
}
impl ByteOrder<u16> for LittleEndian {
    const Size: usize = 2;
    fn read_bytes(buf: &[u8]) -> u16 {
        ((buf[1] as u16) << 8) | buf[0] as u16
    }
}
trait ReadBytesExt<T> {
    fn read<B: ByteOrder<T>>(&mut self) -> io::Result<T>;
}
impl<T, R> ReadBytesExt<T> for R
where R: io::Read {
    fn read<B: ByteOrder<T>>(&mut self) -> io::Result<T> {
        let mut buf = [0; B::Size];
        assert!(try!(self.read(&mut buf)) == B::Size);
        Ok(B::read_bytes(&buf))
    }
}

fn main() {
    let mut v = &[1, 2, 3][..];
    let a: u8 = v.read::<LittleEndian>().unwrap();
    let b: u16 = v.read::<LittleEndian>().unwrap();
    println!("{} {}", a, b);
}

(Doesn’t compile thought. Probably one has to wait for non-type type parameters)

Edit: Seems like it is just a temporary restriction that the code gets rejected.

@nwin
Copy link
Author

nwin commented Jun 8, 2015

Ok I just wrote the whole thing to illustrate my point: nwin/byteio.

It has almost the same API as byteorder:

use byteio::{LittleEndian, ReadBytesExt};
let mut reader = &[1, 2][..];
let val: u16 = reader.read_as::<LittleEndian>().unwrap();
assert_eq!(val, 513u16);

Benchmarks indicate that it may be faster than byteorder but I don’t trust them. In theory it should, since I’m using byte arrays instead of slices.

@BurntSushi
Copy link
Owner

@nwin Sorry for the late reply. I do actually like your approach. Not only is the implementation smaller/simpler, but the API surface is smaller too.

What is the best way to proceed? Lots of people are using byteorder, so I think we should:

  1. Continue to maintain the current API at 0.3.x, but deprecate it. (So put it in a branch, I guess.)
  2. Remove the old API and move over to your API and bump the version to 0.4.x.
  3. Replace current docs with 0.4.x docs and put a link/notice to the old 0.3.x docs.

@nwin
Copy link
Author

nwin commented Jul 10, 2015

I agree, that seems to be a good approach.

What about the method name of Read/WriteBytesExt? as is cost-free by convention. Maybe one should use read/write_into?

@BurntSushi
Copy link
Owner

I really like how read_as/write_as sounds, but erring on the side of convention seems like a good idea. I'm fine with read_into/write_into.

@nwin
Copy link
Author

nwin commented Jul 11, 2015

I’ll file a PR this weekend then we can discuss further issues with the new implementation there.

@cesarb
Copy link
Contributor

cesarb commented Jul 11, 2015

I'd like to present a different opinion: having the type in the method signature is a good thing, even if it's redundant.

I've followed for a while the cleanups the LibreOffice team were applying to the legacy OpenOffice codebase. One of these cleanups was to change the implicit SvStream operators, where writing a 16-bit quantity would be something like:

int16_t var = ...;
stream << var;

to an explicit method call, like:

int16_t var = ...
stream.writeInt16(var);

This prevents a kind of error they often encountered, where changing a variable type somewhere accidentally changed a binary serialization format elsewhere. Since a byteorder crate is going to be used for binary serialization, where the type width is as important as its endianness, that's important.

The API I'd prefer is something like:

w.write_le::<i16>(var);

(same for _be/_ne), where the type can be elided if the variable is declared nearby, but can also be given explicitly to prevent accidental serialization format changes (in this example, the compiler would give an error if var isn't an i16). But anything works, as long as the endianness and the type can both be specified explicitly at the same time in the method call.

@nwin
Copy link
Author

nwin commented Jul 11, 2015

That is a good point. In that case

w.write_as::<i16, LittleEndian>(var);

would be a good API. Unfortunately the type can’t be elided in that call (yet?).

@cesarb
Copy link
Contributor

cesarb commented Jul 11, 2015

If you want to ever be able to elide it, it must be in the opposite order:

w.write_as::<LittleEndian, i16>(var);

So the declaration would be something like:

fn write_as<B, T>(&mut self, n: T) -> Result<()>
    where B: ByteOrder<T> { ... }

The question would then be: how to default the type parameter?

fn write_as<B, T = ???>(&mut self, n: T) -> Result<()>
    where B: ByteOrder<T> { ... }

The obvious answer would be to add a type Type = T to the trait ByteOrder<T> and use it:

fn write_as<B, T = B::Type>(&mut self, n: T) -> Result<()>
    where B: ByteOrder<T> { ... }

but that looks awfully circular.

@cesarb
Copy link
Contributor

cesarb commented Jul 11, 2015

In the worst case, the explicit version could be written on top of the implicit version, as a separate method:

w.write_as::<LittleEndian>(var);
w.write_as_explicit::<LittleEndian, i16>(var);

but of course having a single method would be much cleaner. And there's always the following, which should already work with your current code:

w.write_as::<LittleEndian as ByteOrder<i16>>(var);

I could also imagine a type LittleEndianT<T> which implements only ByteOrder<T> (passing everything through to LittleEndian) allowing the following:

w.write_as::<LittleEndian>(var);
w.write_as::<LittleEndianT<i16>>(var);

@LukasKalbertodt
Copy link

Any updates on this? Many ideas in the comments above look a lot better than the current design...

@BurntSushi
Copy link
Owner

@LukasKalbertodt Not yet, just haven't gotten to it yet, but I haven't forgotten.

One excuse I have is that there are many reverse dependencies of byteorder that use *. Pulling this in will break all of those crates instantly. Once this RFC takes effect, then it will be a better time to come back to this.

@TyOverby
Copy link

TyOverby commented Nov 3, 2015

@BurntSushi Do you plan on rewriting byteorder from the ground up with the new api, or just adding these on top?

I still plan on using the generic functions heavily; will I be forced to use an abandoned version?

@BurntSushi
Copy link
Owner

@TyOverby I forgot about that. There's no way I'm maintaining two distinct APIs. If the new API loses a real use case, then that kind of seems like a serious mark against it.

@TyOverby
Copy link

TyOverby commented Nov 3, 2015

Yeah, I think that once default-types for functions land, having trait-based (compile-time) dispatch will be a lot more attractive.

With the same function you could express all of these some library API.

// call regularly
function_that_uses_byteorder(&my_obj, &mut writer);
// call with specified order 
function_that_uses_byteorder::<LittleEndian>(&my_obj, &mut writer);
// all I care about is perf!
function_that_uses_byteorder::<NativeEndian>(&my_obj, &mut writer);

This is exactly what I plan on doing for Bincode. Right now I use big-endian, but for projects like Servo that are just using Bincode for cross-process communication, I could let them choose the endianness without convoluting the Bincode API.

@BurntSushi
Copy link
Owner

@TyOverby So it sounds like we might want to table API redesign (if any) until at least default-types for functions lands.

@LukasKalbertodt
Copy link

One excuse I have is that there are many reverse dependencies of byteorder that use *.

I count 5 crates with a total of about 1800 Downloads (~1.5% of byteorder's downloads). That's not too much IMO, "it's their own fault" and the RFC will not magically fix those crates. I wouldn't really care about that too much actually...

I did not read all comments above, so I just have to agree that waiting for Rust features that allow a better API design is probably a good idea. And I also agree that maintaining two different APIs is not a good idea. Thanks for the status update :)

@BurntSushi
Copy link
Owner

I count 5 crates with a total of about 1800 Downloads (~1.5% of byteorder's downloads). That's not too much IMO

I count 50 crates with a * dependency on byteorder. However, only a few have more than a thousand downloads: sha1, tz and websocket.

"it's their own fault"

I don't really care if it's their "own fault" or not. The fact is, there are lots of crates that will break immediately for all downstream users. If there's an opportunity to mitigate that, then I'm going to take it.

And also, the README for this project recommended a * dependency for quite some time. So I'm not sure it really is their own fault anyway.

the RFC will not magically fix those crates

It will mitigate breakage on active crates.

@LukasKalbertodt
Copy link

Oops -- shame on me. Nevermind then.

@pravic
Copy link

pravic commented Jun 26, 2016

It seems this issue is abandoned for now.

So, there are two ways out of here: change api with backward compatibility (via semver) or fork this crate with proposed changes.

@nwin
Copy link
Author

nwin commented Jun 26, 2016

It is not abandoned, as far as I understand we're just waiting for default type parameters to happen (rust-lang/rust#27336). At least the *-issue should be resolved by now as it is now disallowed on crates.io.

@pravic
Copy link

pravic commented Jun 26, 2016

1.2 years of waiting? Really perfect issue resolving way for software engineers.

@BurntSushi
Copy link
Owner

I'm planning to take this crate to 1.0 soon with the current API.

If and when default type parameters is a thing, then we can revisit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants