RFC for read_all #980

novalis · 2015-03-15T19:53:43Z

codyps · 2015-03-15T20:20:08Z

text/0000-read-all.md

+returned. After a read call returns having successfully read some
+bytes, the total number of bytes read will be updated.  If that
+total is equal to the size of the buffer, read will return
+successfully.


What is the reasoning for introducing a new error kind instead of following the pattern of read()'s return value (io::Result<uint>)?

Consistency with write_all.

io::Result is either Ok(uint) or Error. read_all has only one kind of ok result: "yep, we read the whole thing". If instead we returned Ok(number of bytes read), then every caller would have to check that the number of bytes read matched their expectations. In other words, we are optimizing for the common case of "either this file has the data, or we're doomed". And because the error on EOF is distinct from other errors, callers who care can choose to match on it and handle it separately. But I expect that most callers will not care.

In other words, we are optimizing for the common case of "either this file has the data, or we're doomed".

Who says that that's the common case? What about: "Fill this buffer as much as possible without me having to check for EINTR"?

I think one could use take + read_to_end to do something like that (of course, it allocates a new buffer instead of re-using an old one). It seems likely to me that the "eof is a temporary condition rather than an error" applies more to network streams, and re-using buffers in network stream situations is less safe: http://www.tedunangst.com/flak/post/heartbleed-in-rust

I guess my only argument for read_all being the common case is that I've seen it twice (once in the code I am now writing, and once in the link above), compared to zero for the other one.

I could put this into the rationale section, if you like.

of course, it allocates a new buffer instead of re-using an old one

Exactly.

It seems likely to me that the "eof is a temporary condition rather than an error"

I wasn't talking about EOF. I was talking about EINTR.

Is it a four byte difference between i32 errno and u64 usize that we're talking here?

No that's probably manageable, I just wanted to point out that the size of io::Error is a very real worry and it can be affected by ErrorKind depending on how we're implementing it.

Making a separate error type for this method might be the most straightforward approach.

This is possible, but has drawbacks:

More API surface area to stabilize

Nonstandard as the very similar write_all API would probably also want this treatment.

Can hinder understanding by returning a non-normal error type.

Operating generically over I/O functions may be hindered as this would stick out.

I think write_all can't use a separate error type. The write_all fn should never return WriteZero on a POSIX system (unless a zero-length buffer is passed); instead, it will either succeed or return some other error. So in fact we would have need to annotate all of the other io errors with the number of bytes written. I don't know how useful this would be, and it seems like an odd wart to have to have on io::Error given that it's only useful for write_all and read_all.

By contrast, read_all could just add the number of bytes written to ShortRead. That seems simpler, and I would be happy to adjust this proposal and the implementing Pull Request if that's the consensus.

I don't understand how operating generically would be hindered -- can you explain a bit? I'm not very familiar with the language.

@alexcrichton, the second case (Some bytes were read, and then we saw Ok(0)) is the only case for which I would generally care about knowing how many bytes were read. For the I/O error case, the vast majority of the time I don't care about how many bytes were read, I just bail out and return the error. I'm much more likely to care in the Ok(0) case, such as if I'm reading fixed-size records from a file or the network, but the last record is allowed to be short.

Since the EOF condition doesn't exist for write_all, no changes would be needed there for consistency.

One way to implement this would be as follows:

Add ShortRead(usize) to Repr

Also add ShortRead(usize) to ErrorKind

Add Repr::ShortRead(n) => ErrorKind::ShortRead(n) to Error::kind (and also update description and detail)

This would not increase the size of io::Error. (io::Error already has Custom(Box<Custom>), which is the same size). It would increase the size of Custom, but that always gets boxed. It would also allow the return value to stay io::Result<()>, as desired, and would maintain usability by keeping the check for a short read out of the normal program flow and allowing a short read to be treated as a normal error value (via try!, et cetera) unless the user explicitly wants to handle it differently.

@novalis

I don't understand how operating generically would be hindered -- can you explain a bit?

This may not be too too common, but if you are operating in a context that expects io::Result<T> to be returned via something like:

fn foo<F>(f: F) -> io::Result<T> where F: Fn() -> io::Result<T> { // ... }

You can't naturally use read_all with a different error type as you would be able to otherwise. You can probably get around it with try! and Ok, but it's somewhat unfortunate to have to do so.

@rkjnsn

Since the EOF condition doesn't exist for write_all, no changes would be needed there for consistency.

Technically Ok(0) is indeed special as cases like Write on &mut [u8] will end up generating this once you hit the end of the buffer.

One way to implement this would be as follows:

Hm yes, that does seem plausible! I would still be a little wary of adding this error, but we do have precedent with WriteZero, so it may no be so bad.

@alexcrichton
Technically Ok(0) is indeed special as cases like Write on &mut [u8] will end up generating this once you hit the end of the buffer.

Really? I would expect and strongly argue that that should generate an IO error, even for a standard write call, and not return Ok(0). In my mind, this is analogous to a pipe or a socket with the remote end closed, where reading yields end of file, while writing results in an error. As perhaps a closer analogy to the fixed buffer case, on Linux, if I have a fixed-size block device (say a small ramdisk), and I read to the end, subsequent reads will return 0 (end of file). If, however, I reach the end of the file and try to write more, I'll receive an error (ENOSPC). Similarly, writing to a file that has reached the maximum allowed size results in (EFBIG), not a successful write of zero size.

Is there any chance this can be changed?

rkjnsn · 2015-03-19T20:24:28Z

👍 to this in its current form.

rkjnsn · 2015-03-19T20:28:14Z

text/0000-read-all.md

+
+If we wanted io::Error to be a smaller type, ErrorKind::ShortRead
+could be unparameterized.  But this would reduce the information
+available to calleres.


I would note that having a parameterized ErrorKind::ShortRead wouldn't actually increase the size of io::Error, only that of ErrorKind.

novalis · 2015-03-22T02:02:24Z

Thanks for all the comments so far. Is there anything else I should be doing here? Or should I just be patient and let the process work its magic?

aturon · 2015-03-23T21:46:48Z

@novalis

Thanks for the PR! We're a little slow dealing with RFCs at the moment, since the focus is on pushing out the beta release next week.

This RFC touches on some issues that were discussed pretty extensively during IO reform:

In particular, we used to have some convenience methods like this (see read_exact and read_at_least), but the consensus was that these weren't pulling their weight.

Note that it's quite straightforward to build libraries outside of std that supplement the behavior of all readers with conveniences like the one proposed here, so it's not critical that std provide every last convenience. (Over time, we expect the set to grow as external libraries become de facto standard.)

Note also that there's an inherent asymmetry between reading and writing here, in that it's easy to get a slice of an entire existing data structure (Vec<u8>, say), whereas it's perhaps a bit less common to have a perfectly-sized slice to read into (though I can certainly imagine some cases like this).

In short, it'd help to have more discussion about the scenarios in which this convenience is a big win, to the point that it should be included in the standard library (that has just been pared down), keeping in mind that std can be extended over time.

cc @alexcrichton

novalis · 2015-03-25T06:05:02Z

Note also that there's an inherent asymmetry between reading and writing here, in that it's easy to get a slice of an entire existing data structure (Vec<u8>, say), whereas it's perhaps a bit less common to have a perfectly-sized slice to read into (though I can certainly imagine some cases like this).

That's a very good point. The use cases I'm thinking about include the ones in byteorder, where a buffer whose size is known at compile-time could be allocated on the stack, and ones in my own code, where the read size is not known at compile-time. Both are fairly common in applications I have seen. Given read_all(&mut [u8]), users can always allocate a Vec of the correct size and fill it up. But given read_all -> Result<Vec<u8>>, some allocation is always performed. So this imposes some additional cost over using the raw read function. RFC 517 says:

...the abstractions must be safe, and they should be as high-level as possible without imposing a tax...

So this definition of read_all, to me, seems like the only alternative that is safe but does not impose a tax.

That said, I didn't have to write byteorder, but I do have to write my application, so from a purely selfish perspective, a read_all returning a Vec would be simpler. But I don't think it is a good idea.

In particular, we used to have some convenience methods like this (see read_exact and read_at_least), but the consensus was that these weren't pulling their weight.

read_at_least seemed to be a complicated optimization for an uncommon use case, so I can definitely see why it was pulled. It looks like read_exact was pulled for safety reasons. But my proposed read_all is not unsafe, so that is no objection.

Here's one scenario where I would use read_all: reading git pack files: https://www.kernel.org/pub/software/scm/git/docs/technical/pack-format.txt

(deltified representation)
...
20-byte base object name
....

If I'm reading the "20-byte base object name", I want to use read_all because (a) a premature eof is an error and (b) I know exactly how many bytes I want to read (and probably have a buffer for it). Looking at the git source code, the approximately equivalent function ("read_in_full") is called a couple dozen times. A few are for read_to_end scenarios, but most are for cases where users simply don't want to have to deal with EINTR.

nwin · 2015-05-02T08:08:55Z

👍 I was just about the write the same RFC.

ArtemGr · 2015-05-02T11:24:35Z

Given read_all(&mut [u8]), users can always allocate a Vec of the correct size and fill it up.

I'd much prefer this version to look like read_all (&mut Write, usize), in this way one can stream from Read to Write seamlessly.

nwin · 2015-05-02T11:29:23Z

I'd much prefer this version to look like read_all (&mut Write, usize), in this way one can stream from Read to Write seamlessly.

That is a completely different functionality. What you are proposing is a different (admittedly better) signature for read_to_end(). read_all(&mut [u8]) is about reading exactly n bytes. It is the equivalent of Write::write_all.

ArtemGr · 2015-05-02T13:17:09Z

No, I'm proposing reading exactly n bytes too, hence the second argument.

Admittedly, the Write version isn't as convenient to use with slices. Perhaps there's a space for both the slice and the Write variants.

nwin · 2015-05-02T13:50:20Z

Ahh I see, I was confused by the name. One should better call it read_exact then (or maybe stream?). But I think this is worth a new RFC.

bluss · 2015-05-02T19:52:14Z

text/0000-read-all.md

+write_all.
+
+Or we could leave this out, and let every Rust user write their own
+read_all function -- like savages.


This is unnecessary. Cargo makes us a highly developed distributed culture. With minimal investment needed to contribute & make a difference.

This is unnecessary. Cargo makes us a highly developed distributed culture.

If you mean we don't need a standard library because we have Cargo then I disagree strongly. Putting things off into crates was a strategy used to reach a stable 1.0 library in time, but the standard library should catch up eventually. If you do a thing, you should do it good. If standard library does reading and writing, then it should cover such basics as filling an entire buffer in face of interrupts.

@ArtemGr Completely agree

I'm disagreeing with saying we're savages if it's not in libstd. And we don't have to write it ourselves -- it's easy to reuse code using cargo, very easy. Not saying that's a replacement but an augmentation.

Not savages, but babies. = )
It's 1.0 beta and there is space to grow.

I don't understand why you do this, calling people names is beside the point of the RFC and doesn't help. I'll reply with some on-topic questions in the main discussion.

Huuuuuge -1 for 'like savages' here.

calling people names is beside the point of the RFC and doesn't help

Sorry it didn't help, @bluss. I had to try. = }

So ignoring the last line of this RFC. As has already been pointed out there is a discrepancy between reading/writing with the std::io as it currently stands. I comments on here would also reenforce that its already catching a number of us out. Simply deferring to cargo for a single function will also create problems. Not only in the cost of tracking down the correct crate (maybe byteorder in this case?) but also the cost of pull in all of its dependencies etc.

The crate can be designed for that use case though: small and without dependencies. Using cargo is super easy IMO. Since crates may even re-export other crates, you can even ask authors to split out features to sub crates (when it's possible).

bluss · 2015-05-02T20:30:07Z

There are already two ways that I know of to have Rust perform the read loop to fill a buffer. One using Vec and one to a stack-allocated buffer. We employ composition (Take adaptor) here, which is commonly favoured in Rust.

I understand the I/O API will grow, but to formulate the way forward we need to understand what we already have and the problems with it. A complete RFC should certainly address it. To the code: (playpen link)

use std::io::{self, Read};

fn main() {
    const N: usize = 32;

    let data: &[u8] = &[1u8; 1024];

    // Use std::io::copy to fill a stack buffer
    // Documented to retry and continue on interrupt.
    let mut buf = [0u8; N];
    let res = io::copy(&mut data.take(N as u64), &mut &mut buf[..]);
    println!("{:?}", res);
    println!("{:?}", buf);

    // Use .read_to_end to read to a vec.
    // Documented to retry and continue on interrupt.
    let mut vec = Vec::with_capacity(N);
    let res = data.take(N as u64).read_to_end(&mut vec);
    println!("{:?}", res);
    println!("{:?}", vec);
}

ArtemGr · 2015-05-02T20:50:28Z

Didn't know we have copy. And the take does seem to do the job. Looks like we have it all covered.

markuskobler · 2015-05-02T20:55:08Z

@bluss thanks for following up with an example. But is the big downside of using copy/take over the proposed rfc is the internal allocation http://doc.rust-lang.org/1.0.0-beta.3/src/std/io/util.rs.html#31-44 that would not be needed with https://github.com/rust-lang/rust/pull/23369/files#diff-668f8f358d4a93474b396dcb3727399eR202

novalis · 2015-05-03T02:35:54Z

@bluss I tried take when I was first writing this code, but it seems that File.take works differently than &[u8].take (?). I took your sample and rewrote it to use File: https://gist.github.com/novalis/160eae78e90900af0f14

When I go to compile it, I get:

test1.rs:29:15: 29:19 error: use of moved value: `file`
test1.rs:29     let res = file.take(N as u64).read_to_end(&mut vec);
                          ^~~~
test1.rs:22:29: 22:33 note: `file` moved here because it has type `std::fs::File`, which is non-copyable
test1.rs:22     let res = io::copy(&mut file.take(N as u64), &mut &mut buf[..]);
                                        ^~~~
error: aborting due to previous error

I probably just don't know rust.

nwin · 2015-05-03T16:15:37Z

@bluss: Both of your examples fail to fail (link) if you cannot take N bytes from data. That is the main point of read_all.

bluss · 2015-05-04T13:23:35Z

@novalis You'd have to use .by_ref() with a file. I'm not advocating the specific io::copy API because it's very tricky to call.

@nwin Is it the main point? The first step is to perform the retry loop at all, I guess .read_to_end() is the best/only viable way to do that in Rust. To fail if less is read, I think the name .read_exact() is better in that case.

@markuskobler Yes, io::copy is clearly deficient. The way its API works is crummy, and it uses a big buffer, so it's not for this use case at all.

novalis · 2015-05-05T07:26:35Z

.by_ref().take(N) works for me. Sure, I have to check how many bytes have been read, but I would have to do that somehow anyway.

nwin · 2015-05-05T07:34:03Z

@bluss: my main point is the asymmetry between Read and Write. If we have write_all we should also have read_all. It is equally usable.

markuskobler · 2015-05-05T09:13:55Z

@nwin 👍

cesarb · 2015-07-16T23:50:02Z

@gsingh93 I'd prefer to merge it here so as to not split the discussion, if @novalis is okay with it. This is the first time I meddle with a Rust RFC, so I'm not sure what the correct procedure would be. @alexcrichton , @aturon , any advice?

gsingh93 · 2015-07-16T23:56:02Z

@cesarb You can submit a PR to https://github.com/novalis/rfcs/tree/novalis/read-all, and if it gets merged it should show up here I think.

alexcrichton · 2015-07-17T00:31:48Z

I'm fine whichever way you prefer @cesarb, merging into here would involve doing what @gsingh93 mentioned (sending a pr to @novalis), and then it's up to @novalis to merge it or not.

Make this RFC be again about a single method

novalis · 2015-07-17T05:52:10Z

Ok, I merged your PR.

alexcrichton · 2015-07-17T18:09:41Z

Thanks @novalis and @cesarb! I think this will probably be ready to go back into FCP next week.

Explanations about the buffer contents

cesarb · 2015-07-20T03:46:17Z

I added an explanation for why the contents of buf should be undefined on failure, and wrote a proposed implementation (with working tests). I will not submit a pull request for the proposed implementation until this RFC becomes "active", I wrote it only to help the FCP discussion; it should be visible in the timeline right above this comment (in case it isn't visible, it's at cesarb/rust@0a1b44a).

cesarb · 2015-07-20T04:02:25Z

And, just for fun, an example of how this would be used in real life:

extern crate byteorder;

use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
use std::io::{Read, Write, Result};

struct Header {
    tag: [u8; 16],
    number: u64,
}

impl Header {
    pub fn read<R: Read>(reader: &mut R) -> Result<Self> {
        let mut tag = [0; 16];
        try!(reader.read_exact(&mut tag));
        let number = try!(reader.read_u64::<BigEndian>());
        Ok(Header { tag: tag, number: number })
    }

    pub fn write<W: Write>(&self, writer: &mut W) -> Result<()> {
        try!(writer.write_all(&self.tag));
        try!(writer.write_u64::<BigEndian>(self.number));
        Ok(())
    }
}

gsingh93 · 2015-07-29T20:26:56Z

@alexcrichton Can this be put back into it's FCP? I'm hoping to get this in the 1.3 beta so I don't want it to get too delayed.

alexcrichton · 2015-07-29T21:45:34Z

Thanks for the ping @gsingh93, we were actually just in the libs triage meeting and we did indeed decide to put this back in FCP. This won't make the 1.3 window as #[stable] either way, though, unfortunately.

Anyway, this RFC is now entering its week-long final comment period.

huonw · 2015-08-05T16:46:29Z

text/0000-read-all.md

+
+``` rust
+fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> {
+    while !buf.is_empty() {


I feel like this while should just be a loop?

I feel like this while should just be a loop?

If it were a loop, we would always do one extra call of self.read with a zero-length buf before exiting the loop. That's wasteful and could confuse poorly-implemented Read instances.

(Example: we had 10 bytes left to read, and self.read(buf) returned Ok(10). Now buf has length zero, and we go again to the top of the loop. Without the test for !buf.is_empty(), we would call again self.read(buf) with a zero-length buf.)

rkjnsn · 2015-08-06T22:54:20Z

I'm okay, splitting these methods up for easier discussion/approval. However, as I've said before, if we were to only get one, I'd rather have read_full than read_exact as it covers most of the convenience while allowing several additional use cases. Thus, my support of this RFC in its current form is contingent on assurance that accepting it won't decrease the probability of a future read_full RFC being approved.

aturon · 2015-08-07T16:34:42Z

The libs team, after another round of discussion, has decided to merge this RFC as written. Thanks for the detailed writeup of the design tradeoffs in particular!

Tracking issue

This implements the proposed "read_exact" RFC (rust-lang/rfcs#980).

This implements the proposed "read_exact" RFC (rust-lang/rfcs#980). Tracking issue: #27585

RFC for read_all

288b1b5

novalis mentioned this pull request Mar 15, 2015

Add read_all to std:io::Read to parallel write_all in std::io::Write. rust-lang/rust#23369

Closed

codyps reviewed Mar 15, 2015
View reviewed changes

novalis added 2 commits March 18, 2015 00:57

address review comments

6566a1d

Address review feedback

1e457ad

rkjnsn reviewed Mar 19, 2015
View reviewed changes

aturon self-assigned this Mar 19, 2015

rkjnsn mentioned this pull request Mar 19, 2015

add back bufferless read_to_string/read_to_end methods #970

Closed

bluss reviewed May 2, 2015
View reviewed changes

fix wrong statement

c7f633b

novalis force-pushed the novalis/read-all branch from acd01bc to c7f633b Compare May 3, 2015 02:27

novalis closed this May 5, 2015

cesarb mentioned this pull request Jul 16, 2015

Make this RFC be again about a single method novalis/rfcs#2

Merged

Merge pull request #2 from cesarb/read_all

894ab34

Make this RFC be again about a single method

cesarb and others added 2 commits July 19, 2015 23:30

Explanations about the buffer contents

7cd058d

Merge pull request #3 from cesarb/read_all

67491eb

Explanations about the buffer contents

alexcrichton added the final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. label Jul 29, 2015

huonw reviewed Aug 5, 2015
View reviewed changes

aturon mentioned this pull request Aug 7, 2015

Tracking issue for read_exact (RFC #980) rust-lang/rust#27585

Closed

aturon merged commit 67491eb into rust-lang:master Aug 7, 2015

cesarb mentioned this pull request Aug 7, 2015

Implement read_exact for the Read trait rust-lang/rust#27588

Merged

cesarb added a commit to cesarb/rust that referenced this pull request Aug 24, 2015

Implement read_exact for the Read trait

ff81920

This implements the proposed "read_exact" RFC (rust-lang/rfcs#980).

bors added a commit to rust-lang/rust that referenced this pull request Aug 30, 2015

Auto merge of #27588 - cesarb:read_all, r=alexcrichton

4bb9023

This implements the proposed "read_exact" RFC (rust-lang/rfcs#980). Tracking issue: #27585

steveklabnik mentioned this pull request Dec 18, 2015

Add read_full to std::io::ReadExt rust-lang/rust#23174

Closed

alexcrichton mentioned this pull request Sep 5, 2017

add read_at_least function? tokio-rs/tokio-io#68

Closed

Centril added the A-input-output Proposals relating to std{in, out, err}. label Nov 23, 2018

RFC for read_all #980

RFC for read_all #980

Conversation

novalis commented Mar 15, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rkjnsn commented Mar 19, 2015

Choose a reason for hiding this comment

novalis commented Mar 22, 2015

aturon commented Mar 23, 2015

novalis commented Mar 25, 2015

nwin commented May 2, 2015

ArtemGr commented May 2, 2015

nwin commented May 2, 2015

ArtemGr commented May 2, 2015

nwin commented May 2, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bluss commented May 2, 2015

ArtemGr commented May 2, 2015

markuskobler commented May 2, 2015

novalis commented May 3, 2015

nwin commented May 3, 2015

bluss commented May 4, 2015

novalis commented May 5, 2015

nwin commented May 5, 2015

markuskobler commented May 5, 2015

cesarb commented Jul 16, 2015

gsingh93 commented Jul 16, 2015

alexcrichton commented Jul 17, 2015

novalis commented Jul 17, 2015

alexcrichton commented Jul 17, 2015

cesarb commented Jul 20, 2015

cesarb commented Jul 20, 2015

gsingh93 commented Jul 29, 2015

alexcrichton commented Jul 29, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rkjnsn commented Aug 6, 2015

aturon commented Aug 7, 2015