Remove some useless BufReader wrappers around stdin #1307

PaulCapron · 2018-10-24T14:31:33Z

This fixes, in part only, #1103.

I’m willing to remove the other useless BufReader wrappers, but I am too much of a Rust newbie to know how to do that properly. I need some help.

In, for instance, base32.rs, it’d be easy to do:

     let input = if matches.free.is_empty() || &matches.free[0][..] == "-" {
-        BufReader::new(Box::new(stdin()) as Box<Read>)
+        Box::new(stdin()) as Box<Read>
     } else {
         let path = Path::new(matches.free[0].as_str());
         let file_buf = safe_unwrap!(File::open(&path));
-        BufReader::new(Box::new(file_buf) as Box<Read>)
+        Box::new(BufReader::new(file_buf))
     };

This compiles. But then some repetitive hence excessive mutex locking/unlocking is going to take place.

base32.rs delegates to decode()/encode() in encoding.rs. There, read_to_end() is called.
If I understand https://doc.rust-lang.org/src/std/io/mod.rs.html#365-394 correctly, read_to_end()
calls read(), potentially several times. And each time, the mutex has to be taken then released.
See https://www.reddit.com/r/rust/comments/3rj54u/how_can_i_read_char_by_char_from_stdin/cwpojn1/

So it’d be better to acquire the mutex lock explicitely beforehand, doing something like:

+    let std_input; // declare this here to make the compiler happy
     let input = if matches.free.is_empty() || &matches.free[0][..] == "-" {
-        BufReader::new(Box::new(stdin()) as Box<Read>)
+        std_input = stdin();
+        Box::new(std_input.lock()) as Box<Read>
     } else {
         let path = Path::new(matches.free[0].as_str());
         let file_buf = safe_unwrap!(File::open(&path));
-        BufReader::new(Box::new(file_buf) as Box<Read>)
+        Box::new(BufReader::new(file_buf))
     };

This compiles, but this is ugly (the stupid std_input upfront declaration).

Ugly is one thing, losing the fight with the borrow checker is another.
Take cat.rs. It does not explicitely lock either (but it should). But then if you do:

     if path == "-" {
         let stdin = stdin();
         return Ok(InputHandle {
-            reader: Box::new(stdin) as Box<Read>,
+            reader: Box::new(stdin.lock()) as Box<Read>,
             is_interactive: is_stdin_interactive(),
         });
     }

The compiler is not happy, and I don’t know to please it.

Maybe it’d be better just to wait for non-lexical lifetimes to be implemented?
See rust-lang/rust#33520 rust-lang/rfcs#811 rust-lang/rust#43234

Please let me know what you think.

stdin() is already buffered. stdin().read_line() calls stdin().lock() behind the hood (see https://doc.rust-lang.org/src/std/io/stdio.rs.html#274) Here we are reading user input, prompting them to confirm their action: it seems useless to handle mutex locking/unlocking explicitely and beforehand to avoid its overhead. This commit is related to issue #1103.

stdin() is already buffered. Locking upfront explicitely avoid the overhead of mutex locking/unlocking everytime a new line is read. See https://stackoverflow.com/a/17546731 and https://www.reddit.com/r/rust/comments/3rj54u/how_can_i_read_char_by_char_from_stdin/cwpojn1/ The code cannot be simplified to “for line in stdin().lock().lines()” until non-lexical lifetimes are implemented. The compiler complains at the moment about a temporary value not living long enough/being dropped while still borrowed. See rust-lang/rust#33520 This commit is related to issue #1103.

Arcterus · 2019-02-07T08:03:09Z

One way to fix the base32 issue would be to make a generic function that takes something implementing Read and then just moving the rest of the stuff in uumain() into there. Something like:

fn uumain(/* args */) -> /* return */ {
    // ... snip ...

    if matches.free.is_empty() || &matches.free[0][..] == "-" {
        let stdin_raw = stdin();
        do_stuff(stdin_raw.lock());
    } else {
        let path = Path::new(matches.free[0].as_str());
        let file_buf = safe_unwrap!(File::open(&path));
        do_stuff(BufReader::new(file_buf));
    }

    0
}

fn do_stuff<R: Read>(input: R) {
    // blah
}

Another way is what you suggested, which is also fine but requires allocation (the current code does too).

IIRC the problem with cat is that the open() stuff needs to be restructured to avoid the Box (if we can avoid the Box we can also fairly easily get it to work with locked stdin I believe). You could maybe store the result of stdin() in the InputHandle struct like Option<Box<Stdin>> or something and then store the locked version of stdin in the struct as well, but that needs two allocations and is kind of hacky.

By the way, the issue you referenced will get closed if I merged this as-is, because you wrote "fixes" right next to the issue number, so it'd probably be better if you change the wording.

PaulCapron · 2019-02-21T12:10:29Z

I have edited the original PR message so that you can merge it as-is, without closing #1103.
I won’t go further to resolve #1103.

You are right, in base32.rs, one can do:

    let is_decode = matches.opt_present("decode");
    let ignore_garbage = matches.opt_present("ignore-garbage");

    […]

     if matches.free.is_empty() || &matches.free[0][..] == "-" {
        let stdin = stdin();
        do_it(stdin.lock(), is_decode, ignore_garbage, line_wrap);
    } else {
        let path = Path::new(matches.free[0].as_str());
        let file_buf = safe_unwrap!(File::open(&path));
        do_it(BufReader::new(file_buf), is_decode, ignore_garbage, line_wrap);
    };

    […]

fn do_it<R: Read>(input: R, is_decode: bool, ignore_garbage: bool, line_wrap: usize) {
    let mut data = Data::new(input, Format::Base32);

    if is_decode {
        match data.ignore_garbage(ignore_garbage).decode() {
            Ok(s) => print!("{}", String::from_utf8(s).unwrap()),
            Err(_) => crash!(1, "invalid input"),
        }
    } else {
        wrap_print(line_wrap, data.encode());
    }
}

But then the same thing would have to be repeated in base64.rs. Or refactoring would have to take place.

The code of uucore/encoding.rs is only used by base{32,64}.rs, so refactoring sounds feasible. But investigating a little, several problems surface:

ignore_garbage is only used when decoding. On the other hand, line_wrap is only used when encoding. But encoding.rs forces them both to be filled, in its Data structure, no matter which “conversion” is going to take place.
Data::new() initializes such member to default values, only to have base{32,64}.rs overwrites them later (when they call Data::ignore_garbage() & Data::line_wrap()).
there is duplication of the 76 default value for the line wrap. There is also duplication of line_wrap in wrap_print().
etc.

uucore/encoding.rs goal, I guess, is to avoid DRY. Only it fails at that goal, and by the way makes execution inefficient and the codeflow hard-to-understand. At this point, I’m fed up. I am not willing to go further into this codebase conundrum.

Arcterus · 2019-02-21T13:21:03Z

I've combined the base32 and base64 code. In terms of your first two points, any effect on performance would be negligible, and I assume the compiler will be able to optimize the default values out as they are never used. The third point is reasonable as 76 should not be in both encoding.rs and the base32/base64 main functions. The duplication of line_wrap is not much of an issue either, but I changed it anyway.

PaulCapron · 2019-02-21T15:04:22Z

Oh, great! Thank you from all programmers that will have to touch this code in the future 😃

The main point was that the code was hard to follow. Performance was a minor concern, more like a side-effect (“the key to performance is elegance”).

Though if uutils/coreutils seriously aims at replacing the default coreutils, then efficiency is going to be of prime importance. Who is going to run a “memory-safe” base64/wc/&c. if it takes 30% more time (and/or 30% more memory) than the theorically unsafe, but battle-tested, alternative? Not me, at least. I already suffer way too much from software getting slower more rapidly than hardware gets faster.

I assume the compiler will be able to optimize the default values out as they are never used.

A quick look at the disassembly (cargo build --release -p base32 && objdump -M intel -d target/release/base32) validates your assumption 👍

But assuming that, in general, the compiler is sufficiently smart is hazardous. See e.g. http://proebsting.cs.arizona.edu/law.html https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf

PaulCapron added 2 commits October 24, 2018 15:08

Arcterus merged commit 39b5760 into uutils:master Feb 21, 2019

Arcterus mentioned this pull request Feb 21, 2019

base32, base64, uucore: merge base32/base64 code #1338

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove some useless BufReader wrappers around stdin #1307

Remove some useless BufReader wrappers around stdin #1307

PaulCapron commented Oct 24, 2018 •

edited

Loading

Arcterus commented Feb 7, 2019

PaulCapron commented Feb 21, 2019

Arcterus commented Feb 21, 2019

PaulCapron commented Feb 21, 2019

Remove some useless BufReader wrappers around stdin #1307

Remove some useless BufReader wrappers around stdin #1307

Conversation

PaulCapron commented Oct 24, 2018 • edited Loading

Arcterus commented Feb 7, 2019

PaulCapron commented Feb 21, 2019

Arcterus commented Feb 21, 2019

PaulCapron commented Feb 21, 2019

PaulCapron commented Oct 24, 2018 •

edited

Loading