Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some useless BufReader wrappers around stdin #1307

Merged
merged 2 commits into from
Feb 21, 2019
Merged

Remove some useless BufReader wrappers around stdin #1307

merged 2 commits into from
Feb 21, 2019

Conversation

PaulCapron
Copy link
Contributor

@PaulCapron PaulCapron commented Oct 24, 2018

This fixes, in part only, #1103.

I’m willing to remove the other useless BufReader wrappers, but I am too much of a Rust newbie to know how to do that properly. I need some help.

In, for instance, base32.rs, it’d be easy to do:

     let input = if matches.free.is_empty() || &matches.free[0][..] == "-" {
-        BufReader::new(Box::new(stdin()) as Box<Read>)
+        Box::new(stdin()) as Box<Read>
     } else {
         let path = Path::new(matches.free[0].as_str());
         let file_buf = safe_unwrap!(File::open(&path));
-        BufReader::new(Box::new(file_buf) as Box<Read>)
+        Box::new(BufReader::new(file_buf))
     };

This compiles. But then some repetitive hence excessive mutex locking/unlocking is going to take place.

base32.rs delegates to decode()/encode() in encoding.rs. There, read_to_end() is called.
If I understand https://doc.rust-lang.org/src/std/io/mod.rs.html#365-394 correctly, read_to_end()
calls read(), potentially several times. And each time, the mutex has to be taken then released.
See https://www.reddit.com/r/rust/comments/3rj54u/how_can_i_read_char_by_char_from_stdin/cwpojn1/

So it’d be better to acquire the mutex lock explicitely beforehand, doing something like:

+    let std_input; // declare this here to make the compiler happy
     let input = if matches.free.is_empty() || &matches.free[0][..] == "-" {
-        BufReader::new(Box::new(stdin()) as Box<Read>)
+        std_input = stdin();
+        Box::new(std_input.lock()) as Box<Read>
     } else {
         let path = Path::new(matches.free[0].as_str());
         let file_buf = safe_unwrap!(File::open(&path));
-        BufReader::new(Box::new(file_buf) as Box<Read>)
+        Box::new(BufReader::new(file_buf))
     };

This compiles, but this is ugly (the stupid std_input upfront declaration).

Ugly is one thing, losing the fight with the borrow checker is another.
Take cat.rs. It does not explicitely lock either (but it should). But then if you do:

     if path == "-" {
         let stdin = stdin();
         return Ok(InputHandle {
-            reader: Box::new(stdin) as Box<Read>,
+            reader: Box::new(stdin.lock()) as Box<Read>,
             is_interactive: is_stdin_interactive(),
         });
     }

The compiler is not happy, and I don’t know to please it.

Maybe it’d be better just to wait for non-lexical lifetimes to be implemented?
See rust-lang/rust#33520 rust-lang/rfcs#811 rust-lang/rust#43234

Please let me know what you think.

stdin() is already buffered.

stdin().read_line() calls stdin().lock() behind the hood (see
https://doc.rust-lang.org/src/std/io/stdio.rs.html#274)
Here we are reading user input, prompting them to confirm their action:
it seems useless to handle mutex locking/unlocking explicitely and
beforehand to avoid its overhead.

This commit is related to issue #1103.
stdin() is already buffered.

Locking upfront explicitely avoid the overhead of mutex locking/unlocking
everytime a new line is read. See https://stackoverflow.com/a/17546731
and https://www.reddit.com/r/rust/comments/3rj54u/how_can_i_read_char_by_char_from_stdin/cwpojn1/

The code cannot be simplified to “for line in stdin().lock().lines()”
until non-lexical lifetimes are implemented. The compiler complains at
the moment about a temporary value not living long enough/being dropped
while still borrowed. See rust-lang/rust#33520

This commit is related to issue #1103.
@Arcterus
Copy link
Collaborator

Arcterus commented Feb 7, 2019

One way to fix the base32 issue would be to make a generic function that takes something implementing Read and then just moving the rest of the stuff in uumain() into there. Something like:

fn uumain(/* args */) -> /* return */ {
    // ... snip ...

    if matches.free.is_empty() || &matches.free[0][..] == "-" {
        let stdin_raw = stdin();
        do_stuff(stdin_raw.lock());
    } else {
        let path = Path::new(matches.free[0].as_str());
        let file_buf = safe_unwrap!(File::open(&path));
        do_stuff(BufReader::new(file_buf));
    }

    0
}

fn do_stuff<R: Read>(input: R) {
    // blah
}

Another way is what you suggested, which is also fine but requires allocation (the current code does too).

IIRC the problem with cat is that the open() stuff needs to be restructured to avoid the Box (if we can avoid the Box we can also fairly easily get it to work with locked stdin I believe). You could maybe store the result of stdin() in the InputHandle struct like Option<Box<Stdin>> or something and then store the locked version of stdin in the struct as well, but that needs two allocations and is kind of hacky.

By the way, the issue you referenced will get closed if I merged this as-is, because you wrote "fixes" right next to the issue number, so it'd probably be better if you change the wording.

@PaulCapron
Copy link
Contributor Author

I have edited the original PR message so that you can merge it as-is, without closing #1103.
I won’t go further to resolve #1103.

You are right, in base32.rs, one can do:

    let is_decode = matches.opt_present("decode");
    let ignore_garbage = matches.opt_present("ignore-garbage");

    []

     if matches.free.is_empty() || &matches.free[0][..] == "-" {
        let stdin = stdin();
        do_it(stdin.lock(), is_decode, ignore_garbage, line_wrap);
    } else {
        let path = Path::new(matches.free[0].as_str());
        let file_buf = safe_unwrap!(File::open(&path));
        do_it(BufReader::new(file_buf), is_decode, ignore_garbage, line_wrap);
    };

    []

fn do_it<R: Read>(input: R, is_decode: bool, ignore_garbage: bool, line_wrap: usize) {
    let mut data = Data::new(input, Format::Base32);

    if is_decode {
        match data.ignore_garbage(ignore_garbage).decode() {
            Ok(s) => print!("{}", String::from_utf8(s).unwrap()),
            Err(_) => crash!(1, "invalid input"),
        }
    } else {
        wrap_print(line_wrap, data.encode());
    }
}

But then the same thing would have to be repeated in base64.rs. Or refactoring would have to take place.

The code of uucore/encoding.rs is only used by base{32,64}.rs, so refactoring sounds feasible. But investigating a little, several problems surface:

  • ignore_garbage is only used when decoding. On the other hand, line_wrap is only used when encoding. But encoding.rs forces them both to be filled, in its Data structure, no matter which “conversion” is going to take place.

  • Data::new() initializes such member to default values, only to have base{32,64}.rs overwrites them later (when they call Data::ignore_garbage() & Data::line_wrap()).

  • there is duplication of the 76 default value for the line wrap. There is also duplication of line_wrap in wrap_print().

  • etc.

uucore/encoding.rs goal, I guess, is to avoid DRY. Only it fails at that goal, and by the way makes execution inefficient and the codeflow hard-to-understand. At this point, I’m fed up. I am not willing to go further into this codebase conundrum.

@Arcterus Arcterus merged commit 39b5760 into uutils:master Feb 21, 2019
@Arcterus
Copy link
Collaborator

I've combined the base32 and base64 code. In terms of your first two points, any effect on performance would be negligible, and I assume the compiler will be able to optimize the default values out as they are never used. The third point is reasonable as 76 should not be in both encoding.rs and the base32/base64 main functions. The duplication of line_wrap is not much of an issue either, but I changed it anyway.

@PaulCapron
Copy link
Contributor Author

Oh, great! Thank you from all programmers that will have to touch this code in the future 😃

The main point was that the code was hard to follow. Performance was a minor concern, more like a side-effect (“the key to performance is elegance”).

Though if uutils/coreutils seriously aims at replacing the default coreutils, then efficiency is going to be of prime importance. Who is going to run a “memory-safe” base64/wc/&c. if it takes 30% more time (and/or 30% more memory) than the theorically unsafe, but battle-tested, alternative? Not me, at least. I already suffer way too much from software getting slower more rapidly than hardware gets faster.

I assume the compiler will be able to optimize the default values out as they are never used.

A quick look at the disassembly (cargo build --release -p base32 && objdump -M intel -d target/release/base32) validates your assumption 👍

But assuming that, in general, the compiler is sufficiently smart is hazardous. See e.g. http://proebsting.cs.arizona.edu/law.html https://cr.yp.to/talks/2015.04.16/slides-djb-20150416-a4.pdf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants