Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

walk: Flush stdout in batches #1452

Merged
merged 1 commit into from
Dec 18, 2023
Merged

Conversation

tavianator
Copy link
Collaborator

The previous behaviour was designed to mimic the output buffering of
typical UNIX tools: line-buffered if stdout is a TTY, and fully-buffered
otherwise. More precicely, when printing to a terminal, fd would flush
explicitly after printing any buffered results, then flush after every
single result once streaming mode started. When not printing to a
terminal, fd never explicitly flushed, so writes would only happen as
the BufWriter filled up.

The new behaviour actually unifies the TTY and non-TTY cases: we flush
after printing the buffered results, then once we start streaming, we
flush after each batch, but only when the channel is empty. This
provides a good balance: if the channel is empty, the receiver thread
might as well flush before it goes to sleep waiting for more results.
If the channel is non-empty, we might as well process those results
before deciding to flush.

For TTYs, this should improve performance by consolidating write() calls
without sacrificing interactivity. For non-TTYs, we'll be flushing more
often, but only when the receiver would otherwise have nothing to do,
thus improving interactivity without sacrificing performance. This is
particularly handy when fd is piped into another command (such as head
or grep): with the old behaviour, fd could wait for the whole traversal
to finish before printing anything. With the new behaviour, fd will
print those results soon after they are received.

Fixes #1313.

The previous behaviour was designed to mimic the output buffering of
typical UNIX tools: line-buffered if stdout is a TTY, and fully-buffered
otherwise.  More precicely, when printing to a terminal, fd would flush
explicitly after printing any buffered results, then flush after every
single result once streaming mode started.  When not printing to a
terminal, fd never explicitly flushed, so writes would only happen as
the BufWriter filled up.

The new behaviour actually unifies the TTY and non-TTY cases: we flush
after printing the buffered results, then once we start streaming, we
flush after each batch, but *only when the channel is empty*.  This
provides a good balance: if the channel is empty, the receiver thread
might as well flush before it goes to sleep waiting for more results.
If the channel is non-empty, we might as well process those results
before deciding to flush.

For TTYs, this should improve performance by consolidating write() calls
without sacrificing interactivity.  For non-TTYs, we'll be flushing more
often, but only when the receiver would otherwise have nothing to do,
thus improving interactivity without sacrificing performance.  This is
particularly handy when fd is piped into another command (such as head
or grep): with the old behaviour, fd could wait for the whole traversal
to finish before printing anything.  With the new behaviour, fd will
print those results soon after they are received.

Fixes sharkdp#1313.
@tavianator
Copy link
Collaborator Author

Benchmarks:

First of all, here's the purpose of the PR:

tavianator@tachyon $ fd-master title_t00.mkv ~ | time head -n1
.../title_t00.mkv
head -n1  0.00s user 0.00s system 0% cpu 1.435 total
tavianator@tachyon $ fd-flush title_t00.mkv ~ | time head -n1
.../title_t00.mkv
head -n1  0.00s user 0.00s system 1% cpu 0.104 total

--output=null

Command Mean [ms] Min [ms] Max [ms] Relative
fd-master -u --search-path ~/code/bfs/bench/corpus/chromium 296.7 ± 5.2 288.2 305.5 1.00
fd-flush -u --search-path ~/code/bfs/bench/corpus/chromium 296.8 ± 6.8 287.1 307.9 1.00 ± 0.03

--output=pipe

Command Mean [ms] Min [ms] Max [ms] Relative
fd-master -u --search-path ~/code/bfs/bench/corpus/chromium 353.3 ± 19.3 320.7 381.1 1.01 ± 0.07
fd-flush -u --search-path ~/code/bfs/bench/corpus/chromium 351.1 ± 14.4 329.2 368.6 1.00

--output=inherit (aka a TTY)

I sent the output to an inactive tmux pane so it wouldn't depend on my actual terminal emulator or my ssh connection.

Command Mean [ms] Min [ms] Max [ms] Relative
fd-master -u --search-path ~/code/bfs/bench/corpus/rust 609.1 ± 12.6 592.5 635.8 1.04 ± 0.03
fd-flush -u --search-path ~/code/bfs/bench/corpus/rust 588.2 ± 14.9 559.3 604.3 1.00

@sharkdp sharkdp merged commit 16c2d1e into sharkdp:master Dec 18, 2023
15 checks passed
@sharkdp
Copy link
Owner

sharkdp commented Dec 18, 2023

Thank you very much!

@tavianator tavianator deleted the flush-batches branch December 18, 2023 21:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a streaming mode when running under pipe
2 participants