Improve startup time #1408

Closed
sharkdp opened this issue Oct 23, 2023 · 1 comment · Fixed by #1422
Comments

sharkdp (Owner) commented Oct 23, 2023

fd's startup time is quite slow. On my 12-core system, it takes ~20 ms to "search" an empty folder. That is fast enough to go unnoticed by humans, but it looks bad in benchmarks that compare fd with other tools on small folders [1]. It is also a real problem for use cases where fd is called repeatedly from a script.
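
For reference, a measurement like this can be reproduced with something along these lines (empty-dir is just a placeholder for an empty directory):

```sh
mkdir -p empty-dir
hyperfine --warmup 5 --shell=none 'fd . empty-dir'
```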

Some of that overhead is caused by the spawning of threads, a problem already tracked in #1203. But I think there is more that can be done. Instead of using my usual go-to performance tool (perf), let's look at the magic-trace output of an fd call in an empty folder [2]. If anyone is interested, I've attached the trace to this post; go to https://magic-trace.org/ to load it in their viewer.
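
If you want to reproduce the trace, magic-trace can record a whole run roughly like this (this assumes an Intel CPU with Intel PT support, and the flag spelling is from memory; check magic-trace help run for the exact invocation):

```sh
# Record the entire execution (not just a trigger window) of fd in an empty directory:
magic-trace run -full-execution ./target/release/fd -- . empty-dir
```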

The full trace looks like this:

[magic-trace screenshot: full trace]

The first 2.2 ms are typical process startup things (before main). I don't think there is any room for optimization here (?)

[magic-trace screenshot: process startup, ~2.2 ms before main]

The next ~2 ms are more interesting:

[magic-trace screenshot: initialization steps before the scan]

Some notable steps (even where the time is insignificant):

  • parsing the command-line arguments (724 µs)
  • an "is_existing_directory" check on the search path (28 µs)
  • parsing the (empty) search pattern regex (32 µs)
  • the isatty check (5 µs)
  • LsColors::from_env (579 µs)
  • num_cpus::linux::get_num_cpus (441 µs)
  • RegexBuilder::build (171 µs)

Some of this surprised me; I didn't expect the get_num_cpus call to take that long. There might be some room for improvement by doing things in parallel (e.g. LsColors::from_env), but only if the thread overhead is not too high.
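
A minimal sketch of that idea (hypothetical, not fd's code; it assumes the two initialization steps are independent): overlap LsColors::from_env with regex compilation on a helper thread, and only join when both results are needed.

```rust
use lscolors::LsColors;
use regex::RegexBuilder;
use std::thread;

fn init(pattern: &str) -> (Option<LsColors>, regex::Regex) {
    // Parse LS_COLORS on a helper thread while the main thread compiles the regex.
    let ls_colors = thread::spawn(|| LsColors::from_env());

    let re = RegexBuilder::new(pattern)
        .build()
        .expect("invalid search pattern");

    // This only pays off if spawning and joining the helper thread is cheaper
    // than the ~579 µs that LsColors::from_env takes by itself.
    (ls_colors.join().expect("LsColors thread panicked"), re)
}
```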

Then we start the actual scan, which takes the majority of the time:

[magic-trace screenshot: scan phase]

Here, I'm not so sure how to interpret the trace, as things are actually happening on multiple threads. But we can (presumably) see some of the thread spawning/joining time here (~ 5 ms):

[magic-trace screenshot: thread spawning/joining]

and some gitignore matcher logic going on here (370 µs total):

[magic-trace screenshot: gitignore matcher calls]
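
For context, fd's traversal is built on the ignore crate, and the matcher calls in the trace come from its gitignore handling. A minimal parallel walk with that crate looks roughly like this (illustrative configuration, not fd's exact one):

```rust
use ignore::{WalkBuilder, WalkState};

fn main() {
    WalkBuilder::new(".")
        .git_ignore(true) // the gitignore matcher logic seen in the trace
        .threads(12)
        .build_parallel()
        .run(|| {
            Box::new(|entry| {
                if let Ok(entry) = entry {
                    println!("{}", entry.path().display());
                }
                WalkState::Continue
            })
        });
}
```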

Most of the time is actually unaccounted for in the trace, because I can only see:

[magic-trace screenshot: unattributed time]

We can see a bit more when switching off LTO:

[magic-trace screenshot: scan phase with LTO disabled]

Apparently, 11 ms are spent in crossbeam_channel::channel::bounded's from_iter method (probably the receive call?), even though we don't have any work to do. On a -j1 run, this part only takes 1 ms.
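
To check that channel construction itself is the cost, a micro-benchmark along these lines can be used (hypothetical; the capacity and message size below are illustrative, not fd's actual values):

```rust
use crossbeam_channel::{bounded, unbounded};
use std::time::Instant;

fn main() {
    // bounded() pre-allocates one slot per message up front...
    let start = Instant::now();
    let (_tx, _rx) = bounded::<[u8; 512]>(0x4000);
    println!("bounded:   {:?}", start.elapsed());

    // ...while unbounded() grows its buffer lazily as messages arrive.
    let start = Instant::now();
    let (_tx, _rx) = unbounded::<[u8; 512]>();
    println!("unbounded: {:?}", start.elapsed());
}
```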

Footnotes

  [1] Those "small" folders can actually be pretty large. It takes hundreds of thousands of files before we make up for the startup "penalty".

  [2] I only recently discovered magic-trace and have used it successfully to benchmark (and then optimize) the startup time of other programs.

tavianator added a commit to tavianator/fd that referenced this issue Oct 30, 2023
We originally switched to bounded channels for backpressure to fix sharkdp#918.
However, bounded channels have a significant initialization overhead as
they pre-allocate a fixed-size buffer for the messages.

This implementation uses a different backpressure strategy: each thread
gets a limited-size pool of WorkerResults.  When the size limit is hit,
the sender thread has to wait for the receiver thread to handle a result
from that pool and recycle it.

Inspired by [snmalloc], results are recycled by sending the boxed result
over a channel back to the thread that allocated it.  By allocating and
freeing each WorkerResult from the same thread, allocator contention is
reduced dramatically.  And since we now pass results by pointer instead
of by value, message passing overhead is reduced as well.

Fixes sharkdp#1408.

[snmalloc]: https://github.com/microsoft/snmalloc
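
As a minimal sketch of that strategy (hypothetical types and pool size; the real implementation landed in #1422): each worker owns a small budget of boxed results, and the receiver returns every box to the worker that allocated it over a per-worker recycling channel, so exhausting the budget is what blocks the sender.

```rust
use crossbeam_channel::{unbounded, Receiver, Sender};
use std::path::PathBuf;

struct WorkerResult {
    path: PathBuf,
}

const POOL_SIZE: usize = 0x100; // illustrative, not fd's real value

// A worker allocates at most POOL_SIZE boxes; once the budget is spent it
// blocks on its recycling channel, which is the backpressure mechanism.
fn worker(
    results_tx: Sender<(Box<WorkerResult>, Sender<Box<WorkerResult>>)>,
    paths: impl Iterator<Item = PathBuf>,
) {
    let (recycle_tx, recycle_rx) = unbounded::<Box<WorkerResult>>();
    let mut budget = POOL_SIZE;

    for path in paths {
        let result = if budget > 0 {
            budget -= 1;
            Box::new(WorkerResult { path })
        } else {
            // Blocks until the receiver has handled one of our earlier results.
            let mut recycled = recycle_rx.recv().expect("receiver hung up");
            recycled.path = path;
            recycled
        };
        // Pass the result by pointer, along with the way back to this worker.
        results_tx
            .send((result, recycle_tx.clone()))
            .expect("receiver hung up");
    }
}

fn receiver(results_rx: Receiver<(Box<WorkerResult>, Sender<Box<WorkerResult>>)>) {
    for (result, recycle_tx) in results_rx {
        println!("{}", result.path.display());
        // Return the box to the thread that allocated it, keeping allocation
        // and deallocation on the same thread to reduce allocator contention.
        let _ = recycle_tx.send(result);
    }
}
```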

tmccombs (Collaborator) commented Nov 4, 2023

> 11 ms are spent in crossbeam_channel::channel::bounded's from_iter method?

That might be where crossbeam_channel initializes the memory for the channel, in the buffer allocation that happens when the bounded channel is constructed.
