-J 1 gives the same result as -J 16 #4
Performance counter stats for 'ugrep -J 16 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.1.log.gz ./2019-01-1/2019-01-1.conn.2.log.gz':
|
It does spawn more threads, but it never uses above 100% accumulated over all the threads, so it seems artificially capped? So utilization of the available resources is very low. Now testing on the same unzipped files, and the case is the same. Testing with -J 12 it's clear that it spreads the load across the first 7 threads, totaling 100%, with the last 5 being absolutely idle. Tested with 1000 unzipped test files of the same type.
perf stat ugrep -J 12 -F 10.0.0.1 ./rbergh/dir/* | wc -l
258711.305195 task-clock (msec) # 1.004 CPUs utilized
For comparison:
find ./rbergh/dir/* -name '*' -print0 | perf stat xargs -P12 -0 -I {} sh -c 'cat {} | grep -F 10.0.0.1' | wc -l
500500000
Performance counter stats for 'xargs -P12 -0 -I {} sh -c cat {} | grep -F 10.0.0.1':
1,263,417,643,081 cycles # 2.268 GHz (50.81%)
|
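The find/xargs baseline above got mangled in copy-paste; here is a runnable sketch of the same approach using only standard tools, with a tiny generated corpus standing in for the real log files (all paths below are placeholders):

```shell
# Generate a small placeholder corpus (stands in for ./rbergh/dir/*)
mkdir -p /tmp/grepbench
for i in 1 2 3 4; do
  printf '10.0.0.1 line a\n10.0.0.1 line b\n10.0.0.1 line c\n' > /tmp/grepbench/f$i.log
done

# One grep process per file, up to 4 running in parallel;
# -h suppresses filename prefixes so wc -l counts matching lines only
find /tmp/grepbench -name '*.log' -print0 |
  xargs -0 -P4 -n1 grep -hF 10.0.0.1 | wc -l   # 4 files x 3 hits = 12
```

With real data, wrapping the xargs stage in perf stat (as above) shows how well the parallel greps spread across cores.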
Firstly, the timings with -J 1 and -J 16 are expected to be close for this test. Secondly, options such as -J mainly help when many files are searched. In fact, your example searches just two files with a lot of hits (all lines are a hit!). Now, with the updated ugrep 1.5.3 I get the expected linear speedups for -J.
With the older 1.5.1, when you used ugrep to output all matches in (almost) all files as in your example, the results were more or less sequentialized due to threads taking turns to output results. The updated ugrep 1.5.3 output buffering strategy squeezes more performance out of parallel searches. Comparing this to ripgrep there isn't much of a difference:
As I've stated in the README, the algorithm makes the difference. Note: edited to include the improvements in ugrep 1.5.3 |
The updated ugrep 1.5.4 is much faster due to fixing unbuffered reads from pipes and other improvements:
With one thread (-J 1) and with more threads, the ugrep timings are now in line with the expected speedups. Note: the ugrep options behave the same as before; only the internals changed.
Because all lines of the input files match, your suggested tests are a significant outlier compared to the usual queries that return sparser results. |
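The dense-vs-sparse distinction is easy to reproduce with generated files (paths and sizes here are arbitrary):

```shell
# Dense: every line is a hit, so output work dominates the run time
awk 'BEGIN { for (i = 0; i < 1000; i++) print "10.0.0.1 conn" }' > /tmp/dense.log
# Sparse: a single hit among a thousand lines
awk 'BEGIN { for (i = 0; i < 999; i++) print "other traffic"; print "10.0.0.1 conn" }' > /tmp/sparse.log

grep -cF 10.0.0.1 /tmp/dense.log    # 1000
grep -cF 10.0.0.1 /tmp/sparse.log   # 1
```

Timing a search on each pair shows how much the cost of producing output skews a benchmark where every line matches.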
I will close this issue since ugrep option -J now scales as expected. I have further updates in the pipeline (available soon) that further improve the speed of ugrep, including searching with the default options (default is line-by-line output), making it about as fast as option -o. |
@genivia-inc What's the status on making line-by-line matching as fast as multiline search with -o? Because -o is insanely fast, but I need the entire matching line. |
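For readers following the thread: the difference between -o output and the default whole-line output is the same as in any grep, so plain grep illustrates it (the file path is just an example):

```shell
printf 'src=10.0.0.1 dst=10.0.0.2 ok\n' > /tmp/conn.log

# Default output: the entire matching line
grep -F 10.0.0.1 /tmp/conn.log     # src=10.0.0.1 dst=10.0.0.2 ok

# With -o: only the matched substring, one per line
grep -oF 10.0.0.1 /tmp/conn.log    # 10.0.0.1
```

The speedup discussed here comes from -o not having to track and emit line boundaries, only the matches themselves.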
Will have this soon. My thoughts on this: I already updated RE/flex to version 1.5.0. That version returns the line of a match with a new method. |
Awesome, looking forward to that. Not really an issue for my main case with the terminating \0, as mmapping files isn't that important for me with a larger number of smaller decompressed files. But it's something to be aware of for sure. Very interesting, and I agree on all points. I have to say, -o is insanely effective, an easy 4x on performance on my system with smaller sets, and somehow it manages a higher consistent throughput on the IO, at least it looks that way. Testing on a larger dataset now, I see lower CPU utilization, a bit weird since it's about half the usage, but I think that could be because ugrep is able to use the data so efficiently, and as it's IO bound it's currently limited to my IO bandwidth. It's only using 1000% as opposed to 2500% without -o. |
Now it spiked up to the expected area; I think it just has something to do with the files in a certain range of my data, not an error or anything. |
Thanks for the feedback, that's good news. There is a trick that may get you even better performance. |
Here are some initial results of the line matching improvements on my machine with a prototype implementation, just to get an idea what the upcoming version 1.5.8 can achieve or better, both with and without option -o.
As you can see, this is a clear improvement, but I am not yet happy with the results for the second run; the performance without option -o should be better. The way the matching works in 1.5.7 and later differs from previous versions (a RE/flex 1.5.0 update). Instead of shifting the input buffer up to the match when more input is read, the buffer shifts up to the start of the line. It would be advantageous to have a look-ahead mechanism that would tell us if the next match is on the same line. This would remove the need to keep the rest of the line in the buffer, which is the cost I want to reduce. |
Very, very good, and an easy 70% performance increase here! Which makes it over 10x as fast as find, xargs and zgrep. Well done, sir! Is it possible to output the filename at the start of each line as well, or would that significantly slow it down because of introducing some of the output checks etc. that were removed to make it faster? Or would it slow it down because of the need to store the current file in the buffer? It should not really be an issue, as I suppose it keeps track of which file it's at, but it's more overhead the way I see it, because it cannot simply push the current line to the output, it has to fetch the filename from the buffer as well. But how much would it matter? Not sure. Not really a very big issue, but it would make going through logs easier, knowing which logfile the hit is from. |
I assume you're referring to the fast --format output. The --format option can include the pathname, for example --format='%f:%o%~' outputs the pathname, the match, and a newline. |
Thanks a ton, that's perfect, I wasn't aware of how powerful that option is! |
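The filename-prefixed shape that --format='%f:%o%~' produces in ugrep (assuming I read the format fields right: %f is the pathname, %o the match, %~ a newline) corresponds to grep's -H combined with -o, which is easy to try anywhere:

```shell
printf '10.0.0.1 first\n' > /tmp/a.log
printf 'nothing here\n10.0.0.1 second\n' > /tmp/b.log

# -H forces the pathname prefix, -o prints only the match
grep -HoF 10.0.0.1 /tmp/a.log /tmp/b.log
# /tmp/a.log:10.0.0.1
# /tmp/b.log:10.0.0.1
```

This keeps the "which logfile did the hit come from" information without printing the entire matching line.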
I've updated support for compression with task parallelism. I also added support for other compression formats (if the libbzip2 and liblzma libs are installed). This 1.6.0 update speeds up single file search (and searching a few files) by searching and decompressing a file concurrently with option -z. For example, the processor utilization is 120% when searching a single medium-large compressed file (uncompressed 34MB):
Actually, the search engine runs faster than the decompression thread. If we have more search hits the processor utilization goes up but the wall clock execution time stays the same:
This means that the search time spent is completely hidden as decompression dominates the parallel execution time. We see this even more clearly with the following experiment that uses bzip2 instead of gzip (libz):
The bzip2 format is expensive to decompress. The xz (lzma) format is not too bad compared to bzip2 but 2x to 3x slower than gzip decompression for this experiment:
Because option -z hides the search time behind decompression, the decompression speed determines the total time.
I did not expect this to be this fast. It is a nice bonus when using option -z. When many files are concurrently searched with option -z, each searcher gets its own decompression thread. Concurrent decompression and (tar) search is enabled with option -z. |
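The concurrency described above has a simple shell-pipeline analogue: in a pipeline, the decompressor and the search already run as two separate processes, which is roughly what ugrep -z now does internally with a decompression thread (sketch with gzip; bzip2 and xz behave the same way if installed):

```shell
printf '10.0.0.1 one\n10.0.0.1 two\nother\n' > /tmp/conn2.log
gzip -c /tmp/conn2.log > /tmp/conn2.log.gz

# gzip -dc and grep run concurrently, connected by a pipe
gzip -dc /tmp/conn2.log.gz | grep -cF 10.0.0.1   # 2
```

As in the measurements above, the slower stage (here usually decompression) dominates the wall clock time while the other stage's work is hidden behind it.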
Impressive work! And thanks for an amazing writeup, again! This is already my go-to search tool, and it just keeps on getting better. Well done, sir. |
Good news! A quick update on ugrep's new performance enhancements such as AVX/SSE2 (v1.7.8), making ugrep even faster (on my machines) for your benchmark.
Matching 500 million lines in 1000 compressed files with -z and -o (down from 31.5s to 22.5s):
$ time ugrep -rzoF 10.0.0.1 rbergh/2019-01-1 | wc -l
22.55 real 94.42 user 13.05 sys
500000000
Matching 500 million lines in 1000 compressed files with -z and --format='%o%~' (down from 25.4s to 10.5s):
$ time ugrep -rzF --format='%o%~' 10.0.0.1 rbergh/2019-01-1 | wc -l
10.49 real 57.14 user 8.23 sys
500000000
Counting 500 million lines in 1000 compressed files with -z and -c:
$ time ugrep -rzcF 10.0.0.1 rbergh/2019-01-1 | wc -l
6.51 real 44.73 user 6.85 sys
1000
CPU utilization shows almost perfect speedup (8 cores):
43.656u 6.927s 0:06.94 728.6% 0+0k 0+0io 0pf+0w
|
Awesome! That's well done, I will test this as well ASAP!
|
Ubuntu 16.04
Installed as per your latest guide.
time rg -j 1 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000
real 0m0.523s
user 0m0.536s
sys 0m0.428s
time rg -j 16 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000
real 0m0.364s
user 0m0.480s
sys 0m0.520s
time ugrep -J 16 -r -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000
real 0m2.267s
user 0m2.112s
sys 0m0.492s
time ugrep -J 1 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000
real 0m2.308s
user 0m2.128s
sys 0m0.560s