Nimgrep improvements 2 #15612
The compilation failure with nim cpp test on MacOS is related to using terminalWidth:
I cannot reproduce this problem in a macOS Mojave VirtualBox VM. When I run
No, adding -lc did not help :-(
Reproduced with macOS 10.15 Catalina.
I checked the generated code, stdlib_terminal.cpp, and managed to boil the failure down to just 2 lines. It's either a bug in the clang++ compiler or in the headers: the bug appears only when unistd.h is included BEFORE stdio.h (where ctermid is declared).

This compiles:

    #include <stdio.h>
    #include <unistd.h> // <- this line may be commented out, it doesn't matter
    int main() {
      ctermid(NULL);
    }

This does not:

    #include <unistd.h>
    #include <stdio.h>
    int main() {
      ctermid(NULL);
    }

If I compile with -c and then look into the *.o file by
I only skimmed this patch and hope @timotheecour reviews it with more care than I did. Oh, and thanks for the awesome work!
tools/nimgrep.nim
  --limit[:N], -m[:N]   limit max width of lines from files by N characters (80)
  --fit                 calculate --limit from terminal width for every line
  --onlyAscii, -o       use only printable ASCII Latin characters 0x20-0x7E
                        (substitutions: 0 -> @, 1-0x1F -> A-_, 0x7F-0xFF -> !)
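
The substitution rule in this help text can be sketched in a few lines. This is only an illustration of the mapping as the help text states it, written in Python rather than Nim; the function name `only_ascii` is hypothetical and this is not nimgrep's actual implementation.

```python
def only_ascii(line: bytes) -> str:
    """Map every byte to a printable ASCII character (0x20-0x7E).

    Follows the rule quoted in the help text:
      0         -> '@'
      1-0x1F    -> 'A'-'_'   (the byte value plus 0x40)
      0x7F-0xFF -> '!'
    """
    out = []
    for b in line:
        if b < 0x20:
            # Control bytes: 0 becomes '@' (0x40), 1 becomes 'A', 0x1F becomes '_'.
            out.append(chr(b + 0x40))
        elif b < 0x7F:
            # Already printable ASCII, keep unchanged.
            out.append(chr(b))
        else:
            # DEL and all high bytes collapse to '!'.
            out.append("!")
    return "".join(out)


if __name__ == "__main__":
    # An ANSI escape sequence is rendered harmless instead of reaching the terminal.
    print(only_ascii(b"\x1b[31mred\x00"))
```

With this mapping an escape byte (0x1B) is shown as `[`, so a terminal never sees a raw control sequence.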
somewhat serious bug: $ nimgrep x bin/nim
hangs
I cannot reproduce :-/
What's your OS? Could you compile with --gc:markandsweep (I don't quite trust my recent change to --gc:orc)?
repro:
I'm on OSX, using iterm
XDG_CONFIG_HOME= $nim_140_X c tools/nimgrep.nim # (XDG_CONFIG_HOME to isolate from my environment)
tools/nimgrep x $nim_140_X
Adding --gc:markandsweep or --gc:arc makes no difference.
- this hangs:
tools/nimgrep x $nim_140_X
- this does not hang:
tools/nimgrep x $nim_140_X | wc -l
- this does not hang:
tools/nimgrep --onlyAscii $nim_140_X | wc -l
This suggests that the culprit is binary data printed to the terminal and interpreted by it, causing arbitrary behavior, including hanging.
This should not be the default behavior (printing binary data to a terminal can have potentially dangerous or malicious consequences), so I suggest making --onlyAscii:on the default; users who know what they're doing can pass an explicit --onlyAscii:off to disable it.
We can bikeshed the --onlyAscii name separately.
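
Since the hang only shows up when output goes to a terminal and not when it is piped, one possible variation on this suggestion is to sanitize by default only when stdout is a terminal. The sketch below is in Python rather than Nim and is not what was proposed or implemented here; `resolve_only_ascii` and its parameters are hypothetical names used for illustration.

```python
import sys
from typing import Optional


def resolve_only_ascii(explicit: Optional[bool], stdout_is_tty: bool) -> bool:
    """Decide whether output should be restricted to printable ASCII.

    explicit      -- value of a user-supplied on/off switch, or None if absent
    stdout_is_tty -- whether standard output is attached to a terminal
    """
    if explicit is not None:
        # An explicit user choice always wins.
        return explicit
    # Assumed default: protect interactive terminals, leave pipes untouched.
    return stdout_is_tty


if __name__ == "__main__":
    # sys.stdout.isatty() is True in a terminal and False under `| wc -l`.
    print(resolve_only_ascii(None, sys.stdout.isatty()))
```

The trade-off is that piped output stays byte-exact for downstream filters, while the stricter "always on unless disabled" default suggested above is simpler to reason about.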
@timotheecour, thank you for the great review! I think it's finished.
LGTM, thanks for your patience!
ya, leave it as is; hopefully nim-lang/fusion#32 gets merged soon and then you can reuse it, simplifying code and avoiding multiple yield in the process, among other benefits.
OK to leave as TODO; for future reference: #15612 (comment). I'd also like something like BurntSushi/ripgrep#1727, but that can be done in future work.
* nimgrep: speed up by threads and Channels
* nimgrep: add --bin, --text, --count options
* nimgrep: add --sortTime option
* allow Peg in all matches including --includeFile, --excludeFile, --excludeDir
* add --match and --noMatch options
* add --includeDir option
* add --limit (-m) and --onlyAscii (-o) options
* fix performance regression introduced in nimgrep improvements nim-lang#12779
* better error handling
* add option --fit
* fix groups in --replace
* fix flushing, --replace, improve --count
* use "." as the default directory, not full path
* fix --fit for Windows
* force target to C for macosx
* validate non-negative int input for options nim-lang#15318
* switch nimgrep to using --gc:orc
* address review: implement cropping in matches,...
* implement stdin/pipe & revise --help
* address stylistic review & add limitations
I'm using nimgrep a lot in my daily job, and I added a few useful things.
- [...], which allows for using nimgrep in Unix filters.

A note on thread performance: the speed-up with regex is modest, limited to a factor of 2.5 to 3. After reaching -n:3 or -n:4 threads, performance stops increasing. I debugged that with gdb: with a large number of threads (e.g. -n:20) the program seems to spend almost all its time in read syscalls, that is, reading the files into process memory. I don't know whether it has become memory-bound or Linux is unable to deliver file contents because of some internal locks. When using a (slow) Peg pattern, performance increases nearly linearly: I got a speed-up of 5.5 to 5.8 on my 6-core Ryzen 3600 with -n:6 and -n:12.
With --gc:orc the speed is about the same as with --gc:markandsweep. But with orc nimgrep consumes about half as much memory in multi-threaded mode: 170 MB with -n:6 on the Nim repository, versus 370 MB with --gc:markandsweep. (Measured with the Linux utility /usr/bin/time --verbose.)
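
To put these figures in perspective, here is a small arithmetic sketch in Python. It assumes parallel efficiency is defined as speed-up divided by the number of workers, and that the Peg speed-ups are measured against the 6 physical cores; pairing the regex upper bound of 3 with -n:20 is likewise an assumption made only for illustration.

```python
def parallel_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear scaling achieved: speedup / workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return speedup / workers


# Figures quoted in the description above.
peg_low = parallel_efficiency(5.5, 6)       # Peg pattern, lower bound, 6 cores
peg_high = parallel_efficiency(5.8, 6)      # Peg pattern, upper bound, 6 cores
regex_at_20 = parallel_efficiency(3.0, 20)  # regex, best case, 20 threads (assumed pairing)
memory_ratio = 370 / 170                    # markandsweep MB / orc MB at -n:6

if __name__ == "__main__":
    print(f"Peg efficiency on 6 cores: {peg_low:.2f} to {peg_high:.2f}")
    print(f"Regex efficiency at -n:20: {regex_at_20:.2f}")
    print(f"Memory ratio markandsweep/orc: {memory_ratio:.2f}")
```

Seen this way, the Peg case scales almost linearly per core, while additional regex threads beyond 3 or 4 add almost nothing, which is consistent with the observation that the time goes into read syscalls rather than matching.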