support searching compressed files using in-process decompression #225
I'm torn on this one for both pragmatic and philosophical reasons. Philosophically, this is making ripgrep do a lot more than just "open a file and search it." I don't actually believe that ripgrep should do everything, even though it already does a lot. Nevertheless, it's hard to argue with the usefulness of this feature. Practically speaking, there are two primary issues as I see it:
I think that if we were to do this, it is at least blocked on some initial implementation of #1, since supporting additional text encodings has a lot of overlap with decompressing files before searching in terms of making the core search routines work with it. With that said, I don't personally see myself working on this any time soon. |
Brainstorming here... Instead of Anyway, this could also be used to search Word, binary, XML, or other complicated formats; if for example you could turn the complicated format into plain-text format that |
As the maintainer, I'm not particularly interested in the plugin path, sorry. |
Ok, I see. Also, I don't think that Now, none of |
That might be true on single files (it really depends on what proportion of time is spent in decompression), but ripgrep can certainly get wins with parallelism.
Can we not use this as motivation for adding new features please? I ask this because it will always be true. For example, if you need a POSIX compliant search tool, then neither ripgrep nor ag will suffice. If you need backreferences or lookaround, then ripgrep won't work but With that said, it's certainly reasonable to expect some convergence of features. For example, I specifically built ripgrep so that folks could use it for both the "search a large repo of code" and "search very large files" use cases. I think my initial comment on this issue still stands: this feature is a possibility, but has a few philosophical and practical problems with it. |
I would expect at least the basic unix stream compressors to be supported (.gz/.bz2/.xz). All editors decompress (and often recompress) those on the fly. Debian ships share/doc/* files in compressed form when doing so results in space savings. I couldn't do a quick search there, and searching through docs is something I do often. It makes perfect sense to have documentation compressed. Source is sometimes compressed too. Emacs compresses .el files by default on install as well. Ironically, using the emacs ripgrep package I cannot search installed Emacs Lisp files ;) I expect a performance hit when searching through compressed files, so I don't think that's a problem. My main concern is that I could miss a match because one of the files has been compressed. For text data files, this happens frequently, especially in repositories that need to float over the network. |
@wavexx I don't think there's any question that this is a desirable feature. Thank you for sharing your use cases though, they're helpful. |
I'm troubleshooting a cups server issue (for 3 days now), and I wish I could use a feature like this. Some files within the cups installation are gzipped while others are not, and they're scattered all over the place. So far I've been doing Regarding the issues Andrew brought up earlier: 1) I'd rather not have ripgrep linked to C libs 👎 and 2) for the UX design, I'd just offer a I'm a nobody, but my vote is to freeze this issue until pure-Rust compression libs are available and then re-evaluate this idea at that point. |
On macOS, zgrep is the 2.5.1 BSD version. BSD grep is noticeably slower than GNU grep. I haven't found a way to easily install the GNU version, so it would be nice to have a fast and convenient way to grep directories of compressed log files. |
@rik I think you can install the GNU tools through |
Sorry, I should have mentioned that I've looked into Homebrew/homebrew-dupes. Yes you can do that but that only installs |
I use ag and sift for my everyday search, not rg, simply because it doesn't support gzip. This UX issue matters a lot when I do a lot of log searching as a sysadmin. Generally, log files are larger than code files. I now have 3.6GB of gzipped log files to search frequently, which is where the raw speed of a search tool really shines. As for my code, it doesn't really matter whether I use ag, sift or rg, because each of them gives me results fast enough. For a search utility focused on speed, I think log file search should be a target use case, where the effort you devote really helps. Thanks! |
To follow up on my previous comment, and to make a recommendation, it would be nice if ripgrep would just use readily available stream compressors directly as a co-process instead of embedding a decompression library. Although for small files the fork might incur some penalty, it's unlikely ripgrep will ever be faster than pbzip2, and you gain instant access to all common unix formats at once (you only need an extension/compressor map). You could always specialize z/gz later on to gain an advantage for small scattered files. |
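To make the co-process suggestion above concrete, here is a rough sketch of what an extension/compressor map could look like in Rust. It is only an illustration under stated assumptions: the table contents, `decompressor_for` and `spawn_decompressor` are hypothetical names invented for this sketch, not ripgrep's actual code, and it assumes `gzip`, `bzip2` and `xz` are available on `PATH`.

```rust
use std::ffi::OsStr;
use std::io;
use std::path::Path;
use std::process::{Child, Command, Stdio};

/// Hypothetical extension-to-command table (extension, program, arguments).
/// The entries are assumptions about what is installed on the system.
const DECOMPRESSORS: &[(&str, &str, &[&str])] = &[
    ("gz", "gzip", &["-d", "-c"]),
    ("bz2", "bzip2", &["-d", "-c"]),
    ("xz", "xz", &["-d", "-c"]),
];

/// Look up the decompression command for a path by its file extension.
/// Returns `None` when the extension is missing or not in the table.
fn decompressor_for(path: &Path) -> Option<(&'static str, &'static [&'static str])> {
    let ext = path.extension().and_then(OsStr::to_str)?;
    DECOMPRESSORS
        .iter()
        .find(|(e, _, _)| e.eq_ignore_ascii_case(ext))
        .map(|&(_, prog, args)| (prog, args))
}

/// Spawn the matching decompressor as a child process with a piped stdout.
/// Returns `Ok(None)` for files that should simply be read directly.
fn spawn_decompressor(path: &Path) -> io::Result<Option<Child>> {
    match decompressor_for(path) {
        None => Ok(None),
        Some((prog, args)) => Command::new(prog)
            .args(args)
            .arg(path)
            .stdout(Stdio::piped())
            .spawn()
            .map(Some),
    }
}
```

A caller would read the child's `stdout` in place of the file and search that stream, while still reporting the original path in the output. Swapping in a parallel decompressor such as pbzip2 would then be a one-line change to the table, which is the main appeal of this approach.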
@wavexx Please don't. Not everyone is on Linux etc. |
On Tue, Jul 04 2017, Laurentiu Nicola wrote:
@wavexx Please don't. Not everyone is on Linux etc.
My suggestion does not preclude an embedded library. But it does have a
massive advantage on unix systems. There are several variants of popular
stream compressors with varying tradeoffs of memory/speed.
|
@wavexx Thanks for the suggestion. I'm somewhat attracted to it. Since this ticket seems to be tracking general decompression support, I've created a more focused implementation specific ticket: #539. I'm not sure when I personally will be able to work on it, but I would be happy to mentor it. I think anyone with some Rust experience could probably do it! |
@BurntSushi This ticket and #539 have been stagnant for a while. Earlier in this thread you mentioned:
Yet, given how long this feature has been stalled, should you not just bite the bullet and pull in the libraries? This feature is a deal-breaker for myself, others that I know, and some in this very thread. It doesn't seem that keeping ripgrep "pure Rust" is worth the trouble at this point, esp. considering that this could have been implemented already. Just my 2¢. |
You aren't considering the maintenance burden of bringing in C code. I'm not inclined to change course at this time. |
@BurntSushi But it would be a small amount, no? Does Rust not support an FFI to C to ease this sort of thing? Also, as an aside, I believe that the "burden" of C is very much overstated these days. It's not as bad as most people make it sound... |
@BurntSushi Maybe soon: rust-lang/flate2-rs#67
You're probably right, but some people oppose it on principle. Also, on Windows, it's still somewhat awkward at times (but those are bugs of course, and could/should be fixed). |
You're probably right, so why don't you just submit a pull request that adds the necessary libraries and glue code? |
I did not say it would be hard to implement. I said it would increase
*maintenance* costs. Right now, the build process is simple because
everything is in Rust.
I'm not opposed to this on principle. I'm on mobile so I don't have a link,
but someone has already tried to add support for this in a PR. There's some
discussion on that PR that I would encourage you to read.
…On Sep 24, 2017 3:12 AM, "Jordan Danford" ***@***.***> wrote:
@ylluminarious <https://github.com/ylluminarious>
But it would be a small amount, no?
You're probably right, so why don't you just submit a pull request that
adds the necessary libraries and glue code?
|
Let us please try to keep things friendly here. I would rather someone not put in the work to add this unless it has a good chance of getting merged (unless they are explicitly okay with doing it regardless of the result). |
As @BurntSushi mentioned, I do not want to put in time for such a feature unless it's likely to get accepted and merged. @lnicola Thanks a lot for sharing rust-lang/flate2-rs#67 -- intriguing stuff with obvious usefulness here. Also, yes, it is possible that things could be awkward on Windoze, but as you mention, hopefully that can be worked around / fixed. @BurntSushi Thanks for adding that extra information on #305 and for your thoughts on #539. I had presumed that adding support via C would be easier at this point than via Rust, but that seems incorrect now. |
Sorry for my snarky comment! It wasn't constructive, and it seems like everyone's on the same page now anyway. |
@jdanford No offense taken -- it was a valid question. |
You can set environment variable Notice |
For anyone watching this issue, the next release of ripgrep will have support for searching compressed files using the I am going to keep this issue open to track in-process decompression, since I suspect we will ultimately want to move to that. However, I don't expect that to happen any time soon. |
Does it work on windows too when 7z is installed? |
@BurntSushi Thanks for the news! |
@Boscop I don't know. You need the |
We don't need all of those programs for any compressed file, do we? I assume you mean that we need a respective decompression program for a given file type. |
@ylluminarious Yes. |
7z.exe supports all of those. |
As the maintainer, making this configurable could give you a lot of headaches, but I can't resist proposing: Line 37 in 597bf04
Line 53 in 597bf04
so you can configure new file types to decompress:
|
On second thought, I'm just going to close this issue, since there isn't much value in tracking it at this point. It will likely be a long time before in-process decompression happens. |
@BurntSushi You should probably update your Anti-Pitch for |
I'd like to use ripgrep for grepping log files, because it's faster than grep. But my logs are gzipped, and if I
zcat | rg
them I'll lose the log filenames in the output. Also, it would be great if bzip2 and xz decompressors were supported too, with automatic archive type detection.
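For what it's worth, automatic archive type detection does not need to rely on file extensions, because gzip, bzip2 and xz streams each begin with a fixed signature. Below is a minimal sketch of that idea in Rust; `Format`, `detect_format` and `detect_file` are hypothetical names chosen for this illustration and are not taken from ripgrep.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Compression formats recognised by this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Gzip,
    Bzip2,
    Xz,
}

/// Identify a stream format from its leading bytes.
/// Returns `None` for anything that does not start with a known signature.
fn detect_format(header: &[u8]) -> Option<Format> {
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, b'7', b'z', b'X', b'Z', 0x00];

    if header.starts_with(GZIP) {
        Some(Format::Gzip)
    } else if header.starts_with(BZIP2) {
        Some(Format::Bzip2)
    } else if header.starts_with(XZ) {
        Some(Format::Xz)
    } else {
        None
    }
}

/// Read at most six bytes from the start of a file and classify them.
fn detect_file(path: &Path) -> io::Result<Option<Format>> {
    let mut header = Vec::with_capacity(6);
    File::open(path)?.take(6).read_to_end(&mut header)?;
    Ok(detect_format(&header))
}
```

Because the check only looks at the first few bytes, a plain-text file that happens to start with `BZh` would be misclassified, so a real implementation would probably want to fall back to a direct search when decompression fails.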