Make --glob respect ignore files with an option (feature request) #1808
Comments
If you can see yourself accepting a pull request for this, I'd probably enjoy implementing it.
I've given this some thought. First, some details to clarify. Using the
Notice here that

When I encoded this logic, my thinking was that if you specified a glob at the CLI, then that should be given the highest priority. But even this appears inconsistent. Namely, I suspect when I encoded this logic I was also thinking about blacklist globs. Namely, if we moved all CLI globs down to the lowest priority, then any whitelist glob in a gitignore file is going to override that and prevent the CLI blacklist glob from working as one might expect. For example, let's say I add a
That is, ignore all files ending in
If we make all CLI globs lower precedence, then in this example, the whitelist glob in the

So it does seem like your insight here is spot on: whitelist globs should get lower precedence, but blacklist globs should get higher precedence. I think I agree with this request. And in particular, I'd like to see this fixed rather than worked around with another flag like

It would be useful to think about other cases that might be changed by this behavior. Implementing this could also be tricky. I don't think the code is really set up to give different precedence to whitelist and blacklist globs. But maybe it's as simple as partitioning them... But if you do that, you now lose the fact that whitelist and blacklist globs can interleave with each other. Namely,

So I'm not quite sure what to do here. Sorry.
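The interleaving concern raised above can be made concrete with a small model. This is only a sketch under stated assumptions: `Rule`, `Verdict`, `simple_match`, `last_match_wins`, `partitioned` and `example_rules` are hypothetical names invented for illustration, the toy matcher supports at most one `*`, and none of this is ripgrep's actual code.

```rust
/// Outcome of checking one file name against a list of glob rules.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    None,
    Ignore,
    Whitelist,
}

/// Hypothetical rule: a toy glob plus whether it whitelists or blacklists.
struct Rule {
    pattern: &'static str,
    whitelist: bool,
}

impl Rule {
    fn verdict(&self) -> Verdict {
        if self.whitelist { Verdict::Whitelist } else { Verdict::Ignore }
    }
}

/// Toy matcher (assumption): at most one `*`, which matches any substring.
fn simple_match(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        Some((prefix, suffix)) => {
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
        None => pattern == name,
    }
}

/// Interleaved semantics: the last rule that matches decides.
fn last_match_wins(rules: &[Rule], name: &str) -> Verdict {
    rules
        .iter()
        .rev()
        .find(|r| simple_match(r.pattern, name))
        .map_or(Verdict::None, |r| r.verdict())
}

/// Partitioned semantics: any matching blacklist rule beats every whitelist rule.
fn partitioned(rules: &[Rule], name: &str) -> Verdict {
    let hit = |want: bool| {
        rules
            .iter()
            .any(|r| r.whitelist == want && simple_match(r.pattern, name))
    };
    if hit(false) {
        Verdict::Ignore
    } else if hit(true) {
        Verdict::Whitelist
    } else {
        Verdict::None
    }
}

/// Whitelist, then blacklist, then a narrower whitelist: order carries meaning here.
fn example_rules() -> Vec<Rule> {
    vec![
        Rule { pattern: "*.rs", whitelist: true },
        Rule { pattern: "test_*", whitelist: false },
        Rule { pattern: "test_main.rs", whitelist: true },
    ]
}
```

With `example_rules()`, `last_match_wins` whitelists `test_main.rs` because the final rule re-admits it, while `partitioned` ignores it because the blacklist partition is consulted first. That difference is the information lost by splitting the two kinds of globs.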
Sorry it took me a while to respond, and thank you very much for such a nice reply!
Yes, you are right, that's an excellent example. You forgot to mention that
This is a bit weird indeed. I think you mentioned in #1654 (or elsewhere?) that it's a performance optimization. It is a good advertisement for my proposal, since this particular weirdness would be gone with the weak glob behavior. I just hope I haven't missed some new inconsistency that may appear because of weaker globs.
Yes, that's exactly what I meant to avoid.
😊 🐶 😺
The purist in me is overjoyed to hear it, though on my own I'd be hesitant to go that far immediately. Do you think there doesn't even need to be a transition period?
:)
I think that description is right. For the cases I thought of so far, the patch below seems like its behavior should match this as well.
Yes, and I don't claim to have thought it through exhaustively yet. I tried the patch below, and was surprised that it didn't seem to break any tests immediately. I'm not sure if I ran them correctly. The next thing I'd do is to try to write some tests it does change. It might also be nice to go through various glob-related issues in this repository, and see whether they'd become better or worse. I'm not sure exactly when I'll be able to do it, though. My free time is somewhat unpredictable at this point.
I'm not sure I've accounted for everything, but when I looked, it seemed to me that the code is set up perfectly for this change. Ignoring tests, documentation, and any command-line flag, the change may be just a line here:

diff --git a/crates/ignore/src/dir.rs b/crates/ignore/src/dir.rs
index 296c803..c6d2064 100644
--- a/crates/ignore/src/dir.rs
+++ b/crates/ignore/src/dir.rs
@@ -365,13 +365,13 @@ impl Ignore {
         if !self.0.overrides.is_empty() {
             let mat = self
                 .0
                 .overrides
                 .matched(path, is_dir)
                 .map(IgnoreMatch::overrides);
-            if !mat.is_none() {
+            if mat.is_ignore() {
                 return mat;
             }
         }

I think the overrides here are the list of all the globs, positive and negative. If a negative glob matched the file in question,

TODO: I haven't yet thought through whether this does the right thing for directories.

TODO 2:

(Later edit: I don't think this matters very much. The question of whether everything one would want can be accomplished with a one-line diff is not that important, after all.)

Now that I look at it again, a week later, I'm not sure when line #390 may be hit (is it ever?), and whether that still works correctly. I'm also not exactly sure where the hidden files are checked. But then it's getting late, and I'm getting tired.

I need to write some tests. I think the diff I have above may already work correctly in this case. As far as I can tell, all the globs (positive and negative) are checked together, as a group, at the
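To see what swapping `!mat.is_none()` for `mat.is_ignore()` changes, here is a minimal stand-alone model of that one decision. It is a sketch, not the crate's code: the `Match` enum below is a simplified stand-in with no payload, `decide` and its `weak_globs` parameter are hypothetical, and the fall-through is assumed to simply return the ignore-file verdict, which skips everything else the real function does afterwards.

```rust
/// Simplified stand-in for a three-way match result (no payload).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Match {
    None,
    Ignore,
    Whitelist,
}

impl Match {
    fn is_none(self) -> bool {
        self == Match::None
    }

    fn is_ignore(self) -> bool {
        self == Match::Ignore
    }
}

/// `overrides` is the verdict of the CLI globs for one path;
/// `ignore_files` is the verdict of .gitignore-style files for the same path.
/// `weak_globs` selects the patched condition (hypothetical switch).
fn decide(overrides: Match, ignore_files: Match, weak_globs: bool) -> Match {
    let overrides_are_final = if weak_globs {
        // Patched: only an "ignore" verdict from the globs is final.
        overrides.is_ignore()
    } else {
        // Original: any glob verdict, including "whitelist", is final.
        !overrides.is_none()
    };
    if overrides_are_final {
        overrides
    } else {
        // Assumed fall-through: defer to the ignore files.
        ignore_files
    }
}
```

In this model, `decide(Match::Whitelist, Match::Ignore, false)` yields `Whitelist` (a positive glob resurrects a gitignored file), whereas with `weak_globs` set to `true` it yields `Ignore`. A negative glob still wins in both modes: `decide(Match::Ignore, Match::Whitelist, true)` yields `Ignore`.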
Here is a way to think of the weak globs behavior that makes it conceptually simpler than the current behavior. Hopefully, it will also make it easier to reason about whether weak globs are a good idea.

Conceptual framework

With weakglobs, there is only one mechanism in
There are at least two
Each of these filters has some complex internal logic (ignores with

This also explains why, I think, the code is set up perfectly for this: the ignore engines are already there. They just have three outcomes (whitelist, ignore, neutral) instead of two (neutral, ignore). This is useful to support their internal logic, but it shouldn't be hard to make sure that 'whitelist' and 'neutral' behave the same way in the function that I referenced in my previous comment.

Unresolved issues

At the moment, I can only think of one: what to do with
At the moment, I don't know whether the current behavior matches either of these and, if not, whether that's desirable. Can you think of any other

Use-cases

I've thought of a couple of use-cases. Let's take the file structure in the beginning of your comment #1808 (comment), and let's say

1. Mostly happy example

Let's say I want to find all the
With current globs, there is also the option of

2. Less happy

With weak-globs, it becomes difficult to reproduce your example of

3. Very happy example

(For completeness) my original example of wanting to find all the

I think it's now pretty straightforward for me to reason about what the weak glob behavior can and cannot do. I still plan to try to learn a little about how people use
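The conceptual framework in this comment (independent filters with three internal outcomes, where only "ignore" matters once the filters are combined) might be sketched as follows. The two filters are hypothetical stand-ins: one acts like an ignore file containing `*.log`, the other like `-g 'README*'`. The names `Outcome`, `Filter`, `FILTERS` and `is_shown` are assumptions made for this illustration.

```rust
/// Three outcomes a single filter can produce for a path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Neutral,
    Ignore,
    Whitelist,
}

/// A filter maps a file name to an outcome.
type Filter = fn(&str) -> Outcome;

/// Hypothetical ignore-file filter: acts like a .gitignore containing `*.log`.
fn ignore_file_filter(name: &str) -> Outcome {
    if name.ends_with(".log") {
        Outcome::Ignore
    } else {
        Outcome::Neutral
    }
}

/// Hypothetical glob filter: acts like `-g 'README*'`.
/// A positive glob ignores everything it does not match.
fn glob_filter(name: &str) -> Outcome {
    if name.starts_with("README") {
        Outcome::Whitelist
    } else {
        Outcome::Ignore
    }
}

const FILTERS: [Filter; 2] = [ignore_file_filter, glob_filter];

/// Weak-globs combination: a file is shown unless some filter ignores it.
/// `Whitelist` and `Neutral` are deliberately treated the same here.
fn is_shown(name: &str, filters: &[Filter]) -> bool {
    filters.iter().all(|f| f(name) != Outcome::Ignore)
}
```

Under this model, `README.md` is shown, `README.log` is hidden because the ignore-file filter rejects it even though the glob whitelists it, and `main.rs` is hidden because the glob filter rejects it.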
Resolving the
| # | Behavior | Invocations that show text files | Implementation difficulty |
|---|---|---|---|
| 1. | --type is a separate filter | None of them | Easy |
| 2. | Order matters | b. and c. | Possibly hard |
| 3. | Current behavior | c. and d. | Easy |
The reason I call option 1 "not as useful" is that switching to that behavior would remove the possibility of using --type lisp -g '!*.el' to search lisp files that aren't Emacs-lisp. (Update 2021-03-20: I think this paragraph is wrong; it now seems to me that this example would work great with option 1. I'm no longer certain option 1 is less useful; I should think of more examples.)
Thoughts on option #3 (current behavior)
I don't think #3 makes quite as much sense as #2, especially in light of the rest of the proposal. It will be harder to document and explain. Ultimately, however, it does make sense; --type and --glob would be parts of the same filter, just as in #2. Moreover, with option #3, I'm pretty sure the rest of the proposal can be implemented with all the significant changes limited to that one function in the ignore crate of ripgrep.

In this version of the larger proposal, globs remain stronger than --type. So, we need a better name for the proposal than "weak globs". The best I've come up with so far is "globs respect ignores".

Still, I think behavior #2 is better in terms of being slightly more useful (e.g. -g 'README*' --type-not md would work; it doesn't now) and a lot easier to explain. Regardless of whether the "globs respect ignores" proposal is implemented, at some point changing --type to behavior #2 would be an improvement, I think.
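One possible reading of the three behaviors from the table, applied to the `-g 'README*' --type-not md` example, is sketched below. Everything here is an assumption made for illustration: the `Flag` type, the toy glob matcher (a single leading or trailing `*`), treating a type as a bare file extension, and the rule that a set containing a positive flag ignores unmatched files. It does not reproduce ripgrep's real type definitions or matching code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    None,
    Ignore,
    Whitelist,
}

#[derive(Debug, Clone, Copy)]
enum Flag {
    /// `-g PATTERN`, or `-g '!PATTERN'` when `negated` is true.
    Glob { pattern: &'static str, negated: bool },
    /// `--type EXT`, or `--type-not EXT` when `negated` is true (toy: type = extension).
    Type { ext: &'static str, negated: bool },
}

impl Flag {
    fn matches(&self, name: &str) -> bool {
        match *self {
            Flag::Glob { pattern, .. } => {
                if let Some(prefix) = pattern.strip_suffix('*') {
                    name.starts_with(prefix)
                } else if let Some(suffix) = pattern.strip_prefix('*') {
                    name.ends_with(suffix)
                } else {
                    name == pattern
                }
            }
            Flag::Type { ext, .. } => name.rsplit_once('.').map_or(false, |(_, e)| e == ext),
        }
    }

    fn negated(&self) -> bool {
        match *self {
            Flag::Glob { negated, .. } | Flag::Type { negated, .. } => negated,
        }
    }

    fn is_glob(&self) -> bool {
        matches!(self, Flag::Glob { .. })
    }
}

/// Last matching flag wins; unmatched files are ignored if the set has a positive flag.
fn evaluate(flags: &[Flag], name: &str) -> Verdict {
    match flags.iter().rev().find(|f| f.matches(name)) {
        Some(f) if f.negated() => Verdict::Ignore,
        Some(_) => Verdict::Whitelist,
        None if flags.iter().any(|f| !f.negated()) => Verdict::Ignore,
        None => Verdict::None,
    }
}

/// Splits the flags into (globs, types), preserving order within each group.
fn split(flags: &[Flag]) -> (Vec<Flag>, Vec<Flag>) {
    flags.iter().copied().partition(|f| f.is_glob())
}

/// Behavior 1: globs and types are separate filters; both must accept the file.
fn shown_separate(flags: &[Flag], name: &str) -> bool {
    let (globs, types) = split(flags);
    evaluate(&globs, name) != Verdict::Ignore && evaluate(&types, name) != Verdict::Ignore
}

/// Behavior 2: one combined list in command-line order.
fn shown_ordered(flags: &[Flag], name: &str) -> bool {
    evaluate(flags, name) != Verdict::Ignore
}

/// Behavior 3: globs are consulted first and any glob verdict is final.
fn shown_globs_first(flags: &[Flag], name: &str) -> bool {
    let (globs, types) = split(flags);
    match evaluate(&globs, name) {
        Verdict::None => evaluate(&types, name) != Verdict::Ignore,
        verdict => verdict != Verdict::Ignore,
    }
}

/// Models `-g 'README*' --type-not md`.
fn example_flags() -> [Flag; 2] {
    [
        Flag::Glob { pattern: "README*", negated: false },
        Flag::Type { ext: "md", negated: true },
    ]
}
```

In this model, `README.md` is hidden under behaviors 1 and 2 but shown under behavior 3, where the glob overrides `--type-not md`. `README.txt` is shown under all three.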
As a transitioning ag->rg user, I have a related feature request, and found this issue before opening a new one. I'm trying to find the right invocation for "search <type> files that also contain <pattern> in their name."
Is this something that could be considered?

* Edit: Combining file types with
@aswild I'm not the decider here, but your comment gave me some food for thought. What you are asking for is how the "Behavior 1:

My feeling was that

The behavior you suggest would be more important for something like
I accidentally lied about how

In my mental model I tend to think of file types as a "baseline" that are OR'd together, and then additional manual filters like globs are AND'd on top of that. But the specific semantics definitely get complicated and there are cases where that doesn't really make sense. I used rust as an example off the top of my head, but it's true this applies more for file types that have multiple extensions.

I don't really have a good holistic suggestion for how all these filter complexities should interact, so these are just my two cents; I appreciate the thought and effort being put in here.
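The mental model described in this comment (file types OR'd together as a baseline, manual filters AND'd on top) can be written down in a few lines. This is a rough sketch with assumed simplifications: a file type is reduced to a bare extension, a name filter is reduced to a substring test, and `selected` is a hypothetical helper, not anything ripgrep provides.

```rust
/// `type_exts`: extensions from all requested types (OR'd together).
/// `name_parts`: substrings that every selected name must contain (AND'd on top).
fn selected(name: &str, type_exts: &[&str], name_parts: &[&str]) -> bool {
    // Extension is whatever follows the last '.', if any.
    let ext = name.rsplit_once('.').map(|(_, e)| e);
    let type_ok = type_exts.iter().any(|t| ext == Some(*t));
    let parts_ok = name_parts.iter().all(|p| name.contains(*p));
    type_ok && parts_ok
}
```

For example, `selected("foo_test.rs", &["rs", "toml"], &["test"])` is true, while `foo.rs` fails the name filter and `foo_test.py` fails the type baseline.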
I think the --glob and --type options would be more useful if they respected ignored files. If possible, I would like there to be an option to make them behave that way.

Currently, rg --glob '*text*' is equivalent to rg --glob '*text*' -uuu in every way (as far as I can tell). This is annoying, for example, in the following example (simplified a little from what I encountered):

I'd like there to be an option --weakglobs that would make inclusive globs (see below) and --type have a lesser priority than ignore files. I would put it into my config file. It would work as follows:

I think this behavior is strictly more capable than the current behavior, since rg --weakglobs --files --glob 'README*' -uuu is equivalent to rg --files --glob 'README*'.

There is a caveat: the excluding globs (the ones with !) already work as they should (for instance, -u flags are not useless in the presence of -g !something). Their behavior should not change. In other words, they should continue to have priority over ignore files: if a gitignore file has a line like !file, it should still be possible to exclude it from ripgrep with -g !file.

So, the documentation would be something like:
P.S. Thank you for such a wonderful piece of software!