filter out duplicate results #357
Unlikely. The reason is that determining which results are duplicates is actually quite hard, despite the conceptual simplicity. (The file paths alone don't tell the whole story.) One possible work-around for you is to execute two searches:

The second command won't search …

If you're curious about what goes into determining whether two files are the same or not, you can check out the …
e.g., using D syntax, which I'm more familiar with:

```d
// realPath is a simple D wrapper around the POSIX realpath from C
auto is_same_file(string a, string b)
{
    return realPath(a) == realPath(b);
}
```
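As a concrete illustration of why "same file" is subtler than comparing path strings: on Unix it usually comes down to comparing device and inode numbers, not canonicalizing whole paths. A minimal shell sketch, using the `[ -ef ]` test (a common but non-POSIX extension of `[` that performs exactly this device/inode comparison):

```shell
# Two textually different names for one file: a hard link shares the
# original's device and inode, so [ -ef ] reports them as the same file
# without ever computing a real path.
tmp="$(mktemp -d)"
echo data > "$tmp/a.txt"
ln "$tmp/a.txt" "$tmp/b.txt"     # hard link: new name, same inode
if [ "$tmp/a.txt" -ef "$tmp/b.txt" ]; then
    echo "same file"
fi
rm -r "$tmp"
```

The hard link and its original pass the test despite having different names, which is the kind of case a naive path-string comparison would miss.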
Computing the real path of every file is extremely expensive. This option is also not possible to implement in constant memory. It is a non-starter. I'm not particularly interested in adding experimental half-baked features.

I think the answer here is to work around it. If you tell the tool to search the same directory twice, then it should show the same results twice.
EDIT: actually, since we're searching a file-system tree, I guess we can do better than O(N) memory by doing a breadth-first listing and sorting each directory, so it'd be O(D), where D = max number of entries per directory; ignore my last point.
@BurntSushi I agree with your reasoning 100% on why rg shouldn't try to deduplicate two textually different paths, however I am wondering if there can be an option to prevent the following behavior:
without needing to pipe its output to another command to deal with (to avoid buffering and to be able to use …
I don't think it's ripgrep's responsibility to deduplicate the arguments you give it. If you don't want it searching the same directory twice, then don't give it the same directory twice. I.e., do something to deduplicate the arguments before handing them to ripgrep.
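A rough sketch of deduplicating the arguments yourself in the shell, assuming a GNU-style `realpath` that accepts multiple arguments (`awk '!seen[$0]++'` keeps only the first occurrence of each line):

```shell
# Canonicalize every path argument and drop repeats before invoking rg.
tmp="$(mktemp -d)"
mkdir "$tmp/dir1"
# "$tmp/dir1" and "$tmp/./dir1" are textually different but resolve to
# the same canonical path, so only one copy survives the dedup.
dirs="$(realpath "$tmp/dir1" "$tmp/./dir1" | awk '!seen[$0]++')"
printf '%s\n' "$dirs"            # a single canonical path remains
# rg pattern $dirs               # then search the deduplicated list
rm -r "$tmp"
```

Note that this only catches textual duplicates of the same directory; it does not help with the nested dir1/dir2-inside-dir1 case from this issue, where the two arguments genuinely are different directories.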
That's certainly fair enough; I was just checking if you would be open to having this be an option within ripgrep itself. |
(I just saw your reply again by chance and it occurred to me to point out that, strictly speaking, the inputs to …
```
rg pattern dir1/dir2 dir1
```

returns duplicate results.

could we filter out duplicate results? (at least as an option)

the use case for `dir1/dir2 dir1` is to list relevant files in `dir1/dir2` ahead of the other ones in `dir1`