Finding multiple patterns #315

Closed
tjkirch opened this issue Jul 31, 2018 · 38 comments

Comments

@tjkirch

tjkirch commented Jul 31, 2018

I'd like to be able to search for multiple patterns, like with grep's -e argument. It seems (with fd 7.0.0) the only way is to use alternation in the regex pattern, but this can be less clear than multiple arguments, and is harder to build up programmatically.
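
For anyone who needs to build the alternation programmatically in the meantime, a minimal sketch might look like the following. It is Python rather than shell, `any_of` is a made-up helper name (nothing fd provides), and the two fragments are only illustrative. The resulting string would be passed to fd as its single pattern argument.

```python
import re


def any_of(patterns):
    """Join regex fragments into one alternation.

    Each fragment is wrapped in a non-capturing group so that it stays
    self-contained once combined with the others.
    """
    return "|".join(f"(?:{p})" for p in patterns)


combined = any_of(["Makefile", r"\.mk$"])
print(combined)  # (?:Makefile)|(?:\.mk$)
```

This only hides the alternation behind a helper; it does not make the combined pattern any easier to read on the command line.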

@sharkdp
Owner

sharkdp commented Aug 3, 2018

Thank you for the feedback!

I certainly see the need for this, but I'm not sure we should introduce a new command-line argument, given that there is a reasonable solution via fd '(pattern1|pattern2)'. On the other hand, the analogy to grep would be nice. Unfortunately, -e is already taken for --extension.

Another option to achieve something like this could be the --path-before-pattern flag that we were discussing over in #312. This would allow us to use fd --path-before-pattern . pattern1 pattern2 ... (possibly with a shortcut for the flag).

@sharkdp
Owner

sharkdp commented Oct 27, 2018

Actually, --path-before-pattern doesn't feel very natural to me. I'd be okay with adding a new --regexp <PATTERN> option in analogy to grep/rg, if someone wants to work on this.

@sharkdp
Owner

sharkdp commented Apr 2, 2020

I'm currently not planning to implement this. I'm going to close this for now, but I'm happy to reconsider if there is significant interest.

@aschmolck

How about

fd Makefile --or GNUmakefile --or make

?

That reads naturally, and it would make it possible to add a git grep or find style boolean query language at some point.

@sharkdp
Owner

sharkdp commented Mar 14, 2021

Let's reopen this for further discussion.

@sharkdp sharkdp reopened this Mar 14, 2021
@aschmolck

aschmolck commented Mar 17, 2021

Here is a concrete example I just did with find, I think it would be nice to be able to do the same thing with fd as well:

find . -type d -and \( -name node_modules -or -name build \) -exec rm -rf '{}' '+'

@sharkdp
Owner

sharkdp commented May 16, 2021

I'm definitely against including a full-blown query language with --and/--or. fd was never designed to be this powerful. It's focused on easier use-cases.

Your use-case can be solved by running

fd -td '^(node_modules|build)$' -X rm -rf

or

fd -td node_modules -X rm -rf
fd -td build -X rm -rf

Both of which are shorter than the find equivalent (which is not the main issue here though).

@zw963

zw963 commented Jul 8, 2021

But maybe we need a non-regexp OR pattern. The following is an example that I guess is not so simple to do with fd.

find . \
                -name "* (????-??-??) \[??:??:??\].tar" -o \
                -name "* (????-??-??) \[??:??:??\].bak"

Could we support something like this:

fd -IH -g '* (????-??-??) [??:??:??].tar' -g '* (????-??-??) [??:??:??].bak'

@tmccombs
Collaborator

tmccombs commented Jul 9, 2021

OR is actually pretty easy to do with regexes.

Your example could be done with:

fd -IH '.* \(....-..-..\) \[..:..:..\]\.(tar|bak)'

AND is more difficult.

@zw963

zw963 commented Jul 9, 2021

Yes, I did it like this:

fd -HI '.*\(\d{4}-\d{2}-\d{2}\) \[\d{2}:\d{2}:\d{2}\]\.(tar|bak)'

I think it is more obscure than the find -o solution anyway.

@minad

minad commented Aug 7, 2021

@sharkdp I have a use case where it would be very useful if fd supported multiple patterns combined with AND. As you write in your #315 (comment), you don't want to add a full query language here which is understandable. The combination of multiple regular expressions with OR is no problem. However there is no possibility to search for multiple patterns combined with AND. The reason is also that the Rust regular expression engine does not support lookahead patterns, otherwise one could write ^(?=.*first)(?=.*second) to search for file names with both first and second in the name. Would you accept a PR which adds support for searching multiple patterns combined with AND?
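
To illustrate the semantics being asked for, here is a small sketch in Python, whose `re` module does support look-ahead, unlike the Rust engine mentioned above. `matches_all` is a hypothetical helper name; it applies each pattern separately, which is roughly what an AND option would have to do when look-ahead is unavailable.

```python
import re


def matches_all(patterns, name):
    """True if every regex in `patterns` matches somewhere in `name` (logical AND)."""
    return all(re.search(p, name) for p in patterns)


# Single-regex form using look-ahead. This works in Python's `re`,
# but look-ahead is not available in Rust's `regex` crate.
lookahead = re.compile(r"^(?=.*first)(?=.*second)")

for name in ["second-first.txt", "first-only.txt"]:
    print(name, bool(lookahead.search(name)), matches_all(["first", "second"], name))
```

Both approaches agree on these two names: the first contains both words, the second only one.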

@sharkdp
Owner

sharkdp commented Oct 8, 2021

To be honest, I haven't really seen a reasonable use case for AND so far. Please let me know if there are any. Not a theoretical use case. A real world, practical use case.

@NightMachinery

Searching for terms where one doesn't know their order. This happens frequently for me, e.g., games AND windows, as I sometimes have games/Windows and sometimes Windows/games.

@minad

minad commented Oct 8, 2021

@sharkdp I have an Emacs file finder frontend which can use find or fd as a backend. This frontend supports a matching style we call "orderless" matching, where you enter multiple words/regexps separated by spaces. Each of the file paths should match all of these regexps. Currently one can achieve this by transforming the regular expressions into "word1.*word2|word2.*word1", which obviously does not scale well. Another alternative for AND filtering is to use pipes and run fd first and then grep for the remaining regexps (or, instead of grep, post-filter in the frontend), but then one loses the performance advantages of fd. The "orderless" style of matching is quite popular in Emacs to quickly filter a set of candidates since, as @NightMachinery mentioned, the huge advantage is that the user does not have to know the order of the words/regexps. Whether this is a reasonable use case depends on your judgement, of course. It seems to me that fd aims more at shell users. But I often get requests to support fd in the Emacs frontend from users who prefer fd to find for performance reasons.

@sharkdp
Owner

sharkdp commented Nov 14, 2021

Ok, I'm inclined to accept a feature request to support --and <pattern>. Before we implement this, we need:

  • A short discussion about the command-line option name. What do other tools use?
  • A detailed analysis of whether this could clash with any of the other command-line options or features of fd. There are some immediate questions like: what does fd patternA --and patternB --type f mean? (We are not going to support the meaning patternA AND (patternB AND type==file).)

@zw963

zw963 commented Nov 15, 2021

In fact, I thought most of the discussion in this thread was about --or, which would make it easier to search for multiple patterns in one command line.

@sharkdp
Owner

sharkdp commented Nov 15, 2021

Note that there is also #650 and #714. Also, --or can usually be worked around easily.

@zw963

zw963 commented Nov 15, 2021

I propose we add --or for now, and let's discuss the usage and necessity of --and.

@tmccombs
Collaborator

--or isn't really necessary, because you can just use | in the pattern to combine multiple patterns. However, there isn't a good way to express --and with a single regex.

To be concrete, a hypothetical fd foo --or bar would be equivalent to fd 'foo|bar'. Whereas fd foo --and bar would need to be converted to something like fd 'foo.*bar|bar.*foo' which scales really poorly.
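
A quick sketch of why the single-regex workaround scales poorly: one branch is needed per ordering of the terms, so n terms produce n! branches. The helper name `and_as_alternation` is made up for illustration, and the terms are assumed to be plain words with no `|` of their own.

```python
from itertools import permutations
import re


def and_as_alternation(terms):
    """Expand 'all terms, in any order' into one regex with one branch per ordering."""
    return "|".join(".*".join(order) for order in permutations(terms))


print(and_as_alternation(["foo", "bar"]))  # foo.*bar|bar.*foo

# The number of branches is n! for n terms.
for n in range(2, 6):
    terms = [f"t{i}" for i in range(n)]
    print(n, and_as_alternation(terms).count("|") + 1)
```

Note that this expansion is also not a perfect AND: if two terms overlap in the file name (say `ab` and `bc` in `abc`), each matches on its own but no ordering of the two does.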

@zw963

zw963 commented Nov 20, 2021

To be concrete, a hypothetical fd foo --or bar would be equivalent to fd 'foo|bar'

That's not equivalent, because --or could also be used with glob-based search.

@tavianator
Collaborator

It's equivalent in the sense that every glob can be converted to a regex
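
As a rough illustration of that conversion, Python's standard `fnmatch.translate` turns a glob into a regex, and the results can then be OR-ed together. This uses Python's glob and regex dialects, so it only demonstrates the idea; fd's own glob handling may differ in details. `globs_to_regex` is a hypothetical helper name.

```python
import fnmatch
import re


def globs_to_regex(globs):
    """Translate each glob to a regex with fnmatch.translate and OR the results."""
    return "|".join(f"(?:{fnmatch.translate(g)})" for g in globs)


rx = re.compile(globs_to_regex(["*.tar", "*.bak"]))

for name in ["backup.tar", "notes.bak", "readme.txt", "backup.tar.gz"]:
    # fnmatch patterns are anchored at the end, so use match() to anchor the start too.
    print(name, bool(rx.match(name)))
```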

@zw963

zw963 commented Nov 21, 2021

It's equivalent in the sense that every glob can be converted to a regex

But in most simple cases, a glob-based search takes fewer keystrokes than a regexp.

@grothesque

grothesque commented Nov 22, 2021

If fd gets both --or and --and then it should also get --not and parens (users would certainly demand it). We would arrive at something similar to find in terms of complexity.

My understanding is that fd tries to be simpler than find (but at the same time as powerful as feasible). In that sense, I think it's not too much to ask the advanced user who needs --or to simply use regular expressions.

On the other hand, there is really no practical way to work around the lack of --and. Someone who wants to search the file system for three different tags in arbitrary order will have to run fd with a regular expression that combines the six possible permutations in one giant regex. (I wrote a wrapper script that allows me to run fd like this easily and I consider it extremely useful.)

In #889, I suggested that one could deprecate the specification of paths as arguments (as opposed to --search-path, which I suggest renaming to --root and -r for brevity). This would eventually allow specifying multiple search patterns as args. Given that logical OR is already possible within a regex, it would make sense to apply logical AND when multiple patterns are given.

IMHO fd would thus gain a much nicer (cleaner and more powerful) UI.

@cheap-glitch

This might be off-topic since it's not strictly about patterns per se, but here's a real-world use case for --or that can't be done via a regex:

I have some complex Bash projects with several different types of files (executable scripts, helpers, test modules, etc.) and I want to lint them all at once with shellcheck. I can't use plain globs because some files have no extensions and shellcheck will error when passed folder names.

This is what I'd like to do:

fd -t x --or -e bash --or -e bats -0 | xargs -0 -- shellcheck

This can be done with find, but without the benefits of automatic VCS exclusions:

find \( -type f -and -executable -or -name '*.bash' -or -name '*.bats' \) -print0 | xargs -0 -- shellcheck

@sharkdp
Owner

sharkdp commented Jan 22, 2022

fd -t x --or -e bash --or -e bats -0 | xargs -0 -- shellcheck

You can already do this. -e already combines in an OR sense. In addition, you can use fd's --exec/-x option instead of xargs. This is not just shorter to write, but also faster, because it runs multiple shellcheck processes in parallel:

fd -tx -ebash -ebats -x shellcheck

@cheap-glitch

cheap-glitch commented Jan 22, 2022

You can already do this. -e already combines in a or-sense

Yes, but -tx doesn't. To clarify, I want all the files that are executable OR end in .bash/.bats.
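
To spell out the intended semantics, here is a rough Python sketch of "executable OR has one of these extensions". The function name `executable_or_ext` is made up, and the default extensions are just the ones from my example.

```python
import os


def executable_or_ext(root, extensions=(".bash", ".bats")):
    """Yield regular files under `root` that are executable OR end in one of `extensions`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            if os.access(path, os.X_OK) or name.endswith(tuple(extensions)):
                yield path


if __name__ == "__main__":
    for path in executable_or_ext("."):
        print(path)
```

Unlike fd, this sketch does not skip hidden files or honor .gitignore, which is exactly the part I would like to keep.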

@tmccombs
Collaborator

I think that kind of functionality is out of scope for fd, it would basically involve making an expression language similar to what find has, and make fd significantly more complicated.

@cheap-glitch

I think that kind of functionality is out of scope for fd, it would basically involve making an expression language similar to what find has, and make fd significantly more complicated.

I totally understand not wanting to add that kind of complexity, but what about a simple global flag? (Sorry if this has already been proposed and rejected somewhere else).

It could be called e.g. --combine-with and take 3 possible values:

  • and to combine all filters with a logical AND
  • or to combine all filters with OR
  • auto to use the default "smart" combination logic (so the same as not passing the option at all)

This would probably be easier to implement, and while not as flexible as find's expressions, it would still enable more use cases.

@sharkdp
Owner

sharkdp commented Jan 23, 2022

Thank you for your feedback, but I'm not a fan of the --combine-with idea. I'm not sure if that would really allow us to solve a lot of new real world use cases.

What would fd --combine-with=or -e txt -e md README do? Would it OR-combine ALL criteria? Including the pattern? So it would search for files with a txt extension, with a md extension OR for files matching README?


Another workaround for the OR use case is to simply use multiple fd commands:

(fd -t x -0; fd -e bash -e bats -0) | xargs -0 -- shellcheck

@097115

097115 commented Jun 2, 2022

@sharkdp,

Also, --or can usually be worked around easily

Is there any way to search for directories, or for files that match a specific pattern?

If we search for ALL the files and directories, then, yes, fd . --type d --type f ~/Documents can do it. But if we want to get a list of all the directories AND all the .txt files, then, as soon as we add --extension, like fd . --type d --type f --extension txt ~/Documents, fd, as expected, will limit the results to files only. Same happens if we add --full-path, like this: fd --type d --type f --full-path '.*txt$' ~/Documents.

Of course, combining two different searches into one stream is not a problem. But why spawn two instances? :)

@grothesque

I would like to reinforce the case for an AND operator as opposed to a full implementation of boolean logic (see my above comment):

I wrote a script (https://gitlab.kwant-project.org/-/snippets/903, consider it in the public domain) that uses fd as a backend to search for files/directories matching a combination of tags. The tags of each file/directory are obtained from the path by treating slashes and dashes as separators. For example, the file name “pers/2022/bike-repair.org” corresponds to the tags “pers”, “2022”, “bike”, “repair”, as well as “repair.org” (dots are optional tag separators).
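
The tag scheme can be sketched in a few lines of Python. This is only a reading of the description above, not code taken from the linked script: `path_tags` is a hypothetical name, and the dot handling (only the last component also yields a tag without its dot-suffix) is an assumption.

```python
import re


def path_tags(path):
    """Split a path into tags on '/' and '-'.

    The last tag is additionally offered without its dot-suffix, so that
    'repair.org' yields both 'repair' and 'repair.org'.
    """
    tags = [t for t in re.split(r"[-/]", path) if t]
    if tags and "." in tags[-1]:
        stem = tags[-1].split(".", 1)[0]
        if stem:
            tags.insert(len(tags) - 1, stem)
    return tags


print(path_tags("pers/2022/bike-repair.org"))
# ['pers', '2022', 'bike', 'repair', 'repair.org']
```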

Now searching for all events involving my friend “Bob” and the activity “climbing” is as quick as running ff bob climbing. (I like to define a short ff alias.) I also have a way to run this directly from within Emacs.

The purpose of this example is not to convince you to organize your home directory in a similar way (although I think that the scheme works very well), but to give one very concrete usage example of fd use where having a way to express an AND relation would be useful.

My script has a --debug option that, instead of running fd, will just print out the command. As one can imagine, the query length grows factorially with the number of tags to search for, since every ordering of the tags needs its own alternative. Already with three tags it is getting pretty long (and presumably less efficient):

% fdfind-tags --debug a b c
fdfind --full-path --prune --regex '[-/](a)[-/](.*[-/])?(b)[-/](.*[-/])?(c)([-/]|(\.[^/]*)?$)|[-/](a)[-/](.*[-/])?(c)[-/](.*[-/])?(b)([-/]|(\.[^/]*)?$)|[-/](b)[-/](.*[-/])?(a)[-/](.*[-/])?(c)([-/]|(\.[^/]*)?$)|[-/](b)[-/](.*[-/])?(c)[-/](.*[-/])?(a)([-/]|(\.[^/]*)?$)|[-/](c)[-/](.*[-/])?(a)[-/](.*[-/])?(b)([-/]|(\.[^/]*)?$)|[-/](c)[-/](.*[-/])?(b)[-/](.*[-/])?(a)([-/]|(\.[^/]*)?$)'

@minad minad mentioned this issue Oct 13, 2022
@Uthar
Contributor

Uthar commented Oct 15, 2022

I added multiple pattern finding: Uthar@19c2495

But it's much slower now, 10x. I'm not a Rust expert, maybe someone will help

@QuarticCat

@Uthar I'd like to take a look at the performance problem. How did you benchmark it?

@zw963

zw963 commented Oct 15, 2022

I added multiple pattern finding: Uthar/fd@19c2495

But it's much slower now, 10x. I'm not a Rust expert, maybe someone will help

Wow, 10x slower is really not acceptable.

@Uthar
Contributor

Uthar commented Oct 15, 2022

I'd like to take a look at the performance problem. How did you benchmark it?

Thank you.
I compared these commands:

# patched fd
time ./fd --pattern foo .

# upstream fd
time fd foo .

@Uthar
Contributor

Uthar commented Oct 15, 2022

But it's much slower now, 10x. I'm not a Rust expert, maybe someone will help

Ah... I think I was compiling with cargo build instead of cargo build --release. With release mode the performance is the same as before

I will be using this. But what I did, adding the --pattern flag, is too much of a breaking change to make it public.

  1. Does anyone want this in upstream?
  2. How to add flags similar to find -and ... -and ... -and ...?

@tmccombs
Collaborator

tmccombs commented Oct 16, 2022

Simply timing a single run also isn't very reliable for benchmarking. And if you just run the two commands one after another, the first one you run will probably be significantly slower than the second, because the OS will probably cache data from the first run and have it available for the second run.

https://github.com/sharkdp/fd-benchmarks has some scripts to help benchmark fd with hyperfine.
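
If hyperfine isn't at hand, the basic idea (a few untimed warm-up runs so the cache is hot, then several timed runs) can be sketched like this. `bench` is a hypothetical helper, and the command in the example is a placeholder so the sketch runs anywhere; substitute the fd invocations you want to compare.

```python
import statistics
import subprocess
import sys
import time


def bench(cmd, warmup=3, runs=10):
    """Run `cmd` `warmup` times untimed, then return wall-clock seconds for `runs` timed runs."""
    for _ in range(warmup):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # Placeholder command; replace with e.g. ["fd", "foo", "."].
    times = bench([sys.executable, "-c", "pass"])
    print(f"mean {statistics.mean(times):.4f}s  stdev {statistics.stdev(times):.4f}s")
```

This is far cruder than hyperfine, which also handles warm-up runs and reports proper statistics, so treat it as an illustration only.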

@sharkdp
Owner

sharkdp commented Nov 21, 2022

closed via #1139

@sharkdp sharkdp closed this as completed Nov 21, 2022