implement --sort-files option #263

Closed
daxim opened this issue Dec 1, 2016 · 5 comments
Labels
question An issue that is lacking clarity on one or more points.


@daxim

daxim commented Dec 1, 2016

Ack has it. I always want sorted output because I can then easily cross-check by eye, comparing the list of search results with the output of ls or tree, which are also sorted.

@BurntSushi
Owner

This is related to #152, but it's subtly different. #152 is asking for deterministic output and you're asking for sorted output. What do you want to sort by? Should it be customizable? What do you hope for this to achieve that, say, rg foo | sort does not? Is it acceptable to lose parallelism? (I hope so, because doing this while retaining parallelism seems hard.)

@BurntSushi BurntSushi added the question An issue that is lacking clarity on one or more points. label Dec 1, 2016
@nerdrew

nerdrew commented Dec 2, 2016

@BurntSushi Hmmm. My bad. I don't care about sorting as much as I care about grouping by directory. So #152 seems more like what I'm looking for.

E.g.

dir1/file1.rs:blah
dir2/file4.rs:blah
dir1/file2.rs:blah

vs

dir1/file1.rs:blah
dir1/file2.rs:blah
dir2/file4.rs:blah

Totally understand the performance penalty for grouping results. Is it feasible to use the parallel runners to do the searching and aggregate the results afterward?

@BurntSushi
Owner

@nerdrew Well, I mean, yes, that's what a hypothetical solution would have to do. But now you've introduced a cost: extra memory use. There might be an extra time cost too, for doing the aggregation, but it could be too small to measure.

Of course, that might have been feasible in 0.2.x, but 0.3.x introduced a parallel directory iterator so that actually crawling through the directories themselves is parallelized. Making that do aggregation (and importantly, knowing when an aggregation is complete) seems hard.

I would say that there are basically two options here:

  1. Deal with -j1 if you want determinism.
  2. Implement a new parallel searcher that's similar to the one in 0.2.x, but add aggregation. (This wouldn't be that hard.)
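As a rough illustration of option 2, here is a minimal sketch of "search in parallel, aggregate, then sort", using only the Rust standard library. It is not ripgrep's code: `search_file` and `search_sorted` are hypothetical names, files are passed in as in-memory strings, and one thread per file stands in for a real worker pool. It also makes the memory cost mentioned above visible, since every result is buffered before anything can be printed.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a per-file search: returns the lines of
/// `contents` that contain `needle`.
fn search_file(contents: &str, needle: &str) -> Vec<String> {
    contents
        .lines()
        .filter(|line| line.contains(needle))
        .map(String::from)
        .collect()
}

/// Search every (path, contents) pair on its own thread, collect all hits,
/// and only then sort them by path. Nothing can be emitted until every
/// worker has finished, so all results are held in memory at once.
fn search_sorted(files: Vec<(String, String)>, needle: &str) -> Vec<(String, Vec<String>)> {
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    for (path, contents) in files {
        let tx = tx.clone();
        let needle = needle.to_string();
        handles.push(thread::spawn(move || {
            let hits = search_file(&contents, &needle);
            if !hits.is_empty() {
                tx.send((path, hits)).expect("receiver is alive");
            }
        }));
    }
    // Drop the original sender so the receiver stops once all workers are done.
    drop(tx);

    // Results arrive in whatever order the workers finish.
    let mut results: Vec<(String, Vec<String>)> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // The aggregation step: order by path, compared as a plain string.
    results.sort_by(|a, b| a.0.cmp(&b.0));
    results
}
```

Calling `search_sorted` with the three files from the example earlier in the thread returns them grouped as dir1/file1.rs, dir1/file2.rs, dir2/file4.rs, regardless of which worker finishes first.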

@daxim
Author

daxim commented Dec 2, 2016

> What do you want to sort by?

By the relative path name of the matching files; treat it as a string. Directory depth does not matter.

> Should it be customizable?

No, it should simply follow LC_COLLATE.
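To make "treat it as a string" concrete, here is a small sketch contrasting a plain string sort with a component-by-component path sort. The function names are hypothetical. One assumption to flag: Rust's standard library has no locale-aware collation, so the string sort below is byte-wise, which corresponds to LC_COLLATE=C rather than to an arbitrary locale.

```rust
use std::path::Path;

/// Sort paths as plain strings (byte-wise). Directory depth plays no role:
/// '/' is compared like any other character.
fn sort_as_strings(paths: &mut Vec<String>) {
    paths.sort();
}

/// Sort paths component by component, so everything under `a/` sorts
/// before anything under `a-b/`.
fn sort_by_components(paths: &mut Vec<String>) {
    paths.sort_by(|a, b| Path::new(a).cmp(Path::new(b)));
}
```

The two orders differ whenever a name contains a character that sorts before '/', such as '-'. Given a/x.rs, a-b/x.rs and a/b/c.rs, the string sort puts a-b/x.rs first, while the component sort puts it last.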

> What do you hope for this to achieve that, say, rg foo | sort does not?

The output of normal ack --sort-files on an interactive tty is human-readable with its path name headings above the matching lines, each group visually separated from each other, line numbers and colours. But when rg is piped into something, it's not human-readable any more. The path names are smushed together with the matching lines, there are no double line feeds to create paragraphs, line numbers are lost and results are not highlighted any more.
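The difference described here can be modelled with a short sketch: one formatter that prints a path heading per file with numbered lines beneath it, and one that prefixes every line with its path. This only illustrates the behaviour described in this comment and is not ripgrep's or ack's actual code. `format_results` and `format_for_stdout` are hypothetical names, and `std::io::IsTerminal` requires Rust 1.70 or later.

```rust
use std::io::{self, IsTerminal};

/// Render results either with path headings (the interactive style) or with
/// the path prefixed to every line (the piped style described above).
/// Each result is (path, [(line number, line text)]).
fn format_results(results: &[(String, Vec<(usize, String)>)], heading: bool) -> String {
    let mut out = String::new();
    for (i, (path, lines)) in results.iter().enumerate() {
        if heading {
            if i > 0 {
                out.push('\n'); // blank line between file groups
            }
            out.push_str(path);
            out.push('\n');
            for (number, line) in lines {
                out.push_str(&format!("{}:{}\n", number, line));
            }
        } else {
            for (_, line) in lines {
                out.push_str(&format!("{}:{}\n", path, line));
            }
        }
    }
    out
}

/// Pick the style from whether stdout is a terminal.
fn format_for_stdout(results: &[(String, Vec<(usize, String)>)]) -> String {
    format_results(results, io::stdout().is_terminal())
}
```

Because the style depends on the output stream, sorting with an external `sort` necessarily operates on the piped form, which is why a built-in option is being requested.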

> Is it acceptable to lose parallelism?

Yes.

@BurntSushi
Owner

If --sort-files can imply -j1, then I think this is relatively straightforward to implement. It involves calling WalkDir::sort_by from the single-threaded ignore::Walk iterator, which in turn requires exposing a sort_by method on ignore::WalkBuilder.
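For readers unfamiliar with what a sorted single-threaded walk buys you, here is a minimal sketch using only the standard library. It deliberately does not use the walkdir or ignore crates, and `sorted_walk` is a hypothetical name. It shows the idea only: sort each directory's entries before descending, and the files come out in a deterministic order.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect the files under `root`, visiting each directory's
/// entries in sorted order. Single-threaded, so the output order is
/// deterministic.
fn sorted_walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    // Read the whole directory first; read_dir itself gives no ordering guarantee.
    let mut entries: Vec<PathBuf> = fs::read_dir(root)?
        .map(|entry| entry.map(|e| e.path()))
        .collect::<io::Result<_>>()?;
    // All entries share the same parent, so this sorts by file name.
    entries.sort();

    let mut files = Vec::new();
    for path in entries {
        if path.is_dir() {
            files.extend(sorted_walk(&path)?);
        } else {
            files.push(path);
        }
    }
    Ok(files)
}
```

Note that sorting per directory yields a depth-first, component-wise order, which can differ from sorting full relative paths as strings when a directory name contains characters that sort before the path separator.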
