Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow ** wildcards in globs. #5284

Merged
merged 2 commits into from
May 8, 2018
Merged

Conversation

quasicomputational
Copy link
Contributor

These are inspired by a plan described in a comment in #2522, and only
implement a quite limited form of recursive matching: only a single **
wildcard is accepted, it must be the final directory, and, if a **
wildcard is present, the file name must include a wildcard.

Or-patterns are not implemented, for simplicity.

Closes #3178, #2030.

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • If the change is docs-only, [ci skip] is used to avoid triggering the build bots.

Includes tests!

match exactly, so ``*.gz`` matches ``foo.gz`` but not
``foo.tar.gz``. A wildcard that does not match any files is an
error.
``images`` directory. ``data-files: audio/**/*.mp3`` matches all
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the documentation should mention that ** is available starting with cabal-version:3.0 (if we assume that 3.0 will be the next major release)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also went ahead and changed all the '2.4' mentions to '3.0'.

@hvr
Copy link
Member

hvr commented Apr 25, 2018

I only gave this a cursory look, and so far this PR looks well made to me

@BardurArantsson
Copy link
Collaborator

Looks like a very well crafted PR, kudos.

What's the reasoning for the "must be the final directory bit"?

(I'm stating all of the following as my personal opinion on the matter. I have no real decision-making power on this.)

Generally, I'd object to making cabal's glob behavior even more distinct from the usual shell glob behavior. Yes, even if it is well documented as in this case. Most people don't actually read documentation and instead just copy examples and extrapolate from what the already know -- and then either give up or file issues when things don't work as they expect. Now, that is on them, but I predict that non-shell-like behavior (which even mimics the syntax of the shell!) will lead to more "support burden" for the cabal development team when issues/questions arise.

I'd like to propose an alternative solution which I don't believe was discussed before: How about just using regex matches against the 'normalized' project-relative path? I.e. the whole string must match) against the path (using '/' as separator for cross-platformness).

@23Skidoo
Copy link
Member

Looks nice on first glance, do you plan to implement or-patterns for file extensions as well (*.{html,jpg,css})? We might also want to change wildcards in 3.0 to make *.js catch *.min.js (but we must preserve the old semantics for cabal-version < 3.0 files).

@hvr
Copy link
Member

hvr commented Apr 26, 2018

@23Skidoo btw, you mentioning {/} reminds that we also need to provide a way to escape characters like * or {, }, or ,; do we have support for that already?

@23Skidoo
Copy link
Member

Not sure actually.

@quasicomputational
Copy link
Contributor Author

What's the reasoning for the "must be the final directory bit"?

It's forced by the data structure in the comment on #2522 by @23Skidoo, and it does slightly simplify implementation. This PR is targetting the absolute minimum possible version of this feature, to try and head off the bikeshedding that killed the previous one; after all, it's easier to lift restrictions in later versions than to add more. I'd rather see a limited patch merged and then iterated on rather than a more ambitious patch not get merged.

or-patterns

Not in this PR? I think globstar is worth having on its own merits, and that or-patterns can be separately implemented. They'd need new syntax and stealing { is a different beast to repurposing *: docs/**/*.html is currently a glob syntax error and will make Cabal 2.2 and earlier error out, but docs/{a,b,c}.html is a cromulent pattern today. I also have an efficiency concern about or-patterns if they're going to lead us to always walk the whole directory tree from the project root, including .git, dist-newstyle, etc: this isn't unfixable but it's more complication. (Also I need to fix that in this PR; more pushes incoming.)

We might also want to change wildcards in 3.0 to make *.js catch *.min.js (but we must preserve the old semantics for cabal-version < 3.0 files).

Agreed, but I think that's also separable work. A behaviour change to extension matching has different considerations (e.g., would cabal check need to walk the directory tree and look for things that might have been missed?). Globstar alone isn't actually a breaking change, again because ** is already rejected, so I don't think it makes sense to try to cram in more features on the basis that one big breaking change is better than many small ones (which might be valid thinking if there was a breaking change here).

escaping

Nope, no way today; I asked a few months ago in #4935. At least this PR doesn't make things worse, since * was already a magic character. We could go for backslash escaping and that also wouldn't be a breaking change, but, again, separable work.

@23Skidoo
Copy link
Member

Sure, implementing those additional features I mentioned is not required for this PR to be accepted.

@quasicomputational
Copy link
Contributor Author

Oops, this is buggy now. I know what I broke!

@quasicomputational
Copy link
Contributor Author

OK, I fixed the bug I introduced, and also another target of opportunity I spotted while I was staring at SrcDist.hs. If Travis and the reviewers are happy, I think this is good to go (but I thought that before, too).

@quasicomputational
Copy link
Contributor Author

Note that we definitely need the fix from #5289 (otherwise we will do the wrong thing with data-dir: **, which currently makes sdist fail). The included tests are the awkward part there.

@quasicomputational
Copy link
Contributor Author

Fingers crossed I've fixed the Windows problem. I squashed and rebased, and this ought to be ready for review (again) if Travis and AppVeyor like it.

These are inspired by a plan described in a comment in haskell#2522, and only
implement a quite limited form of recursive matching: only a single **
wildcard is accepted, it must be the final directory, and, if a **
wildcard is present, the file name must include a wildcard.

Or-patterns are not implemented, for simplicity.

Closes haskell#3178, haskell#2030.
@quasicomputational
Copy link
Contributor Author

I expanded the changelog note a bit, squashed and rebased again. CI ought to come back totally green since the persistent failures have been taken care of in master (hurrah!). If I don't get any objections, I'll merge this next week?


The reason for providing only a very limited form of wildcard is to
concisely express the common case of a large number of related files
of the same file type without making it too easy to accidentally
include unwanted files.

On efficiency: the directory tree will be walked starting with the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might even want to add a warning about this.

PatStem dir pat' ->
dir == seg && fileGlobMatchesSegments pat' segs
PatMatch Recursive ext ->
ext == takeExtensions (foldl' (flip const) seg segs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(foldl' (flip const) seg segs)

IMO last (seg : segs) is more readable. Or we can vendor lastDef from the safe package.

@23Skidoo 23Skidoo merged commit 5e83ef2 into haskell:master May 8, 2018
@23Skidoo
Copy link
Member

23Skidoo commented May 8, 2018

Merged, thanks!

quasicomputational added a commit to quasicomputational/cabal that referenced this pull request May 13, 2018
PR haskell#5284 changed things around, and now matchDirFileGlob will break if
it's passed a null directory, which happens to be the default value
for data-dir. Its call sites have been fixed to check for this and to
substitute '.' for an empty path, which is the desired behaviour; in
addition, matchDirFileGlob itself will now warn about this if it's
detected, so that new broken call sites can't sneak in.

Fixes haskell#5318.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request May 13, 2018
PR haskell#5284 changed things around, and now matchDirFileGlob will break if
it's passed a null directory, which happens to be the default value
for data-dir. Its call sites have been fixed to check for this and to
substitute '.' for an empty path, which is the desired behaviour; in
addition, matchDirFileGlob itself will now warn about this if it's
detected, so that new broken call sites can't sneak in.

Fixes haskell#5318.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request May 13, 2018
PR haskell#5284 changed things around, and now matchDirFileGlob will break if
it's passed a null directory, which happens to be the default value
for data-dir. Its call sites have been fixed to check for this and to
substitute '.' for an empty path, which is the desired behaviour; in
addition, matchDirFileGlob itself will now warn about this if it's
detected, so that new broken call sites can't sneak in.

Fixes haskell#5318.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request May 13, 2018
PR haskell#5284 changed things around, and now matchDirFileGlob will break if
it's passed a null directory, which happens to be the default value
for data-dir. Its call sites have been fixed to check for this and to
substitute '.' for an empty path, which is the desired behaviour; in
addition, matchDirFileGlob itself will now warn about this if it's
detected, so that new broken call sites can't sneak in.

Fixes haskell#5318.
phadej pushed a commit that referenced this pull request May 19, 2018
PR #5284 changed things around, and now matchDirFileGlob will break if
it's passed a null directory, which happens to be the default value
for data-dir. Its call sites have been fixed to check for this and to
substitute '.' for an empty path, which is the desired behaviour; in
addition, matchDirFileGlob itself will now warn about this if it's
detected, so that new broken call sites can't sneak in.

Fixes #5318.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request May 26, 2018
I fixed this in passing with haskell#5284, but let's add a test to make sure
it stays fixed. The error message isn't perfect, but it's a lot better
than failing silently!

Closes haskell#5195.
quasicomputational added a commit that referenced this pull request May 26, 2018
I fixed this in passing with #5284, but let's add a test to make sure
it stays fixed. The error message isn't perfect, but it's a lot better
than failing silently!

Closes #5195.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request Jun 27, 2018
This was only a convenience function, but its use could obscure how it
is introducing a dependency on the CWD. By removing it, the "."
argument to `matchDirFileGlob` is explicit.

Any external code using `matchFileGlob` would have needed to be
changed as haskell#5284 changed its signature and the module it lives in; it
is not much more of a burden to switch to `matchDirFileGlob` at the
same time.
@quasicomputational quasicomputational mentioned this pull request Jun 27, 2018
4 tasks
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request Jun 27, 2018
This was only a convenience function, but its use could obscure how it
is introducing a dependency on the CWD. By removing it, the "."
argument to `matchDirFileGlob` is explicit.

Any external code using `matchFileGlob` would have needed to be
changed as haskell#5284 changed its signature and the module it lives in; it
is not much more of a burden to switch to `matchDirFileGlob` at the
same time.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request Jun 27, 2018
This was only a convenience function, but its use could obscure how it
is introducing a dependency on the CWD. By removing it, the "."
argument to `matchDirFileGlob` is explicit.

Any external code using `matchFileGlob` would have needed to be
changed as haskell#5284 changed its signature and the module it lives in; it
is not much more of a burden to switch to `matchDirFileGlob` at the
same time.
quasicomputational added a commit to quasicomputational/cabal that referenced this pull request Jun 28, 2018
This was only a convenience function, but its use could obscure how it
is introducing a dependency on the CWD. By removing it, the "."
argument to `matchDirFileGlob` is explicit.

Any external code using `matchFileGlob` would have needed to be
changed as haskell#5284 changed its signature and the module it lives in; it
is not much more of a burden to switch to `matchDirFileGlob` at the
same time.
23Skidoo pushed a commit that referenced this pull request Jun 28, 2018
This was only a convenience function, but its use could obscure how it
is introducing a dependency on the CWD. By removing it, the "."
argument to `matchDirFileGlob` is explicit.

Any external code using `matchFileGlob` would have needed to be
changed as #5284 changed its signature and the module it lives in; it
is not much more of a burden to switch to `matchDirFileGlob` at the
same time.
@jtojnar
Copy link

jtojnar commented Mar 14, 2019

What are good defaults for Verbosity and Version arguments if we want to replace the removed Distribution.Simple.Utils.matchFileGlob with Distribution.Simple.Glob.matchDirFileGlob?

Some packages depend on this:

@quasicomputational
Copy link
Contributor Author

The Version argument needs to come from the package description being processed, since the globbing behaviour is spec-version-dependent. The proto-lens diff you linked to is the right approach and hard-coding in a version as in haskell-hvr/hgettext#16 isn't. Looking at the code for that package, you'll have to do a little bit of plumbing through to get it to where it needs to be.

You can find ready-made Verbosity values in Distribution.Verbosity; normal is a reasonable default.

@andreasabel andreasabel added the re: globbing Concerning issues with globbing file patterns label Nov 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cabal: file format re: globbing Concerning issues with globbing file patterns type: enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Revive the globstar globbing patch
6 participants