Add a way to disable sequence matching #212
I am hesitant to implement such behavior. I can understand that some users may have niche needs, but considering this project aims for Bash-style globbing, the bare minimum requirements are:

I would also argue that these features don't need a lot of documentation and can be explained using, at minimum, what is listed below. I state this not to convince you to use these features, but just as an illustration. Certainly, these features could be covered in greater depth and with examples (I know I can be a bit too wordy in my documentation). I also understand that these features may still not be desired, regardless of how easy they are to explain.

Supported Syntax
Supported character class names are found below.
I won't argue that you should adopt this feature if it is not in scope for the project. I'd like the feature to be adopted, but that is neither here nor there; I understand keeping a strict leash on scope. As for the documentation not being a lot: I don't necessarily disagree, but that's a lot more words than I want to spend describing how globs work in a specification that doesn't care for the feature. Because you're a field expert, though, I'd love your input: if wcmatch does not adopt this feature, how would you recommend I solve my problem? I have a few somewhat-satisfying solutions in mind:
Well, that was embarrassingly easy. I've made this library unnecessary with a few lines of code:

```python
import re


def translate(path):
    # re.escape turns every "*" into "\*"; the escaped stars are then rewritten.
    path = re.escape(path)
    path = path.replace(r"\*\*/", r".*")
    path = path.replace(r"\*\*", r".*")
    path = path.replace(r"\*", r"[^/]*")
    return f"^({path})$"
```

This regex matches everything I want it to match, and nothing I don't want it to. It's a bit naïve, but it 100% answers my needs. Thank you for making wcmatch. This library was a great inspiration, and helped a lot during development.
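The escaping gap in this first version is easy to see by running it on a pattern containing `\*`. Because `re.escape` doubles the backslash and also escapes the star, the later replacement of `\*` still fires, so the pattern ends up matching a literal backslash followed by a wildcard rather than a literal `*`. A small demonstration, using a copy of the function renamed `naive_translate` (a hypothetical name for illustration):

```python
import re


def naive_translate(path: str) -> str:
    # Same replacement steps as the first translate() above.
    path = re.escape(path)
    path = path.replace(r"\*\*/", r".*")
    path = path.replace(r"\*\*", r".*")
    path = path.replace(r"\*", r"[^/]*")
    return f"^({path})$"


# Ordinary patterns behave as intended, including dotfiles under "**".
print(bool(re.fullmatch(naive_translate("**/*.md"), "docs/.hidden.md")))  # True

# An escaped star is not a literal "*": the backslash survives as a literal
# backslash, and the star is still rewritten into a wildcard.
print(naive_translate(r"a\*b"))                             # ^(a\\[^/]*b)$
print(bool(re.fullmatch(naive_translate(r"a\*b"), "a*b")))  # False
```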
Since you are happy with your solution, I will close this issue.
Turned out to be a touch more complicated with escaping. Resulting code:

```python
import re


def translate(path: str) -> str:
    blocks = []
    escaping = False  # previous character was an unescaped backslash
    globstar = False  # set after "**"; cleared by the next ordinary character
    prev_char = ""
    for char in path:
        if char == "\\":
            if prev_char == "\\" and escaping:
                # "\\" is an escaped, literal backslash.
                escaping = False
                blocks.append(r"\\")
            else:
                escaping = True
        elif char == "*":
            if escaping:
                # "\*" is a literal asterisk.
                blocks.append(re.escape("*"))
                escaping = False
            elif prev_char == "*" and not globstar:
                # "**" matches across path separators.
                globstar = True
                blocks.append(r".*")
        elif char == "/":
            if globstar:
                # "**/" is already covered by the preceding ".*".
                pass
            else:
                globstar = False
                blocks.append("/")
            escaping = False
        else:
            if prev_char == "*" and not globstar:
                # Emit "[^/]*" for a preceding single "*".
                blocks.append(r"[^/]*")
            blocks.append(re.escape(char))
            globstar = False
            escaping = False
        prev_char = char
    result = "".join(blocks)
    return f"^({result})$"
```

With tests and everything, it works.
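Exercising the code above with a few more inputs suggests some edge cases it does not yet cover: a lone `*` at the end of a pattern (`docs/*`) or directly before `/` (`*/x`) produces no wildcard at all, and the character after an escaped `\*` picks up a stray `[^/]*`. The sketch below is one hypothetical revision, not the author's code. It buffers a single `*` until the next character shows whether it is part of `**`, and treats any escaped character as a literal. Like the original, it still treats `**/` as a bare `.*`.

```python
import re


def translate(path: str) -> str:
    """Hypothetical revision: translate "*", "**" and backslash escapes."""
    blocks = []
    escaping = False      # previous character was an unescaped backslash
    globstar = False      # a "**" was just emitted as ".*"
    pending_star = False  # a single "*" was seen but not emitted yet
    for char in path:
        if char == "*" and not escaping:
            if pending_star:
                # Second "*" in a row: a globstar that may cross "/".
                pending_star = False
                globstar = True
                blocks.append(r".*")
            elif not globstar:
                pending_star = True
            continue
        if pending_star:
            # A lone "*" matches anything except a path separator.
            blocks.append(r"[^/]*")
            pending_star = False
        if escaping:
            # Any escaped character, including "*" and "\", is literal.
            blocks.append(re.escape(char))
            escaping = False
        elif char == "\\":
            escaping = True
        elif char == "/" and globstar:
            # "**/" is already covered by the preceding ".*".
            pass
        else:
            blocks.append(re.escape(char))
        globstar = False
    if pending_star:
        # A trailing lone "*".
        blocks.append(r"[^/]*")
    result = "".join(blocks)
    return f"^({result})$"


print(translate("docs/*"))  # ^(docs/[^/]*)$
print(translate(r"a\*b"))   # ^(a\*b)$
```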
There is quite a bit of nuance in globbing, both in pattern recognition and in generating a regular expression that always behaves as intended, especially cross-platform. If your needs are simple, your solution can likely be simple; issues will most likely arise only when it is stressed with relevant use cases. I have no idea of the extent of your use case, so I cannot comment further.
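One cross-platform wrinkle for a pattern language that uses `\` as its escape character: on Windows, `\` is also a path separator. A sketch of one way to sidestep the clash, assuming file paths (not patterns) are normalized to `/` separators before matching; the function name `to_match_form` and the precompiled regex for `docs/*.md` are hypothetical:

```python
import re
from pathlib import PurePosixPath, PureWindowsPath


def to_match_form(path: str, windows: bool) -> str:
    """Normalize a file path to "/" separators before regex matching."""
    if windows:
        # On Windows "\" is a separator, so convert it to "/".
        return PureWindowsPath(path).as_posix()
    # On POSIX "\" is an ordinary filename character and is kept as-is.
    return PurePosixPath(path).as_posix()


# Hypothetical compiled pattern for "docs/*.md" ("*" becomes "[^/]*").
DOCS_MD = re.compile(r"^(docs/[^/]*\.md)$")

print(to_match_form("docs\\guide.md", windows=True))  # docs/guide.md
print(bool(DOCS_MD.fullmatch(to_match_form("docs\\guide.md", windows=True))))   # True
print(bool(DOCS_MD.fullmatch(to_match_form("docs\\guide.md", windows=False))))  # False
```

Normalizing the path side keeps `\` free to mean "escape" in patterns on every platform, at the cost of making a literal backslash in a Windows filename unmatchable.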
I'm writing a specification that needs a very, very, very simple globbing of paths, and I am writing a tool that implements the specification. The globbing needs to support:

- `*`, which matches any characters, including `.` at the start of a file, excluding path separators.
- `**`, which matches any characters, including `.` at the start of a file, including path separators.
- `\` as an escape character.

Using `DOTGLOB | FORCEUNIX | GLOBSTAR` in wcmatch, I am like 99% of the way there, which is great! Less work for me, and this library looks amazing. However, wcmatch comes with two more features that I do not need:
- `?` matches a single character.
- `[seq]` matches any character in the sequence, and `[!seq]` matches any character not in it.

I am not too bothered by `?`: I do not need it, but I can add it to the specification without much fuss. Being able to turn it off would be nice, but it is not needed. `[seq]`, however, is far too feature-rich for me to use. I do not want to include it in the specification, but if the specification omits it, my tool must not implement it either. I have considered using a pre-parser to escape all square brackets, but I figured an upstream-first approach would be helpful.

I would love a `NOSEQ` flag that disables the behaviour I do not need.

Additional context:
This is being discussed in this thread in a PR. Copying my available options from that thread:
The relevant bit of the specification is fairly simple. There is a `REUSE.toml` configuration file that matches path expressions against existing files in a project and applies metadata to the matched files. For example:

This matches all Markdown files and asserts that they are all licensed under CC-BY-4.0. Super simple, easy peasy, no fancy features needed.
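As a rough sketch of how such annotations could be applied to project files, the snippet below pairs hypothetical path globs with SPDX license identifiers and resolves each file against them. It uses a compact, tokenizer-based alternative to the character-by-character `translate()` loops in this thread, supporting only `*`, `**` and backslash escapes. The rule that the last matching entry wins is an assumption about `REUSE.toml` precedence made for illustration; check the REUSE specification before relying on it.

```python
import re
from typing import Optional

# One token per match: an escaped character, a globstar (optionally
# swallowing a following "/"), a single star, or any other literal.
TOKEN = re.compile(r"\\(.)|(\*\*/?)|(\*)|(.)", re.DOTALL)


def glob_to_regex(pattern: str) -> str:
    parts = []
    for m in TOKEN.finditer(pattern):
        escaped, globstar, star, literal = m.groups()
        if escaped is not None:
            parts.append(re.escape(escaped))
        elif globstar is not None:
            parts.append(".*")       # may cross "/"
        elif star is not None:
            parts.append("[^/]*")    # stays within one path segment
        else:
            parts.append(re.escape(literal))
    return "^(" + "".join(parts) + ")$"


# Hypothetical annotations in file order: (path glob, SPDX license id).
ANNOTATIONS = [
    ("**/*.md", "CC-BY-4.0"),
    ("docs/legal/*.md", "CC0-1.0"),
]


def license_for(path: str) -> Optional[str]:
    result = None
    for pattern, license_id in ANNOTATIONS:
        if re.fullmatch(glob_to_regex(pattern), path):
            result = license_id  # assumption: later matches take precedence
    return result


print(license_for("README.md"))            # CC-BY-4.0
print(license_for("docs/legal/terms.md"))  # CC0-1.0
print(license_for("src/main.py"))          # None
```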
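The pre-parser idea from the issue (escaping all square brackets before handing a pattern to wcmatch) might look roughly like the sketch below. The function name `escape_brackets` and its `chars` parameter are hypothetical. It leaves characters the user already escaped untouched, and passing `chars="[]?"` would also neutralize `?`.

```python
def escape_brackets(pattern: str, chars: str = "[]") -> str:
    """Backslash-escape unescaped characters in `chars` (default "[" and "]")."""
    out = []
    escaping = False  # previous character was an unescaped backslash
    for char in pattern:
        if escaping:
            # Already escaped by the user: copy unchanged.
            out.append(char)
            escaping = False
        elif char == "\\":
            out.append(char)
            escaping = True
        elif char in chars:
            out.append("\\" + char)
        else:
            out.append(char)
    return "".join(out)


print(escape_brackets("docs/[draft].md"))         # docs/\[draft\].md
print(escape_brackets(r"a\[b"))                   # a\[b
print(escape_brackets("notes?.md", chars="[]?"))  # notes\?.md
```

The escaped pattern could then be passed to `wcmatch.glob.globmatch(path, pattern, flags=glob.DOTGLOB | glob.FORCEUNIX | glob.GLOBSTAR)`. That call is not exercised here, and the sketch assumes wcmatch treats a backslash as an escape under those flags.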