Match regex metacharacters #297
Comments
Try using
In the FAQ http://johnkerl.org/miller/doc/faq.html#How_to_escape_'?'_in_regexes?
@aborruso Thanks for the alternative syntax. Here's my journey:
Hi @torbiak -- I'll dig. "What regex variant" -- it's just
I now see there's a comment in
Is it correct that regex metacharacters, such as the "word boundary" anchor
Hi @hftf in the documentation you have: "Miller lets you use regular expressions of type POSIX.2." I think
@aborruso Learned something new, thanks! I wasn’t familiar with POSIX regular expression behavior specifically, and I had just assumed that widespread metacharacters like (Edit: There might be a workaround, explained here: https://stackoverflow.com/questions/9792702/does-bash-support-word-boundary-regular-expressions)
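For readers wondering what a stand-in for a word-boundary anchor can look like when `\b` is unavailable, here is a minimal Python sketch using explicit alternation. It is an illustration under stated assumptions, not Miller behavior and not necessarily the approach in the linked answer: Python's `re` module is not a POSIX.2 engine (it does support `\b`, which the sketch deliberately avoids), the helper name `word_pattern` is hypothetical, and treating `[A-Za-z0-9_]` as the set of word characters is an assumption.

```python
import re

def word_pattern(word):
    # Hypothetical helper: match `word` only when it is bounded by the
    # start/end of the string or by a non-word character, without using \b.
    non_word = "[^A-Za-z0-9_]"  # assumed definition of "not a word character"
    return "(^|" + non_word + ")" + re.escape(word) + "($|" + non_word + ")"

pattern = re.compile(word_pattern("cat"))

found_whole = pattern.search("a cat sat") is not None      # "cat" as its own word
found_inside = pattern.search("concatenate") is not None   # "cat" inside a longer word
```

Unlike a true zero-width anchor, this pattern consumes the neighboring characters, so the matched text can include a leading or trailing delimiter.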
Hi @torbiak -- after long delay I'm looking at this for the Go port. Currently:
I'm not positive yet but I think the solution might be having that first conversion ( This means a context-specific "leave alone" for regex strings so there will be nothing for the Go equivalent of (Note that in Python -- as you pointed out -- there is the
This will be in the Go port -- next PR is in prep. THANK YOU for your thorough research on this! 🙏
Test case 55209bf. Comments 9286f5a. More doc details about Miller regexes upcoming at https://johnkerl.org/miller6 (subsequent PRs).
With the NodeTypeRegex relabelling scheme described in the Comments 9286f5a (leaves.go) link above, would
@torbiak no ... the 'implicit r' would only apply for string literals in that position. To make this work with the
@torbiak another option would be to abandon the notion of 'implicit r' entirely, and only have 'explicit r' ... I'm not sure which approach follows the principle of least surprise ...
Option 1 -- implicit and explicit r:
Option 2 -- explicit r only:
I prefer option 1 personally ...
Python's had a big influence on me, so I'm inclined towards explicit-only, but I'm having trouble writing a good argument against having implicit r-strings, too. I can't think of a case where you'd want the usual string literal behaviour for a regex. Having both feels consistent with the kind of tool that Miller is and what I've read about your goals for it. My biggest worry about implicit r-strings is that for someone who hasn't read about r-strings in the docs, it could be confusing that variable and literal arguments to the regex functions behave differently, and that this will make it harder for them to build an accurate mental model of how string literals and regexes work in Miller.
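For reference, the explicit-only model mentioned here can be seen in Python itself. The sketch below is plain Python, not Miller syntax, and the variable names are illustrative. It shows that an ordinary literal and a raw literal with the same source text produce different strings, and that a pattern passed through a variable behaves exactly like the literal it was assigned from, because escape processing happens once, at the literal.

```python
import re

plain = "\bcat\b"    # ordinary literal: "\b" becomes the backspace character (0x08)
raw = r"\bcat\b"     # raw literal: the backslash survives, so the regex engine sees \b

plain_len = len(plain)   # backspace, c, a, t, backspace
raw_len = len(raw)       # backslash, b, c, a, t, backslash, b

text = "a cat sat"
plain_match = re.search(plain, text)   # looks for literal backspace characters
raw_match = re.search(raw, text)       # looks for "cat" between word boundaries

# The same pattern reached through a variable: no difference from the literal.
pattern_from_variable = raw
variable_match = re.search(pattern_from_variable, text)
```

Under an implicit-r scheme, by contrast, the literal and the variable cases could diverge, which is the mental-model concern raised above.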
Good news is there's little backward behavior to protect so explicit-r would break almost nobody ... ... bad news is the one thing that does work is ... other than that I am with you on the above.
This issue is resolved in Miller 6 except for the new feature-add of r-strings as discussed above -- probably a good candidate for 6.1.
Very similar to #159: I can't match a square bracket, asterisk, etc., regardless of how many backslashes I use, because `mlr_alloc_double_backslash` is doubling them. Adding exceptions for square brackets, like the one already there for period, lets me match them as expected, but it seems we'd need either an exception for every ERE metacharacter or a different approach to escaping backslashes.
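To make the failure mode concrete, here is a small Python sketch of what unconditional backslash doubling does to an escaped metacharacter. It is an analogy only: `double_backslashes` is a hypothetical stand-in for the doubling step, not Miller's actual `mlr_alloc_double_backslash`, and Python's `re` module stands in for the regex engine.

```python
import re

def double_backslashes(pattern):
    # Hypothetical stand-in for the doubling step: every backslash becomes two.
    return pattern.replace("\\", "\\\\")

intended = r"\*"                        # meant to match a literal asterisk
doubled = double_backslashes(intended)  # now reads as "zero or more backslashes"

intended_hit = re.search(intended, "a*b")  # finds the asterisk
doubled_hit = re.search(doubled, "ab")     # matches the empty string; no asterisk needed

# With a bracket, the doubled pattern is an escaped backslash followed by an
# unterminated bracket expression, so it does not even compile.
try:
    re.compile(double_backslashes(r"\["))
    bracket_compiles = True
except re.error:
    bracket_compiles = False
```

In both cases, adding more backslashes on the user's side cannot help, since each one is doubled again before the engine sees it.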