Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use regex instead of re? #162

Closed
julienmalard opened this issue May 14, 2020 · 7 comments
Closed

Use regex instead of re? #162

julienmalard opened this issue May 14, 2020 · 7 comments

Comments

@julienmalard
Copy link

julienmalard commented May 14, 2020

Hello,

While using parsimonious I came across an already-documented bug in the builtin re module, whereby abugidas with vowel marks (that is, most of South Asia and Oceania) fail to match the \w directive (e.g., as in re.match("^[\w\s][\w\s]*", "किशोरी", re.UNICODE), as taken from this question here). In my particular case, this bug crashes my system dynamics models when I run them in PySD.

It seems that the recommended "solution" is to simply used the regex instead of re. However, this would mean adding a dependency to parsimonious, so I thought I would ask whether you would look favourably upon such a suggestion before submitting a pull request. I was thinking of proposing something along the lines of:

try:
    import regex as re
except ImportError:
    import re

so that nothing breaks if the dependency is not installed.

Please do let me know. Thanks!
-Julien Malard

@buhl
Copy link

buhl commented May 14, 2020

Hi Julien,
I was where not so long ago and made a PR then shortly after discovering it had already been tried and rejected.

#161

@julienmalard
Copy link
Author

Hello @buhl ,
Thanks for letting me know...@erikrose, any chance you might consider this? I see from the other pull request, though, that you prefer to discourage the use of regexes in grammars in the future. Would you happen to have a recommendation as to how the use case suggested above (\w and p{Mn}) could be implemented according to best practices?

I had considered manually generating a list of acceptable characters (see here), but was not sure whether that would be considered best practice or not.

Thanks!

@jenstroeger
Copy link

This might not be the right place, but @erikrose could you please document the meaning of ilmsuxa for regexes? Are those the single-letter flags documented by the regex module?

lonnen added a commit that referenced this issue Mar 28, 2022
this resolves jenstroeger's comment in issue #162
@lonnen
Copy link
Collaborator

lonnen commented Mar 28, 2022

@jenstroeger when in doubt, file a new issue. They're cheap. This time there's no need, however, since I addressed this with 6e52df0

@lonnen
Copy link
Collaborator

lonnen commented Mar 28, 2022

Some more background for the original issue:

According to https://bugs.python.org/issue12731, it looked like regex was going to replace re in the standard lib for 3.3, then maybe it was going to be added as an alternative, but then the issue fell of the radar for 9 or 10 years. It seems like this may still happen in the future, and regex is still actively developed and generally less uncompliant with unicode.

I understand @erikrose has (had?) other ambitions for Parsimonious, but unless a new contributor appears I don't expect to realize the dream of removing regular expressions anytime soon. Given this, pragmatism pushed me in favor of replacing re with regex until a suitable champion for the alternative appears.

@lonnen
Copy link
Collaborator

lonnen commented Mar 29, 2022

I sat on it another day and decided to ahead with it. @Oderjunkie helpfully had a PR already up so I rebased that, updated the version, and squashed it. This should be resolved now by c0ffc3e

I still think it is valuable, long term, to achieve the removal of regexs as Erik describes here: #149 (comment). This change is a pragmatic compromise to improve Parsimonious today while there are no alternatives to using regular expressions for some common use cases

thanks everyone!

@lonnen lonnen closed this as completed Mar 29, 2022
@julienmalard
Copy link
Author

Many thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants