Use regex instead of re? #162

julienmalard · 2020-05-14T14:28:54Z

Hello,

While using parsimonious I came across an already-documented bug in the builtin re module, whereby abugidas with vowel marks (that is, most of South Asia and Oceania) fail to match the \w directive (e.g., as in re.match("^[\w\s][\w\s]*", "किशोरी", re.UNICODE), as taken from this question here). In my particular case, this bug crashes my system dynamics models when I run them in PySD.

It seems that the recommended "solution" is to simply used the regex instead of re. However, this would mean adding a dependency to parsimonious, so I thought I would ask whether you would look favourably upon such a suggestion before submitting a pull request. I was thinking of proposing something along the lines of:

try:
    import regex as re
except ImportError:
    import re

so that nothing breaks if the dependency is not installed.

Please do let me know. Thanks!
-Julien Malard

The text was updated successfully, but these errors were encountered:

buhl · 2020-05-14T23:20:31Z

Hi Julien,
I was where not so long ago and made a PR then shortly after discovering it had already been tried and rejected.

#161

julienmalard · 2020-05-15T14:11:45Z

Hello @buhl ,
Thanks for letting me know...@erikrose, any chance you might consider this? I see from the other pull request, though, that you prefer to discourage the use of regexes in grammars in the future. Would you happen to have a recommendation as to how the use case suggested above (\w and p{Mn}) could be implemented according to best practices?

I had considered manually generating a list of acceptable characters (see here), but was not sure whether that would be considered best practice or not.

Thanks!

jenstroeger · 2021-03-10T06:18:42Z

This might not be the right place, but @erikrose could you please document the meaning of ilmsuxa for regexes? Are those the single-letter flags documented by the regex module?

this resolves jenstroeger's comment in issue #162

lonnen · 2022-03-28T11:17:54Z

@jenstroeger when in doubt, file a new issue. They're cheap. This time there's no need, however, since I addressed this with 6e52df0

lonnen · 2022-03-28T11:29:44Z

Some more background for the original issue:

According to https://bugs.python.org/issue12731, it looked like regex was going to replace re in the standard lib for 3.3, then maybe it was going to be added as an alternative, but then the issue fell of the radar for 9 or 10 years. It seems like this may still happen in the future, and regex is still actively developed and generally less uncompliant with unicode.

I understand @erikrose has (had?) other ambitions for Parsimonious, but unless a new contributor appears I don't expect to realize the dream of removing regular expressions anytime soon. Given this, pragmatism pushed me in favor of replacing re with regex until a suitable champion for the alternative appears.

lonnen · 2022-03-29T08:33:36Z

I sat on it another day and decided to ahead with it. @Oderjunkie helpfully had a PR already up so I rebased that, updated the version, and squashed it. This should be resolved now by c0ffc3e

I still think it is valuable, long term, to achieve the removal of regexs as Erik describes here: #149 (comment). This change is a pragmatic compromise to improve Parsimonious today while there are no alternatives to using regular expressions for some common use cases

thanks everyone!

julienmalard · 2022-03-29T08:35:14Z

Many thanks!

boringcactus mentioned this issue Oct 14, 2020

ParseError thrown from TokenGrammar can't be printed #171

Closed

lonnen added a commit that referenced this issue Mar 28, 2022

improve documentation of regex flags

6e52df0

this resolves jenstroeger's comment in issue #162

lonnen closed this as completed Mar 29, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use regex instead of re? #162

Use regex instead of re? #162

julienmalard commented May 14, 2020 •

edited

Loading

buhl commented May 14, 2020

julienmalard commented May 15, 2020

jenstroeger commented Mar 10, 2021

lonnen commented Mar 28, 2022

lonnen commented Mar 28, 2022

lonnen commented Mar 29, 2022 •

edited

Loading

julienmalard commented Mar 29, 2022

Use regex instead of re? #162

Use regex instead of re? #162

Comments

julienmalard commented May 14, 2020 • edited Loading

buhl commented May 14, 2020

julienmalard commented May 15, 2020

jenstroeger commented Mar 10, 2021

lonnen commented Mar 28, 2022

lonnen commented Mar 28, 2022

lonnen commented Mar 29, 2022 • edited Loading

julienmalard commented Mar 29, 2022

julienmalard commented May 14, 2020 •

edited

Loading

lonnen commented Mar 29, 2022 •

edited

Loading