Fix PCRE with UTF-8 data on Windows #145

Open
cdornan opened this issue Jun 4, 2017 · 2 comments

@cdornan
Contributor

cdornan commented Jun 4, 2017

@cdornan cdornan changed the title from "Fix PCRE with UTF-8 data" to "Fix PCRE with UTF-8 data on Windows" Jun 8, 2017
@cdornan
Contributor Author

cdornan commented Jun 8, 2017

BTW, this issue was reported at regex-pcre-builtin.

@cdornan cdornan modified the milestone: v1.0.2.0 Jun 10, 2017
@cdornan cdornan self-assigned this Jun 10, 2017
@cdornan cdornan closed this as completed Dec 14, 2018
@cdornan cdornan added invalid stale issue abandoned and removed in progress invalid labels Dec 14, 2018
@cdornan cdornan reopened this Jan 16, 2019
@cdornan cdornan removed the stale issue abandoned label Jan 16, 2019
@goertzenator

I ran into issues using PCRE.Text in the presence of Unicode ligatures. The platform is Windows.

*Main Lib Text.RE.PCRE.Text> "a first hello to everyone"  *=~/ [ed|$(hello)///"$1"|]
"a first \"hello\" to everyone"  -- OK
*Main Lib Text.RE.PCRE.Text> "a firﬆ hello to everyone"  *=~/ [ed|$(hello)///"$1"|]
"a fir\64262 \"llo t\" to everyone"  -- Uh oh

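The shifted capture is consistent with a byte-offset versus character-offset mix-up: the ligature U+FB06 is one `Char` but three bytes in UTF-8, so every offset after it differs by 2 depending on which unit is counted. The sketch below illustrates that assumption only; it does not call the regex library, and `charOffset`, `byteOffset` and `misreadCapture` are hypothetical names. It locates `hello` with `Data.Text.breakOn` and `Data.ByteString.breakSubstring`, then shows that reading the byte offset as a character offset yields the same `"llo t"` seen above.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- The input from the report, with U+FB06 (LATIN SMALL LIGATURE ST) written out.
input :: T.Text
input = T.pack "a fir\xFB06 hello to everyone"

needle :: T.Text
needle = T.pack "hello"

-- Offset of the match counted in characters (what Text indexing expects).
charOffset :: Int
charOffset = T.length (fst (T.breakOn needle input))

-- Offset of the same match counted in UTF-8 bytes (what a byte-oriented
-- matcher reports). The ligature occupies 3 bytes but only 1 Char.
byteOffset :: Int
byteOffset =
  BS.length (fst (BS.breakSubstring (TE.encodeUtf8 needle) (TE.encodeUtf8 input)))

-- Hypothetical bug: slicing the Text with the byte offset as if it were a
-- character offset. The capture comes out shifted right by 2 characters.
misreadCapture :: T.Text
misreadCapture = T.take (T.length needle) (T.drop byteOffset input)

main :: IO ()
main = do
  print charOffset      -- 7
  print byteOffset      -- 9
  print misreadCapture  -- "llo t"
```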
ByteString sort of works, but it looks like it mangles my ligature (it comes out as \ACK):

*Main Lib Text.RE.PCRE.ByteString> "a firﬆ hello to everyone"  *=~/ [ed|$(hello)///"$1"|]
"a fir\ACK \"hello\" to everyone"

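The `\ACK` is what you would get if the ligature were truncated to a single byte before matching even started. Assuming the session had `OverloadedStrings` enabled (the prompt does not show this), the `IsString` instance for `ByteString` keeps only the low 8 bits of each `Char`, and 0xFB06 truncates to 0x06, which GHC displays as `\ACK`. This minimal sketch contrasts `Data.ByteString.Char8.pack` with a real UTF-8 encoding via `Data.Text.Encoding.encodeUtf8`:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

ligature :: Char
ligature = '\xFB06'  -- LATIN SMALL LIGATURE ST, code point 0xFB06

-- Data.ByteString.Char8.pack keeps only the low 8 bits of each Char,
-- so 0xFB06 becomes 0x06, which GHC shows as \ACK.
truncated :: BS.ByteString
truncated = BC.pack [ligature]

-- A proper UTF-8 encoding of the same character takes three bytes.
encoded :: BS.ByteString
encoded = TE.encodeUtf8 (T.pack [ligature])

main :: IO ()
main = do
  print truncated              -- "\ACK"
  print (BS.unpack truncated)  -- [6]
  print (BS.unpack encoded)    -- [239,172,134]
```

If truncation is the cause, building the subject with `encodeUtf8` instead of a string literal would at least get the ligature's bytes to the matcher intact.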
And String just crashes:

*Main Lib Text.RE.PCRE.String> "a firﬆ hello to everyone"  *=~/ [ed|$(hello)///"$1"|]
"*** Exception: utf8_correct_bs: UTF-8 decoding error
CallStack (from HasCallStack):
  error, called at .\Text\RE\ZeInternals\Types\Match.lhs:248:13 in regex-1.1.0.0-H1FPxX1khLGKIhuhwowTFL:Text.RE.ZeInternals.Types.Match
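The trace does not show what `utf8_correct_bs` does internally, but one way a "UTF-8 decoding error" can arise is decoding a byte range that starts or ends inside a multibyte character, for example a range computed in one unit and applied in the other. This sketch uses only `text` and `bytestring`, not the regex internals: it cuts the input's UTF-8 bytes once inside the 3-byte ligature and once on a character boundary, and checks each slice with `Data.Text.Encoding.decodeUtf8'`.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Either (isLeft, isRight)

-- UTF-8 bytes of the report's input: "a fir" is bytes 0-4, the
-- ligature U+FB06 is bytes 5-7 (EF AC 86), and the space is byte 8.
bytes :: BS.ByteString
bytes = TE.encodeUtf8 (T.pack "a fir\xFB06 hello to everyone")

-- Keeping bytes 0-5 ends inside the ligature: only its lead byte EF survives.
splitSlice :: BS.ByteString
splitSlice = BS.take 6 bytes

-- Keeping bytes 0-7 ends exactly after the ligature, on a character boundary.
boundarySlice :: BS.ByteString
boundarySlice = BS.take 8 bytes

main :: IO ()
main = do
  print (isLeft (TE.decodeUtf8' splitSlice))      -- True: decoding fails
  print (isRight (TE.decodeUtf8' boundarySlice))  -- True: decodes cleanly
```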

This does work correctly with the TDFA module; however, my use case requires non-greedy matching, which only appears to be supported by PCRE. My current workaround is to use TDFA where I can and fall back to manual (non-regex) search and replace where I need non-greedy behavior.
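The manual fallback for non-greedy matching can work on `Text` directly, which sidesteps the encoding problem because `Data.Text` counts characters rather than bytes. Below is a minimal sketch that emulates `open(.*?)close` for literal delimiters using `Data.Text.breakOn`. `replaceNonGreedy` is a hypothetical helper, not part of the regex package.

```haskell
import qualified Data.Text as T

-- | Hypothetical helper: emulate the non-greedy regex  open(.*?)close  by
-- rewriting each shortest open...close span with f applied to the inner text.
-- Both delimiters must be non-empty (T.breakOn errors on an empty needle).
replaceNonGreedy :: T.Text -> T.Text -> (T.Text -> T.Text) -> T.Text -> T.Text
replaceNonGreedy open close f = go
  where
    go t
      | T.null rest  = before  -- no more opening delimiters
      | T.null rest' = t       -- unclosed span: leave the remainder as-is
      | otherwise    = before <> f inner <> go (T.drop (T.length close) rest')
      where
        (before, rest) = T.breakOn open t          -- leftmost opening delimiter
        afterOpen      = T.drop (T.length open) rest
        (inner, rest') = T.breakOn close afterOpen -- first (shortest) close

main :: IO ()
main = do
  let bracket x = T.pack "<" <> x <> T.pack ">"
  -- Prints "a <b> c <d> e"; each [..] span is rewritten separately.
  print (replaceNonGreedy (T.pack "[") (T.pack "]") bracket (T.pack "a [b] c [d] e"))
```

Because `breakOn` stops at the first closing delimiter after each opening one, every replaced span is the shortest possible, which is the non-greedy behavior; a greedy `\[(.*)\]` would instead match from the first `[` to the last `]`.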
