-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should work with bytes? #65
Comments
Hmm just noticed |
Constantly segfaults when I disable AHOCORASICK_UNICODE, even when not using pyahocorasick objects - py3.5.3. |
Bytes are supported. Could you please run python unitests.py? There're many tests that use bytes. |
This is with pristine pyahocorasick on python 3.5.3:
|
Hello, |
Unfortunately I cannot devote as much time as previously. Waiting for contributors. |
could we at least flip the default and make a 1.5 bump ? it's easy to convert strings->bytes, but hard to convert bytes->strings reliably. |
Not exactly the best solution, but you can still use a codec which cannot fail to do the bytes to string conversation like |
@vapier hey 👋 re:
Can you be more explicit? what would need to be fixed/changed and where? |
@vapier you rock! .. I guess there is an interesting derived tidbit: pyahocorasick may be used in chromium OS? 😇 And I see you are also the author to credit for the patch https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+log/refs/heads/release-R100-14526.B/dev-python/pyahocorasick/files/pyahocorasick-1.4.0-bytes.patch I also now recall of so many of your awesome contribs like on strace and so many other places. I am not worthy. |
@WojciechMula this deserves a 1.5 bump indeed. I reckon I am not impacted in most cases as usually stuff only integers in automatons. But I wonder if we should not publish two builds now that I started pushing pre-built wheels for all OSes: one for bytes and one for unicode? |
Oh man, it's really great news about usage! Anyway, it would be great if we provided specialisations for each type and then have some lightweight factory for this, even in pure python. This would be pretty easy if we have some code generator underneath. |
Tests are now mostly passing when building without unicode. The tests should also run in the CI though. Reference: #65 Signed-off-by: Philippe Ombredanne <[email protected]>
The behaviour with bytes build has some issues. This helps testing this Reference: #65 Reference: #165 Signed-off-by: Philippe Ombredanne <[email protected]>
The environment variables AHOCORASICK_UNICODE and AHOCORASICK_BYTES now drive the flavor of the build if defined (using any value). Reference: #65 Reference: #165 Signed-off-by: Philippe Ombredanne <[email protected]>
Use environment variables AHOCORASICK_UNICODE and AHOCORASICK_BYTES to test vboth builds on all supported OSes. Reference: #65 Reference: #165 Signed-off-by: Philippe Ombredanne <[email protected]>
I pushed some tests in this branch https://github.com/WojciechMula/pyahocorasick/tree/release-2.0.0
Define either There are some issues that show up:
@vapier the new environment variable should help you at the minimum get rid of your setup.py patch Now to support strings & bytes simultaneously, we could either build both and manage a factory in Python as @WojciechMula suggested above... or find a way to deal with both together. I have been toying with the idea of an alternative unified way (on the C side) where the number of symbols in some "alphabet" would be the key input as opposed to the implied bytes = length 1 (e.g. 256 symbols in alphabet) and unicode = variable length based on build. Also re: #65 (comment)
Actually I usually use sequences of integers as keys with ahocorasick.KEY_SEQUENCE |
You can see the runs failures there https://github.com/WojciechMula/pyahocorasick/actions/runs/1952208882
|
At this stage all the bytes-based tests pass. |
We have #165 as a related issue |
personally i'd rip the bandaid off, but failing that, publishing two wheels with the diff names sounds OK assuming they can be installed in parallel i guess there's no way to build 2 C components into a single package and pick the right one based on the inputs ? |
No description provided.
The text was updated successfully, but these errors were encountered: