"contains" filter behaves improperly with NUL characters #1732
Comments
I can confirm that this is a bug that persists in the current (October 2, 2018) "master" version. |
This is interesting; I will look into this if I find the time (if not, I'll try to at least contribute some tests that show variations on this issue). |
This is meant to fix jqlang#1732. In the existing code, the libc function underlying `contains` on strings was `strstr`, which only works properly on C strings (i.e., arrays of characters where the first null byte is an end marker). This is not suitable for JSON strings, which can embed null bytes; for example, `"xx\u0000yy"` is considered to include `"\u0000aa"` as a substring, since the latter is interpreted as the empty string. This changeset uses `memmem` instead of `strstr`.
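To make the difference concrete, here is a minimal C sketch. It is not the actual jq patch: the helper names `cstr_contain` and `bytes_contain` are hypothetical, and the length-aware search is written out by hand with `memcmp` so that it does not depend on `memmem`, which is not part of standard C.

```c
#include <stddef.h>
#include <string.h>

/* strstr-style check: both arguments are treated as C strings,
 * so each one is silently cut off at its first NUL byte. */
static int cstr_contain(const char *haystack, const char *needle)
{
    return strstr(haystack, needle) != NULL;
}

/* Length-aware check (hypothetical helper): returns 1 if the n_len bytes
 * at needle occur anywhere within the h_len bytes at haystack, else 0.
 * NUL bytes are compared like any other byte. */
static int bytes_contain(const char *haystack, size_t h_len,
                         const char *needle, size_t n_len)
{
    if (n_len == 0)
        return 1;               /* the empty string is contained in anything */
    if (n_len > h_len)
        return 0;
    for (size_t i = 0; i + n_len <= h_len; i++) {
        if (memcmp(haystack + i, needle, n_len) == 0)
            return 1;
    }
    return 0;
}
```

With the 5-byte haystack `"xx\0yy"` and the 3-byte needle `"\0aa"`, `cstr_contain` reports a match because the needle collapses to the empty string, while `bytes_contain` called with explicit lengths 5 and 3 does not.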
In
And this is where escapes are handled in the lexer, which as you can see, leverages
By the time the jq program is compiled we have something like:
and the string constant is of length The The bug is that The fix is to not use
|
Thanks, @nicowilliams. This is consistent with my cursory analysis, but |
Woah, I was writing the text for my pull request while you were posting the explanation, @nicowilliams; talk about synchronicity :). |
Fixed with 61cd6db. |
@haguenau Oy, sorry... I should have let you fix it because it's always good to have more contributors! And this was both easy and tricky, which makes it a great bug for a contribution :( |
Heh. Things like that happen, that's OK. Perhaps consider using the tests from #1793? |
@haguenau Nice! Yeah, so, can you a) rebase so it's now just the tests, and b) try to reduce the number of tests by following the one I pushed as an example? (b) is not strictly necessary, naturally -- I suspect that the following pattern
is faster than
though I've never actually measured this sort of thing, so maybe that's wrong, and anyways, I find it easier to read. Since the Also, remove this test:
|
FYI, I've to pack and take a very early morning flight, so I might not communicate further or push your PR until tomorrow or Sunday :( |
Thanks for the detailed review, Nico; I will probably have time to submit test cases that match your expectations early this week. Even though you fixed the code itself at the same time I did, this is not lost; I learned a bit more about jq's internals. |
Description
The `contains(needle)` filter does not match an input that contains `needle` only after a NUL character. In JSON (and Unicode), NUL is a normal character, not an end-of-string marker.
To reproduce
jqplay link: https://jqplay.org/s/ufUZAtLeHn
Filter:
[contains("x"), contains("x\u0000"), contains("x\u0000y"), contains("y")]
JSON:
"x\u0000y"
Expected behavior
Output should be `[true, true, true, true]`.
Actual behavior
Output is `[true, true, true, false]`.
Environment