Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gsub loops infinitely with "^" or "" #2148

Closed
garethhumphriesgkc opened this issue Jun 25, 2020 · 3 comments · Fixed by #2641
Closed

gsub loops infinitely with "^" or "" #2148

garethhumphriesgkc opened this issue Jun 25, 2020 · 3 comments · Fixed by #2641
Labels
Milestone

Comments

@garethhumphriesgkc
Copy link

Describe the bug
When you pass the pattern "^" or "" to gsub, it loops forever and eventually runs out of memory.

To Reproduce

$ echo '{"key": "value"}' | jq '.key|=gsub("^";"")'
error: cannot allocate memory
Aborted (core dumped)
$ echo '{"key": "value"}' | jq '.key|=gsub("";"")'
^C
$ echo '{"key": "value"}' | jq '.key|=gsub("^";"new-prefix")'
^C
$

Expected behavior
Once the match has been successful, it shouldn't be tried again for the same location.

Environment (please complete the following information):

$ cat /etc/issue ; jq -V ; echo ; dpkg -s jq | grep -e Version -e Package
Ubuntu 20.04 LTS \n \l

jq-1.6

Package: jq
Version: 1.6-1
$

Additional context
Add any other context about the problem here.

@pkoppstein
Copy link
Contributor

An important test case (in jq.test format):

gsub("";"!")
"abc"
"!a!b!c!"

weeble added a commit to weeble/jq that referenced this issue Mar 30, 2023
Problem: If a regex given to gsub can match a zero length string, it
will never terminate and eventually crash. This is because after
matching once it keeps checking for the same match at the same location
and finding it.

Solution: We can't skip ahead because there might be a non-zero length
match at the same location. Instead, keep track of whether the last
match was zero length, and when it was, ignore any zero length match at
the same offset.

Remaining problem: This fails one of the new tests I added. For some
reason, match doesn't seem to correctly return (some) zero length
matches at the end of the input string. I'm going to report that as a
separate bug, I can't see why it behaves like that.

Ref: jqlang#2148
@weeble
Copy link

weeble commented Mar 31, 2023

I tried to fix this with #2564, but it still doesn't pass the test case @pkoppstein describes due to #2565.

@Fraasi
Copy link

Fraasi commented Apr 7, 2023

I just ran into this with the pipe symbol, but after looping around for about 30 seconds, it just says killed...?

$ echo '{"key": "value | value"}' | jq '.key|=gsub("|";"-")'
Killed

$ jq --version
jq-1.6

@itchyny itchyny added the bug label Jun 3, 2023
pkoppstein added a commit to pkoppstein/jq that referenced this issue Jun 29, 2023
… uniq(stream)

The primary purpose of this commit (which supercedes PR
jqlang#2624) is to rectify most problems
with `gsub` (and also `sub` with the "g" option), in particular jqlang#1425
('\b'), jqlang#2354 (lookahead), and jqlang#2532 (regex == "^(?!cd ).*$|^cd ";"")).

This commit also partly resolves jqlang#2148 and jqlang#1206 in that `gsub` no
longer loops infinitely; however, because the new `gsub` depends
critically on match(_;"g"), the behavior when regex == "" is sometimes
non-standard. [*1]

Since the new sub/3 relies on uniq/1, that has been added as well [*2].

The documentation has been updated to reflect the fact that `sub` and
`gsub` are intended to be regular in the second argument. [*3]

Also, _nwise/1 has been tweaked to take advantage of TCO.

Footnotes:

[*1] Using the new gsub, '"a" | gsub( ""; "a")' emits "aa" rather than
"aaa" as would be standard.  This is nevertheless better than the
infinite loop behavior of jq 1.6 in this case.

With one exception (as explained in [*2]), the new gsub is implemented
as though match/2 behavior is correct.  That is, bugs in `gsub`
behavior will most likely have their origin in `match/2`.

[*2] `uniq/1` adopts the Unix/Linux name and semantics; it is needed for the following test case:

gsub("(?=u)"; "u")
"qux"
"quux"

Without this functionality:

Test jqlang#23: 'gsub("(?=u)"; "u")' at line number 100
*** Expected "quux", but got "quuux" for test at line number 102: gsub("(?=u)"; "u")

The root of the problem here is `match`: if `match` is fixed, then gsub would not need `untie`.

The addition of `uniq` as a top-level function should be a non-issue
relative to general concern about builtins.jq bloat: the line count of
the new builtin.jq is significantly reduced overall, and the number of
defs is actually reduced by 1 (from 111 (ignoring a redundant def) to 110).

[*3] See e.g. jqlang#513 (comment)
@itchyny itchyny added this to the 1.7 release milestone Jul 2, 2023
itchyny pushed a commit that referenced this issue Jul 3, 2023
The primary purpose of this commit is to rectify most problems with
`gsub` (and also `sub` with the `g` option), in particular fix #1425 ('\b'),
fix #2354 (lookahead), and fix #2532 (regex == `"^(?!cd ).*$|^cd "`).

This commit also partly resolves #2148 and resolves #1206 in that
`gsub` no longer loops infinitely; however, because the new `gsub`
depends critically on `match/2`, the behavior when regex == `""` is
sometimes non-standard.

The documentation has been updated to reflect the fact that `sub`
and `gsub` are intended to be regular in the second argument.

Also, `_nwise/1` has been tweaked to take advantage of TCO.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants