Joni spins forever on invalid input #7

electrum · 2013-05-31T00:50:22Z

If you create a Matcher from a byte array containing invalid UTF-8, the match() method spins forever because ByteCodeMachine does not handle invalid characters. For example, in the method opAnyCharStar():

    while (s < range) {
        ...
        int n = enc.length(bytes, s, end);
        if (s + n > range) {opFail(); return;}
        ...
    }

The enc.length() call returns -1 for malformed input, but this value isn't checked, so the loop never exits. I haven't looked at this deeply enough to know the correct solution, but there are many calls like this and none of them check the result.
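To make the failure mode concrete, here is a minimal Java sketch. It is not Joni's code: the issue elides the loop body, so the sketch assumes it advances with `s += n`, and `utf8Length()` is a hypothetical, simplified stand-in for `enc.length()` that returns -1 on malformed input. With a valid byte followed by an invalid one, `s` moves forward on the valid byte and back on the invalid one, so it never reaches `range`. An iteration cap stands in for "forever".

```java
// Hypothetical sketch, not Joni's code: the shape of the loop in
// opAnyCharStar() with no check on the length result.
class UncheckedWalk {
    // Simplified stand-in for enc.length(): byte length of the UTF-8
    // character at p, or -1 if the bytes there are malformed. It does not
    // reject overlong 3-/4-byte forms or encoded surrogates.
    static int utf8Length(byte[] bytes, int p, int end) {
        int b = bytes[p] & 0xFF;
        int len;
        if (b < 0x80) len = 1;
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;
        else return -1;                                    // invalid lead byte
        if (p + len > end) return -1;                      // truncated sequence
        for (int i = 1; i < len; i++) {
            if ((bytes[p + i] & 0xC0) != 0x80) return -1;  // bad continuation byte
        }
        return len;
    }

    // Assumes the elided loop body advances with s += n. Returns true if the
    // loop exits within maxIters iterations, false if it is still spinning.
    static boolean terminates(byte[] bytes, int maxIters) {
        int s = 0, range = bytes.length, end = bytes.length;
        for (int iters = 0; s < range; iters++) {
            if (iters == maxIters) return false;  // give up: would spin forever
            int n = utf8Length(bytes, s, end);
            if (s + n > range) return true;       // opFail() in the original
            s += n;                               // n == -1 moves s backwards
        }
        return true;
    }
}
```

For `new byte[] {'a', (byte) 0xFF}`, `s` alternates between 0 and 1 until the cap is hit, because `s + n > range` can never be true when `n` is -1 and `s < range`.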


headius · 2013-06-01T01:45:02Z

I wouldn't say this is by design, but Joni does assume you're passing it valid character sequences. Adding character verification everywhere it is needed would obviously add overhead to the character-walking logic in Joni, slowing down all matches.
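Since Joni assumes valid input, one option on the caller's side is to validate the bytes once before matching. A minimal sketch, assuming the caller can afford an extra decoding pass: it uses the JDK's strict UTF-8 decoder (`CharsetDecoder` with `CodingErrorAction.REPORT`), not anything in Joni, and `Utf8Guard.isValidUtf8` is a hypothetical helper name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical caller-side guard: reject malformed UTF-8 before it
// reaches the regex engine.
class Utf8Guard {
    static boolean isValidUtf8(byte[] bytes) {
        // A fresh decoder per call, since CharsetDecoder is not thread-safe.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));  // throws on malformed input
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

This pays the verification cost once per input, plus a temporary char buffer, instead of at every character step inside the matcher.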

However, in the case you show, adding a -1 check would not add significant overhead, so it may be worth doing.
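A sketch of what that check amounts to, assuming the same loop shape as the excerpt in the issue (`s += n` in the elided body). This is not a patch against Joni; `CheckedWalk` and its simplified `utf8Length()` stand-in for `enc.length()` are hypothetical. The only change to the loop is one extra comparison per character.

```java
// Hypothetical sketch, not a patch against Joni: the same loop shape
// with the -1 check added.
class CheckedWalk {
    // Simplified stand-in for enc.length(): byte length of the UTF-8
    // character at p, or -1 if malformed (no overlong/surrogate checks).
    static int utf8Length(byte[] bytes, int p, int end) {
        int b = bytes[p] & 0xFF;
        int len;
        if (b < 0x80) len = 1;
        else if (b >= 0xC2 && b <= 0xDF) len = 2;
        else if (b >= 0xE0 && b <= 0xEF) len = 3;
        else if (b >= 0xF0 && b <= 0xF4) len = 4;
        else return -1;
        if (p + len > end) return -1;
        for (int i = 1; i < len; i++) {
            if ((bytes[p + i] & 0xC0) != 0x80) return -1;
        }
        return len;
    }

    // Returns true if the walk reaches range, false where opFail() would run.
    static boolean walk(byte[] bytes) {
        int s = 0, range = bytes.length, end = bytes.length;
        while (s < range) {
            int n = utf8Length(bytes, s, end);
            if (n < 0 || s + n > range) return false;  // the added n < 0 test
            s += n;
        }
        return true;
    }
}
```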

headius · 2013-10-31T16:11:48Z

FWIW, recent releases of Joni check the Thread interrupt status, so it may be possible to break out of a bad match like this.
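A minimal sketch of how a caller might use interrupt checking as a safety net: run the match on a worker thread and interrupt it if it overruns a deadline. This does not use Joni's own API; `spinningMatch()` is a hypothetical stand-in for a match stuck on bad input that polls its interrupt status, and `matchWithTimeout()` is a hypothetical helper built on `java.util.concurrent`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class InterruptibleMatch {
    // Hypothetical stand-in for a match that never finishes; it polls the
    // thread's interrupt status every 4096 steps and bails out when set.
    static boolean spinningMatch() throws InterruptedException {
        long steps = 0;
        while (true) {
            if ((++steps & 0xFFF) == 0 && Thread.interrupted()) {
                throw new InterruptedException("match interrupted");
            }
        }
    }

    // Runs task on a worker thread with a deadline. Returns its result, or
    // null if the deadline passed (the worker is then interrupted).
    static Boolean matchWithTimeout(Callable<Boolean> task, long millis) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Boolean> f = pool.submit(task);
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);                      // interrupts the worker thread
            return null;
        } catch (InterruptedException e) {
            f.cancel(true);
            Thread.currentThread().interrupt();  // preserve the caller's interrupt status
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException("match failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The interrupt only helps if the spinning code actually polls for it; a deadline like this bounds the damage but does not fix the underlying loop.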

I'm going to mark this as won't fix until there's a general solution.

headius closed this as completed Oct 31, 2013

haozhun mentioned this issue Mar 18, 2015

Valid UTF-8 input can cause infinite loop in JONI #17 (open)
