code inspection: parser kernel doesn't hand over correctly from error recovery phase #25

GerHobbelt · 2017-10-29T21:50:19Z

Following up on #21: while inspecting my own code (copy-edit, git compare, and diffing in Beyond Compare), I noticed that the parser kernel has a few very subtle bugs in the error recovery parse loop. The parse loop is duplicated in the error recovery section, so that I can optimize the main parse loop for regular operation and not worry there about the special handling that error recovery requires. As a result, the recovery loop does not hand over / drop out into the outer parse loop correctly:

- When `parseError()` produces a 'parser return value', that value is DESTROYED in the ACCEPT phase of the outer loop.
- The outer loop can be optimized further when it doesn't have to worry about a still-active 'recovery phase', i.e. `recovering === 0` should be a precondition in the outer parse loop.
- (Edge case) When the lexer is also in the habit of producing TERROR tokens (some of my grammars do this), we lose their yyval and yylloc! Hence we must differentiate between a TERROR set up as a replacement token in the parser kernel's error recovery section and a TERROR token produced by the lexer: the latter is an error token too, but should only indirectly trigger error recovery by the parser.

Hint To Self: this means that an error term in a grammar production has an associated value which is either a parser error recovery info object or a lexer-produced yyvalue, depending on whether the lexer TERROR-or-other token triggered parser error recovery or not! ... Talk about complex internals... 🤡
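To make the lexer-vs-parser TERROR distinction concrete, here is a minimal sketch. It is not the actual kernel code: the token shape, the `fromLexer` flag, the numeric id `2` for TERROR, and the helper names `lexerToken`, `recoveryToken` and `errorTermToken` are all assumptions made up for illustration.

```javascript
// Hypothetical sketch, NOT the actual jison kernel code.
const TERROR = 2; // assumed numeric id of the error token

// Token as delivered by the lexer: carries its own value and location.
function lexerToken(id, yyval, yylloc) {
  return { id, yyval, yylloc, fromLexer: true };
}

// Replacement token set up by the parser's error recovery section.
function recoveryToken(errorInfo) {
  return { id: TERROR, yyval: errorInfo, yylloc: errorInfo.loc, fromLexer: false };
}

// Decide which token the `error` term in a production gets to see.
function errorTermToken(offendingToken, errorInfo) {
  if (offendingToken.id === TERROR && offendingToken.fromLexer) {
    // Lexer-produced TERROR: keep its yyval and yylloc instead of overwriting them.
    return offendingToken;
  }
  // Any other token: substitute the parser's own recovery TERROR.
  return recoveryToken(errorInfo);
}
```

With this shape, the value attached to the `error` term is the lexer's own value when the lexer produced the TERROR, and the recovery info object otherwise, which is the either/or described in the hint above.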


…n cross-compared and adjusted to suit our original intent as described in the issue #25: see the comments (edited) for the important parts of this work.

…erved that `retval` isn't always treated properly when its *potential value* is produced by `parseError()`: only when `parseError()` produces a sensible value (i.e. *not* `undefined`!) should that value be produced by the parser. Otherwise a parse *error* should produce the value `false` to signal parse/match failure on the given input.
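The return value rule in that commit note can be sketched as a tiny helper. The name `resolveErrorRetval` is hypothetical and not part of the parser; the sketch also assumes that only `undefined` counts as "not sensible", so `null` or `0` returned by `parseError()` would be passed through unchanged.

```javascript
// Hypothetical helper, not part of jison: picks the parser's result after a parse error.
function resolveErrorRetval(parseErrorResult) {
  // Only a defined value from parseError() becomes the parser's return value.
  if (parseErrorResult !== undefined) {
    return parseErrorResult;
  }
  // Otherwise signal parse/match failure on the given input.
  return false;
}
```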

… the preceding commits: `action === 0` is the error parse state and that one, when it is discovered during error **recovery** in the inner slow parse loop, is handed back to the outer loop to prevent undue code duplication. Handing back means the outer loop will have to process that state, not exit on it immediately!

…reset/cleanup the `recoveringErrorInfo` object as one may invoke `yyerrok` while still inside the error recovery phase of the parser, thus *potentially* causing trouble down the line for subsequent parse states. (This is another edge case that's hard to produce: better-safe-than-sorry coding style applies.)
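A rough sketch of that cleanup, under stated assumptions: the state object, `createParserState` and `beginRecovery` are hypothetical, and the recovery counter value `3` is assumed here following the usual yacc convention. Only the names `yyerrok`, `recovering` and `recoveringErrorInfo` come from this thread.

```javascript
// Hypothetical parser state; field names mirror the thread, the shape is assumed.
function createParserState() {
  return { recovering: 0, recoveringErrorInfo: null };
}

// Enter the error recovery phase and remember the error info for later use.
function beginRecovery(state, errorInfo) {
  state.recovering = 3; // assumed: tokens to shift before recovery counts as done
  state.recoveringErrorInfo = errorInfo;
}

// yyerrok may be called while recovery is still active, so it must also
// drop the stored error info, not just reset the counter.
function yyerrok(state) {
  state.recovering = 0;
  state.recoveringErrorInfo = null;
}
```

Clearing both fields together keeps `recovering === 0` and "no pending error info" in sync, so later parse states never see stale recovery data.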

GerHobbelt · 2017-10-30T00:36:13Z

Finished work on this issue.

GerHobbelt added bug question labels Oct 29, 2017

GerHobbelt closed this as completed Oct 30, 2017
