Generic MathML Errors using Parser Lookahead (some endpoints updated) #386

Merged: adarshp merged 4 commits into main from adi/mathml_parser_errors on Aug 4, 2023

@Adi-UA (Member) commented Aug 4, 2023

Changes

  • Generic MathML Parser Error update:

    • Added tag level errors to generic_mathml.rs parser: <mi>, <mn>, <msup>, <msub>, <msqrt>, <mfrac>, <mrow>, <munder>, <mover>, <msubsup>, <mtext>, <mstyle>, <mspace>, <mo>.
    • /mathml/ast-graph endpoint now shows these errors.
  • First Order ODE Parser Error update:

    • Updated ParseError messages using the context combinator, removing the previous macro usage.
    • The generic MathML errors were excluded as this parser uses interpreted_mathml.rs, which doesn't encounter those errors at the math_expression level.
  • /pmml/equations-to-amr and /latex/equations-to-amr now pass on these errors from skema-rs.

Notes

  • Lookahead Algorithm:

    • Solved the problem of adding tag level parse errors by implementing a lookahead in the parser.
    • In math_expression, instead of using alt for multiple branches of parsers, the following steps were adopted:
      1. Grab the content of the next tag.
      2. If it is an open tag, call the appropriate parser. If the parser fails, we can immediately stop execution with cut because of the lookahead knowledge.
      3. If the tag was a close tag, return an Error instead of a Failure. Failure cuts the execution, but returning an Error allows the parent combinator to continue using parsers on the remaining input.
    • This approach enables many0 and other combinators to work as expected. When we run out of things (like math expressions) for many0 to match (encountered a close tag), we return an Error, allowing the parent combinator to continue. But, as long as we know there is an expression to match (open tag), we can guarantee that if the internal parser (for <mi>, <mo>, etc.) fails, it was due to bad input.
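
The Error versus Failure behaviour described above can be sketched without any parser library. This is a minimal illustration, not the code in generic_mathml.rs: `ParseErr`, `PResult`, `element`, `many0_expr` and `mrow` are hypothetical names, the two enum variants stand in for nom's `Err::Error` and `Err::Failure`, and only `<mi>`, `<mn>` and `<mrow>` are handled.

```rust
// Hand-rolled stand-ins for nom's `Err::Error` (recoverable) and
// `Err::Failure` (fatal, what `cut` produces). All names are illustrative.
#[derive(Debug, PartialEq)]
enum ParseErr {
    Error(String),   // recoverable: the parent combinator may carry on
    Failure(String), // fatal: stop parsing and report bad input
}

type PResult<'a, T> = Result<(&'a str, T), ParseErr>;

/// Parse `<name>text</name>` and return the text content.
fn element<'a>(input: &'a str, name: &str) -> PResult<'a, String> {
    let open = format!("<{name}>");
    let close = format!("</{name}>");
    let rest = input
        .strip_prefix(open.as_str())
        .ok_or_else(|| ParseErr::Error(format!("expected {open}")))?;
    let end = rest
        .find('<')
        .ok_or_else(|| ParseErr::Error(format!("unterminated {open}")))?;
    let (text, rest) = rest.split_at(end);
    let rest = rest
        .strip_prefix(close.as_str())
        .ok_or_else(|| ParseErr::Error(format!("expected {close}")))?;
    Ok((rest, text.to_string()))
}

/// Lookahead-based dispatch: peek at the next tag before committing.
fn math_expression(input: &str) -> PResult<'_, String> {
    let after_lt = input
        .strip_prefix('<')
        .ok_or_else(|| ParseErr::Error("expected a tag".into()))?;
    let end = after_lt
        .find('>')
        .ok_or_else(|| ParseErr::Error("unterminated tag".into()))?;
    let tag = &after_lt[..end];
    if tag.starts_with('/') {
        // Close tag: nothing left to match here, so stay recoverable.
        return Err(ParseErr::Error(format!("no expression before <{tag}>")));
    }
    // Open tag: an expression must follow, so any error means bad input.
    let result = match tag {
        "mi" | "mn" => element(input, tag),
        other => Err(ParseErr::Error(format!("unknown tag <{other}>"))),
    };
    // Equivalent of `cut`: upgrade a recoverable Error to a Failure.
    result.map_err(|e| match e {
        ParseErr::Error(msg) => ParseErr::Failure(msg),
        failure => failure,
    })
}

/// Like nom's `many0`: collect results until a recoverable Error,
/// but propagate a Failure immediately.
fn many0_expr(mut input: &str) -> PResult<'_, Vec<String>> {
    let mut items = Vec::new();
    loop {
        match math_expression(input) {
            Ok((rest, item)) => {
                items.push(item);
                input = rest;
            }
            Err(ParseErr::Error(_)) => return Ok((input, items)),
            Err(failure) => return Err(failure),
        }
    }
}

/// `<mrow>` parent: zero or more children, then the close tag.
fn mrow(input: &str) -> PResult<'_, Vec<String>> {
    let rest = input
        .strip_prefix("<mrow>")
        .ok_or_else(|| ParseErr::Error("expected <mrow>".into()))?;
    let (rest, children) = many0_expr(rest)?;
    let rest = rest
        .strip_prefix("</mrow>")
        .ok_or_else(|| ParseErr::Failure("expected </mrow>".into()))?;
    Ok((rest, children))
}
```

With this sketch, `mrow("<mrow><mi>x</mi><mn>1</mn></mrow>")` collects both children and stops cleanly at `</mrow>`, while `mrow("<mrow><mi>x<mn>1</mn></mrow>")` returns a Failure from the `<mi>` element instead of silently ending the list early.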

Testing

  • cargo test and cargo clippy passing.

…id ambiguity in when to cut

1. Look ahead by matching "<" and taking input until ">". Let this be the current tag (for example "mo" or "/mo").
2. Grab the current tag in math_expression.
3. If the tag is an OPEN tag:
   - Run the parser that math_expression selects for the matched tag. Example: if the tag is <mi>, run the mi parser.
   - If the element parser returns an error, cut with a Failure.
4. If the tag is a CLOSE tag:
   - Return a ParseError from math_expression.
   - Put the current tag (the close tag) back into the input and run the tag end parser.
5. When we run out of math expressions in a many combinator (such as many0), we automatically grab a close tag and exit safely with a parse Error instead of a Failure.
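
Steps 1, 2 and 4 rely on inspecting the next tag without consuming it. A minimal stdlib-only sketch of that peek follows. `Tag` and `peek_tag` are hypothetical names rather than the PR's code, and the sketch assumes simple tags: attributes after the tag name are ignored and self-closing tags are not handled.

```rust
// Illustrative only: `Tag` and `peek_tag` are hypothetical names.
#[derive(Debug, PartialEq)]
enum Tag<'a> {
    Open(&'a str),  // e.g. "mo"
    Close(&'a str), // e.g. "mo", taken from "/mo"
}

/// Match "<", take until ">", and classify the tag.
/// The input is only inspected, never consumed, so the caller can still
/// hand the same input to an element parser or a tag-end parser.
fn peek_tag(input: &str) -> Option<Tag<'_>> {
    let after_lt = input.trim_start().strip_prefix('<')?;
    let end = after_lt.find('>')?;
    // Keep only the tag name; anything after whitespace is treated as attributes.
    let name = after_lt[..end].split_whitespace().next()?;
    match name.strip_prefix('/') {
        Some(closed) => Some(Tag::Close(closed)),
        None => Some(Tag::Open(name)),
    }
}
```

Because the function borrows the input rather than advancing it, "putting the close tag back" in step 4 needs no extra work in this sketch: the tag end parser simply receives the original input.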

@Adi-UA changed the title from "Generic MathML Errors with Lookahead (some endpoints updated)" to "Generic MathML Errors using Parser Lookahead (some endpoints updated)" on Aug 4, 2023

@Free-Quarks (Collaborator) left a comment

LGTM!

@adarshp merged commit 1198e53 into main on Aug 4, 2023
3 of 4 checks passed
@adarshp deleted the adi/mathml_parser_errors branch on August 4, 2023 22:49
3 participants