missing matches #181

Closed
pinkynrg opened this issue Aug 11, 2017 · 4 comments

Comments

@pinkynrg

pinkynrg commented Aug 11, 2017

I wrote a small test like this:

class TEST < Parslet::Parser
  rule(:first) { second >> third.maybe }
  rule(:second) { str("A") | str("AB") }
  rule(:third) { str('!') }
  root :first
end

Shouldn't this example parse all of the following inputs?

  • A
  • AB
  • A!
  • AB!

In my case it parses only:

  • A
  • A!

On the other hand, if I replace the "AB" string in the "second" rule with a "C" string, I can parse the following as expected:

  • A
  • C
  • A!
  • C!
@kschiess
Owner

The '|' operator prefers its left side over its right. If the left side matches, even when its match is shorter, the right side is never tested (no backtracking).

In other words, your rule ':second' should probably be:

rule(:second) { str("AB") | str("A") }
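
To see why the order matters, here is a minimal sketch in plain Ruby that models ordered choice. It is not Parslet's implementation: `ordered_choice` and `parses?` are hypothetical helpers that only mimic `second >> third.maybe` over literal strings, assuming the whole input has to be consumed for a parse to succeed.

```ruby
# Plain-Ruby model of ordered choice (hypothetical helpers, not Parslet code).
# Returns the FIRST alternative that is a prefix of the input, or nil.
# Later alternatives are never tried once an earlier one has matched.
def ordered_choice(input, alternatives)
  alternatives.find { |alt| input.start_with?(alt) }
end

# Models `second >> third.maybe`, then requires that nothing is left over.
def parses?(input, alternatives)
  matched = ordered_choice(input, alternatives)
  return false if matched.nil?

  rest = input[matched.length..-1]
  rest = rest[1..-1] if rest.start_with?('!') # the optional '!'
  rest.empty?
end

inputs = %w[A AB A! AB!]

# str("A") | str("AB"): "A" wins on "AB", leaving "B" unconsumed.
short_first = inputs.select { |s| parses?(s, %w[A AB]) }

# str("AB") | str("A"): the longer alternative gets the first chance.
long_first = inputs.select { |s| parses?(s, %w[AB A]) }
```

With the shorter alternative first, only `A` and `A!` are accepted, which matches the behaviour reported above. With the longer alternative first, all four inputs are accepted.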

@pinkynrg
Author

Thanks!

@pinkynrg
Author

pinkynrg commented Aug 11, 2017

My example is tricky, partly because I've never built a parser before. Is there a way to capture a substring of the original string with a regex and then check whether that entire substring passes a certain rule? Does that make sense when building a parser?

I'm debugging a parser for measurement units. This is the rule I'm looking at:

rule(:simpleton) do
  (prefix.as(:prefix) >> metric_atom.as(:atom)) | atom.as(:atom)
end

  • prefixes are things like m (milli), c (centi), k (kilo), ...
  • metric_atoms are things like m (meter), l (liter), ...
  • atoms are all units, metric and non-metric: m (meter), l (liter), mma (custom unit), ...

When I create a non-metric custom atom like "mma", things break! With the above rule, mma matches m + m = milli + meter, but the final character "a" is left over and the parser fails because it has gone down the wrong branch.

I'd like to be able to say: "look, mma is the thing you need to parse; either parse it 100% with the simpleton rule, or the measurement unit doesn't exist".

I hope I was kind of clear.

@pinkynrg
Author

I solved it by adding a lookahead check to the rule, using match["..."].present? | any.absent?
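
The effect of such a lookahead can be sketched in plain Ruby with the standard library's StringScanner instead of Parslet. This is only an illustration under assumptions: the three word lists are sample values taken from the descriptions earlier in the thread, `scan_any` and `parse_simpleton` are hypothetical names, and only the end-of-input half of the check (the analogue of `any.absent?`) is modelled.

```ruby
require 'strscan'

# Sample vocabularies (assumed for illustration, not the real unit tables).
PREFIXES     = %w[m c k].freeze
METRIC_ATOMS = %w[m l].freeze
ATOMS        = %w[mma m l].freeze # longer literals first, as with '|'

# Tries each word at the current scanner position; returns the first hit.
# A failed scan leaves the scanner position unchanged.
def scan_any(scanner, words)
  words.find { |w| scanner.scan(Regexp.new(Regexp.escape(w))) }
end

def parse_simpleton(input)
  scanner = StringScanner.new(input)

  # Branch 1: prefix followed by metric atom, accepted only if nothing
  # follows (the end-of-input lookahead).
  prefix = scan_any(scanner, PREFIXES)
  atom = prefix && scan_any(scanner, METRIC_ATOMS)
  return { prefix: prefix, atom: atom } if atom && scanner.eos?

  # Branch 2: rewind to the start and try a plain atom instead.
  scanner.pos = 0
  atom = scan_any(scanner, ATOMS)
  return { atom: atom } if atom && scanner.eos?

  nil # neither branch consumed the whole input
end
```

Here `parse_simpleton('mma')` falls through to the second branch and returns `{ atom: 'mma' }`, because the first branch matches `m` + `m` but then sees a leftover `a`. `parse_simpleton('mm')` still returns `{ prefix: 'm', atom: 'm' }`.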
