Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PoC: Pratt parsing with shunting yard algorithm #618

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

39555
Copy link

@39555 39555 commented Nov 14, 2024

Attempt №2 #614

This is much smaller implementation based on the modified shunting yard from the https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:

  • No more recursion. The explicit Vec stack is now used, with one stack for operands and another for operators.
  • Without RefCells

Differences from the https://en.wikipedia.org/wiki/Shunting_yard_algorithm:

  • Parsing is done in a single pass without first converting the expression to Polish notation.
  • Braces '(' are handled as an operand using recursive sub-expression similar to Precedence parsing rust-bakery/nom#1362 (Wikipedia hardcodes braces into the algorithm itself)
    • The operator_precedence_parsing repository introduces special prefix_action and postfix_action mutable closures for handling braces. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for braces is undesirable.

This is extremely barebones for now without the fancy UX we will agree later. it is 3 parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return
(power, &dyn Fn(O) -> O) and the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.

If this is the way to go I will apply your rewiew suggestions from #614 where they’re still applicable.

Minor things to consider

  • A user provided stack similar to Accumulate
  • Error kinds. Algorithm has missing operand and value left on stack

// what we expecting to parse next
let mut waiting_operand = true;
// a stack for computing the result
let mut value_stack = Vec::<Operand>::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to mark this as requiring std. That is the one benefit to recursion that it can operate in no_std environments.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A user provided stack similar to Accumulate

Ah, curious idea to explore. I wouldn't put this as a blocker but we can create an issue and see if it garners interest

@coveralls
Copy link

Pull Request Test Coverage Report for Build 11838052714

Details

  • 0 of 54 (0.0%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-0.7%) to 40.843%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/combinator/shunting_yard.rs 0 54 0.0%
Files with Coverage Reduction New Missed Lines %
src/stream/mod.rs 1 24.95%
Totals Coverage Status
Change from base Build 11802515306: -0.7%
Covered Lines: 1298
Relevant Lines: 3178

💛 - Coveralls

let mut value_stack = Vec::<Operand>::new();
let mut operator_stack = Vec::<Operator<'_, Operand>>::new();

'parse: loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our use of a loop with waiting_operand reminds me of rust-lang/rfcs#3720

};
}

fn unwind_operators_stack<Operand>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think I'd name this something like unwind_operator_stack_to to make it clear what the condition is for unwinding

_ => fail
},
),
trace("postfix", fail),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

precedence could put these traces on the parameters it passes to shunting_yard

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Granted, encouraging users to do it makes the parameter list easier to read

Comment on lines +172 to +175
dispatch! {peek(any);
'(' => delimited('(', trace("recursion", parser), cut_err(')')),
_ => digit1.parse_to::<i32>()
},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still requires recursion to do parenthesis but avoiding that is likely only something that can be handled with trivial expressions. This also puts the responsibility for recursion on the users side so they know its happening and can account for it as needed (e.g. having a depth check)

use super::*;

fn parser(i: &mut &str) -> PResult<i32> {
// TODO: how to elide the closure type without ugly `as _`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not clear to me what problem you are having here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user needs to manually convert &|_| {} into &dyn Fn

where
I: Stream + StreamIsPartial,
ParseOperand: Parser<I, Operand, E>,
ParseInfix: Parser<I, (usize, usize, &'i dyn Fn(Operand, Operand) -> Operand), E>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the infix should return the ugly (left_power, right_power, &dyn Fn(O) -> O) just 2 powers for now without a trick with Assoc enum.

So the two powers is more of a raw implementation and the associativity enums are an abstraction over it?

Copy link
Author

@39555 39555 Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html

expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0

the enum in chumsky automatically bumps the value with some clever trick

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }

    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}

Comment on lines +104 to +106
// if eval_stack.len() > 1 {
// // Error: value left on stack
// }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error kinds. Algorithm has missing operand and value left on stack

I can see it being important to know of a "missing operand".

What end-user condition leaves a value on the stack or is that more of an assert?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the way to go I will apply your rewiew suggestions from #614 where they’re still applicable.

I assume this API style of API, removing RefCell and allowing dispatch! could be applied to #614.

What is your overall impression of the two?

I'm also curious about the performance of recursion vs iteration but I suspect some differences, like use of dispatch! would bias things

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will write a benchmark with dispatches and stripped down ReffCells and tuples. We will see what is the best. The explicit stack is nice if the interface allows the user to customize the type such as VecDeque or SmallVec or something for no_std. Both functions are really similar except the recursion part

@39555
Copy link
Author

39555 commented Nov 15, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants