PoC: Pratt parsing with `shunting yard` algorithm #618

39555 · 2024-11-14T13:20:39Z

Attempt №2 #614

This is a much smaller implementation based on the modified shunting yard algorithm from https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:

No more recursion. Explicit Vec stacks are used instead: one for operands and another for operators.
No more RefCells.

Differences from the https://en.wikipedia.org/wiki/Shunting_yard_algorithm:

Parsing is done in a single pass without first converting the expression to Polish notation.
Parentheses '(' are handled as an operand using a recursive sub-expression, similar to "Precedence parsing" rust-bakery/nom#1362 (Wikipedia hardcodes parentheses into the algorithm itself).
- The operator_precedence_parsing repository introduces special prefix_action and postfix_action mutable closures for handling parentheses. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for parentheses is undesirable.

This is extremely barebones for now, without the fancy UX we will agree on later. It is three parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return (power, &dyn Fn(O) -> O), and the infix parser should return the ugly (left_power, right_power, &dyn Fn(O, O) -> O): just two powers for now, without the trick with the Assoc enum.
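To make the control flow concrete, here is a minimal, self-contained sketch of a two-stack loop driven by binding powers. It is not the code from this PR: it evaluates `i64` expressions over a `Peekable<Chars>` instead of taking winnow parsers, it uses plain `fn` pointers rather than `&dyn Fn`, and the operator table, the power values, and the names `Op`, `unwind_to`, `expr`, and `eval` are all assumptions made for illustration.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// An operator parked on the operator stack, with its right-hand power.
enum Op {
    Prefix(u32, fn(i64) -> i64),
    Infix(u32, fn(i64, i64) -> i64),
}

impl Op {
    fn power(&self) -> u32 {
        match self {
            Op::Prefix(p, _) | Op::Infix(p, _) => *p,
        }
    }
}

/// Pops and applies stacked operators while they bind tighter than `power`.
fn unwind_to(ops: &mut Vec<Op>, values: &mut Vec<i64>, power: u32) -> Option<()> {
    while ops.last().map_or(false, |op| op.power() > power) {
        match ops.pop()? {
            Op::Prefix(_, f) => {
                let a = values.pop()?;
                values.push(f(a));
            }
            Op::Infix(_, f) => {
                let rhs = values.pop()?;
                let lhs = values.pop()?;
                values.push(f(lhs, rhs));
            }
        }
    }
    Some(())
}

fn expr(input: &mut Peekable<Chars<'_>>) -> Option<i64> {
    let mut values: Vec<i64> = Vec::new();
    let mut ops: Vec<Op> = Vec::new();
    let mut waiting_operand = true; // what we expect to parse next
    loop {
        let next = input.peek().copied();
        if waiting_operand {
            match next {
                Some('-') => {
                    input.next();
                    ops.push(Op::Prefix(5, |a: i64| -a));
                }
                // Parentheses are an operand: recurse into the sub-expression.
                Some('(') => {
                    input.next();
                    let inner = expr(input)?;
                    if input.next() != Some(')') {
                        return None;
                    }
                    values.push(inner);
                    waiting_operand = false;
                }
                Some(c) if c.is_ascii_digit() => {
                    let mut n = 0i64;
                    while let Some(d) = input.peek().and_then(|c| c.to_digit(10)) {
                        n = n * 10 + i64::from(d);
                        input.next();
                    }
                    values.push(n);
                    waiting_operand = false;
                }
                _ => return None, // error: missing operand
            }
        } else if next == Some('!') {
            // Postfix factorial with power 9: applied to the top value directly.
            input.next();
            unwind_to(&mut ops, &mut values, 9)?;
            let a = values.pop()?;
            values.push((1..=a).product::<i64>());
        } else {
            let (left, right, f): (u32, u32, fn(i64, i64) -> i64) = match next {
                Some('+') => (1, 2, |a: i64, b: i64| a + b),
                Some('-') => (1, 2, |a: i64, b: i64| a - b),
                Some('*') => (3, 4, |a: i64, b: i64| a * b),
                Some('^') => (8, 7, |a: i64, b: i64| a.pow(b as u32)), // right-associative
                _ => break,
            };
            input.next();
            unwind_to(&mut ops, &mut values, left)?;
            ops.push(Op::Infix(right, f));
            waiting_operand = true;
        }
    }
    unwind_to(&mut ops, &mut values, 0)?;
    // Exactly one value should remain; anything else is "value left on stack".
    if values.len() == 1 { values.pop() } else { None }
}

/// Evaluates a whole string (no whitespace support in this sketch).
fn eval(src: &str) -> Option<i64> {
    let mut input = src.chars().peekable();
    let value = expr(&mut input)?;
    input.next().is_none().then_some(value)
}
```

Under these assumptions `eval("1+2*3")` gives `Some(7)`, `eval("2^3^2")` gives `Some(512)` because `^` has a higher left power than right power, and `eval("1+")` gives `None` for the missing operand.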

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

Minor things to consider

A user provided stack similar to Accumulate
Error kinds. The algorithm has two: "missing operand" and "value left on stack"

epage · 2024-11-14T14:54:24Z

src/combinator/shunting_yard.rs

+    // what we expecting to parse next
+    let mut waiting_operand = true;
+    // a stack for computing the result
+    let mut value_stack = Vec::<Operand>::new();


We'll need to mark this as requiring std. That is the one benefit of recursion: it can operate in no_std environments.

A user provided stack similar to Accumulate

Ah, curious idea to explore. I wouldn't put this as a blocker but we can create an issue and see if it garners interest

coveralls · 2024-11-14T14:55:30Z

Pull Request Test Coverage Report for Build 11838052714

Details

0 of 54 (0.0%) changed or added relevant lines in 1 file are covered.
1 unchanged line in 1 file lost coverage.
Overall coverage decreased (-0.7%) to 40.843%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/combinator/shunting_yard.rs	0	54	0.0%

Files with Coverage Reduction	New Missed Lines	%
src/stream/mod.rs	1	24.95%

Totals
Change from base Build 11802515306:	-0.7%
Covered Lines:	1298
Relevant Lines:	3178

💛 - Coveralls

epage · 2024-11-14T14:57:35Z

src/combinator/shunting_yard.rs

+    let mut value_stack = Vec::<Operand>::new();
+    let mut operator_stack = Vec::<Operator<'_, Operand>>::new();
+
+    'parse: loop {


Our use of a loop with waiting_operand reminds me of rust-lang/rfcs#3720

epage · 2024-11-14T15:00:00Z

src/combinator/shunting_yard.rs

+    };
+}
+
+fn unwind_operators_stack<Operand>(


nit: I think I'd name this something like unwind_operator_stack_to to make it clear what the condition is for unwinding

epage · 2024-11-14T15:01:12Z

src/combinator/shunting_yard.rs

+                    _ => fail
+                },
+            ),
+            trace("postfix", fail),


precedence could put these traces on the parameters it passes to shunting_yard

Granted, encouraging users to do it makes the parameter list easier to read

epage · 2024-11-14T15:03:25Z

src/combinator/shunting_yard.rs

+                dispatch! {peek(any);
+                    '(' => delimited('(', trace("recursion",  parser), cut_err(')')),
+                    _ => digit1.parse_to::<i32>()
+                },


This still requires recursion to handle parentheses, but avoiding that is likely only possible for trivial expressions. This also puts the responsibility for recursion on the user's side, so they know it's happening and can account for it as needed (e.g. with a depth check).
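A depth check on the user's side could look roughly like the sketch below. It is plain Rust rather than winnow combinators, and `MAX_DEPTH`, `operand`, and the error strings are hypothetical names chosen for the example; a real operand parser would recurse into the full expression parser instead of into itself.

```rust
/// Hypothetical limit; pick whatever the application can afford.
const MAX_DEPTH: usize = 64;

/// Parses a single digit or a parenthesized operand, tracking nesting depth.
fn operand(input: &mut &str, depth: usize) -> Result<u32, &'static str> {
    if depth > MAX_DEPTH {
        return Err("nesting too deep");
    }
    match input.chars().next() {
        Some('(') => {
            *input = &input[1..];
            // The recursion is visible here, so the caller controls it.
            let value = operand(input, depth + 1)?;
            match input.strip_prefix(')') {
                Some(rest) => {
                    *input = rest;
                    Ok(value)
                }
                None => Err("expected `)`"),
            }
        }
        Some(c) if c.is_ascii_digit() => {
            *input = &input[1..];
            Ok(c.to_digit(10).unwrap())
        }
        _ => Err("missing operand"),
    }
}
```

With this shape, `operand(&mut "((7))", 0)` returns `Ok(7)`, while input nested more than `MAX_DEPTH` levels deep is rejected before the call stack grows any further.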

epage · 2024-11-14T15:03:52Z

src/combinator/shunting_yard.rs

+    use super::*;
+
+    fn parser(i: &mut &str) -> PResult<i32> {
+        // TODO: how to elide the closure type without ugly `as _`


It's not clear to me what problem you are having here

The user needs to manually convert &|_| {} into &dyn Fn
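One possible way around the cast, sketched under assumptions: route the closure through a function parameter, since function arguments are a coercion site and the unsizing to `&dyn Fn` then happens without `as _`. The helper `prefix_op` and the demo function are hypothetical and not part of winnow or this PR; the types are fixed to `i32` to keep the example small.

```rust
/// Hypothetical helper: passing `f` as an argument lets the compiler
/// coerce `&closure` to `&dyn Fn(i32) -> i32` on its own.
fn prefix_op<'a>(power: usize, f: &'a dyn Fn(i32) -> i32) -> (usize, &'a dyn Fn(i32) -> i32) {
    (power, f)
}

fn negate_demo() -> i32 {
    let neg = |a: i32| -a;
    // A bare `(1, &neg)` would have type `(i32, &{closure})`,
    // which is not what a parser returning `(usize, &dyn Fn)` expects.
    let (_power, f) = prefix_op(1, &neg);
    f(5)
}
```

Whether such a helper is nicer than `as _` inside a `dispatch!` arm is a matter of taste; it only shows where the coercion comes from.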

epage · 2024-11-14T15:11:54Z

src/combinator/shunting_yard.rs

+where
+    I: Stream + StreamIsPartial,
+    ParseOperand: Parser<I, Operand, E>,
+    ParseInfix: Parser<I, (usize, usize, &'i dyn Fn(Operand, Operand) -> Operand), E>,


the infix parser should return the ugly (left_power, right_power, &dyn Fn(O, O) -> O): just two powers for now, without the trick with the Assoc enum.

So the two powers are more of a raw implementation and the associativity enums are an abstraction over them?

Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html

expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0

The enum in chumsky automatically bumps the value with a clever trick:

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }

    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}
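The effect of the bump can be checked with a small worked example. The sketch below repeats the two methods so that it compiles on its own; the `u16` payload and the helper `groups_left` are assumptions for illustration and not taken from chumsky.

```rust
/// Assumed shape of the enum; the `u16` payload is a guess for this sketch.
#[derive(Clone, Copy)]
enum Associativity {
    Left(u16),
    Right(u16),
}

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }

    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}

/// In `a OP1 b OP2 c`, `OP1` is applied first when its right power
/// is greater than the left power of `OP2`, giving `(a OP1 b) OP2 c`.
fn groups_left(op1: Associativity, op2: Associativity) -> bool {
    op1.right_power() > op2.left_power()
}
```

For `Left(1)` the powers are (2, 3), so two such operators in a row group to the left; for `Right(1)` they are (3, 2), so the second operator captures the shared operand and the expression groups to the right. A higher level always wins regardless of associativity because the levels are doubled before the bump.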

epage · 2024-11-14T15:14:15Z

src/combinator/shunting_yard.rs

+    // if eval_stack.len() > 1 {
+    //     // Error: value left on stack
+    // }


Error kinds. The algorithm has two: "missing operand" and "value left on stack"

I can see it being important to know of a "missing operand".

What end-user condition leaves a value on the stack or is that more of an assert?

epage · 2024-11-14T15:16:58Z

src/combinator/shunting_yard.rs

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

I assume this style of API (removing RefCell and allowing dispatch!) could be applied to #614.

What is your overall impression of the two?

I'm also curious about the performance of recursion vs iteration, but I suspect some differences, like the use of dispatch!, would bias things.

I will write a benchmark with dispatch! and with the RefCells and tuples stripped down. We will see which is best. The explicit stack is nice if the interface allows the user to customize the type, such as VecDeque, SmallVec, or something for no_std. Both functions are really similar except for the recursion part.
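As a rough idea of what a customizable stack could mean, here is a sketch of a minimal stack trait with a `Vec` backend and a fixed-capacity backend that needs no allocator. Everything in it (`Stack`, `ArrayStack`, `sum_pushed`) is hypothetical and only illustrates the shape of such an interface; it is not an existing winnow trait.

```rust
/// Hypothetical trait: the minimal interface the algorithm would need
/// from a user-provided container.
trait Stack<T> {
    /// Hands the item back if the stack is full.
    fn push(&mut self, item: T) -> Result<(), T>;
    fn pop(&mut self) -> Option<T>;
}

impl<T> Stack<T> for Vec<T> {
    fn push(&mut self, item: T) -> Result<(), T> {
        Vec::push(self, item);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        Vec::pop(self)
    }
}

/// Fixed-capacity stack backed by an array.
struct ArrayStack<T, const N: usize> {
    items: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayStack<T, N> {
    fn new() -> Self {
        Self { items: std::array::from_fn(|_| None), len: 0 }
    }
}

impl<T, const N: usize> Stack<T> for ArrayStack<T, N> {
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.len == N {
            return Err(item);
        }
        self.items[self.len] = Some(item);
        self.len += 1;
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        self.items[self.len].take()
    }
}

/// Stand-in for the parser: generic over whichever stack the user supplies.
/// Returns the rejected item as the error if the stack overflows.
fn sum_pushed<S: Stack<i64>>(stack: &mut S, items: &[i64]) -> Result<i64, i64> {
    for &item in items {
        stack.push(item)?;
    }
    let mut total = 0;
    while let Some(value) = stack.pop() {
        total += value;
    }
    Ok(total)
}
```

A fallible `push` is one way to surface a full fixed-size stack as a parse error rather than a panic; with a `Vec` it simply never fails.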

39555 · 2024-11-15T17:58:27Z

A really great description of this algorithm: https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md and a nice implementation in C#: https://github.com/erikeidt/Draconum/blob/master/src/3.%20Expression%20Parser/Expression%20Parser%20Library/ExpressionParser.cs

PoC: Pratt parsing with shunting yard algorithm

6a488c2

epage reviewed Nov 14, 2024

epage mentioned this pull request Nov 15, 2024

perf: bench for pratt and shunting yard parsers #620

Draft
