PoC: Pratt parsing with shunting yard algorithm #618
base: main
// what we are expecting to parse next
let mut waiting_operand = true;
// a stack for computing the result
let mut value_stack = Vec::<Operand>::new();
We'll need to mark this as requiring `std`. That is the one benefit of recursion: it can operate in `no_std` environments.
> A user-provided stack similar to `Accumulate`

Ah, curious idea to explore. I wouldn't put this as a blocker, but we can create an issue and see if it garners interest.
let mut value_stack = Vec::<Operand>::new();
let mut operator_stack = Vec::<Operator<'_, Operand>>::new();

'parse: loop {
Our use of a loop with `waiting_operand` reminds me of rust-lang/rfcs#3720
    };
}

fn unwind_operators_stack<Operand>(
nit: I think I'd name this something like `unwind_operator_stack_to` to make it clear what the condition is for unwinding
_ => fail
},
),
trace("postfix", fail),
`precedence` could put these traces on the parameters it passes to `shunting_yard`
Granted, encouraging users to do it makes the parameter list easier to read
dispatch! {peek(any);
    '(' => delimited('(', trace("recursion", parser), cut_err(')')),
    _ => digit1.parse_to::<i32>()
},
This still requires recursion to handle parentheses, but avoiding that is likely only possible for trivial expressions. It also puts the responsibility for recursion on the user's side, so they know it's happening and can account for it as needed (e.g. with a depth check).
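For illustration, a depth check on the user's side could look roughly like the sketch below. It is plain Rust without winnow; `MAX_DEPTH`, `ParseError`, `expr`, and `operand` are hypothetical names, and the grammar is reduced to `+` and parentheses to keep the recursion visible.

```rust
/// Hypothetical limit; pick whatever the application can afford.
const MAX_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum ParseError {
    TooDeep,
    Unexpected,
}

/// Parses `operand ('+' operand)*`; precedence handling is left out on purpose.
fn expr(input: &mut &str, depth: usize) -> Result<i32, ParseError> {
    let mut acc = operand(input, depth)?;
    while let Some(rest) = input.strip_prefix('+') {
        *input = rest;
        acc += operand(input, depth)?;
    }
    Ok(acc)
}

/// Parses a number or a parenthesized sub-expression, recursing for the latter.
fn operand(input: &mut &str, depth: usize) -> Result<i32, ParseError> {
    if let Some(rest) = input.strip_prefix('(') {
        // The recursion lives in user code, so the guard can live here too.
        if depth >= MAX_DEPTH {
            return Err(ParseError::TooDeep);
        }
        *input = rest;
        let value = expr(input, depth + 1)?;
        *input = input.strip_prefix(')').ok_or(ParseError::Unexpected)?;
        return Ok(value);
    }
    // Take the leading run of ASCII digits as the number.
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    let (digits, rest) = input.split_at(end);
    let value = digits.parse::<i32>().map_err(|_| ParseError::Unexpected)?;
    *input = rest;
    Ok(value)
}
```

Calling `expr(&mut input, 0)` on deeply nested input returns `ParseError::TooDeep` instead of overflowing the call stack.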
use super::*;

fn parser(i: &mut &str) -> PResult<i32> {
    // TODO: how to elide the closure type without ugly `as _`
It's not clear to me what problem you are having here
The user needs to manually convert `&|_| {}` into `&dyn Fn`
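To make the conversion concrete, here is a small standalone sketch of three ways a closure reference can become `&dyn Fn` in plain Rust. It does not reproduce the PR's parser signature; the `Prefix` alias and the `unary` helper are hypothetical and only stand in for the `(power, &dyn Fn(O) -> O)` tuple.

```rust
/// Prefix operator as described in the PR: a binding power plus a fold function.
type Prefix<'a> = (usize, &'a dyn Fn(i32) -> i32);

/// Hypothetical helper (not part of winnow): a function argument with a
/// declared type is a coercion site, so no cast is needed at the call.
fn unary<'a>(power: usize, fold: &'a dyn Fn(i32) -> i32) -> Prefix<'a> {
    (power, fold)
}

fn negate_three_ways(x: i32) -> [i32; 3] {
    // 1. An annotated binding: the tuple element is coerced to `&dyn Fn`.
    let annotated: Prefix<'_> = (1, &|v: i32| -v);
    // 2. No annotation: the cast has to be written out by hand.
    let cast = (1usize, &(|v: i32| -v) as &dyn Fn(i32) -> i32);
    // 3. The helper's parameter type drives the coercion.
    let via_helper = (unary(1, &|v: i32| -v).1)(x);
    [(annotated.1)(x), (cast.1)(x), via_helper]
}
```

When the tuple is only passed through a generic combinator, nothing names the target type, which is presumably why the cast ends up in user code.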
where
    I: Stream + StreamIsPartial,
    ParseOperand: Parser<I, Operand, E>,
    ParseInfix: Parser<I, (usize, usize, &'i dyn Fn(Operand, Operand) -> Operand), E>,
> the infix should return the ugly `(left_power, right_power, &dyn Fn(O, O) -> O)`: just 2 powers for now, without a trick with an `Assoc` enum.

So the two powers are more of a raw implementation and the associativity enums are an abstraction over it?
Yes. The algorithm uses two powers for infix to determine what to parse next. From matklad's https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html
expr:      A       +       B       +       C
power:  0      3      3.1      3      3.1     0
The `Associativity` enum in chumsky automatically bumps the value with a clever trick:
impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }
    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}
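To see what the doubling buys, here is a self-contained sketch that wraps the same two methods in a standalone enum and compares powers for `A op1 B op2 C`. The `u16` payload and the `groups_left` helper are assumptions made for illustration, not chumsky's or winnow's API.

```rust
/// Associativity tagged with a precedence level (payload type assumed).
#[derive(Clone, Copy, Debug)]
enum Associativity {
    Left(u16),
    Right(u16),
}

impl Associativity {
    fn left_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2,
            Self::Right(x) => *x as u32 * 2 + 1,
        }
    }
    fn right_power(&self) -> u32 {
        match self {
            Self::Left(x) => *x as u32 * 2 + 1,
            Self::Right(x) => *x as u32 * 2,
        }
    }
}

/// Hypothetical helper: in `A op1 B op2 C`, does `B` bind to `op1`,
/// giving `(A op1 B) op2 C`?
fn groups_left(op1: Associativity, op2: Associativity) -> bool {
    op1.right_power() > op2.left_power()
}
```

With two `Left(1)` operators the right power (3) beats the next left power (2), so the expression groups to the left. With two `Right(1)` operators the comparison is 2 against 3 and the grouping flips. A higher level always wins because levels are spaced two apart.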
// if eval_stack.len() > 1 {
//     // Error: value left on stack
// }
> Error kinds. Algorithm has `missing operand` and `value left on stack`

I can see it being important to know of a "missing operand".
What end-user condition leaves a value on the stack, or is that more of an assert?
> If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

I assume this style of API, removing `RefCell` and allowing `dispatch!`, could be applied to #614.
What is your overall impression of the two?
I'm also curious about the performance of recursion vs iteration, but I suspect some differences, like the use of `dispatch!`, would bias things.
I will write a benchmark with dispatches and stripped-down `RefCell`s and tuples. We will see which is best. The explicit stack is nice if the interface allows the user to customize the type, such as `VecDeque` or `SmallVec`, or something for `no_std`. Both functions are really similar except for the recursion part.
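One possible shape for a user-customizable stack is sketched below. The `Stack` trait, `ArrayStack`, and `fold_top` are all hypothetical and not part of winnow. `push` returns the rejected item so that a fixed-capacity stack can report that it is full without allocating.

```rust
use std::collections::VecDeque;

/// Hypothetical abstraction over the operand/operator stacks.
trait Stack<T> {
    /// Returns the item back if the stack cannot grow.
    fn push(&mut self, item: T) -> Result<(), T>;
    fn pop(&mut self) -> Option<T>;
}

impl<T> Stack<T> for Vec<T> {
    fn push(&mut self, item: T) -> Result<(), T> {
        Vec::push(self, item);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        Vec::pop(self)
    }
}

impl<T> Stack<T> for VecDeque<T> {
    fn push(&mut self, item: T) -> Result<(), T> {
        self.push_back(item);
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        self.pop_back()
    }
}

/// Fixed-capacity stack that needs no allocator, as a `no_std`-style stand-in.
struct ArrayStack<T, const N: usize> {
    items: [Option<T>; N],
    len: usize,
}

impl<T, const N: usize> ArrayStack<T, N> {
    fn new() -> Self {
        Self { items: std::array::from_fn(|_| None), len: 0 }
    }
}

impl<T, const N: usize> Stack<T> for ArrayStack<T, N> {
    fn push(&mut self, item: T) -> Result<(), T> {
        if self.len == N {
            return Err(item);
        }
        self.items[self.len] = Some(item);
        self.len += 1;
        Ok(())
    }
    fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        self.items[self.len].take()
    }
}

/// Pops two operands, applies `fold`, and pushes the result back.
fn fold_top<S: Stack<i64>>(values: &mut S, fold: fn(i64, i64) -> i64) -> Option<()> {
    let rhs = values.pop()?;
    let lhs = values.pop()?;
    values.push(fold(lhs, rhs)).ok()
}
```

The evaluation step (`fold_top`) is written once and works with `Vec`, `VecDeque`, or the array-backed stack. Whether the extra type parameter is worth it is exactly the open question raised in this thread.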
A really great description of this algorithm: https://github.com/erikeidt/erikeidt.github.io/blob/master/The-Double-E-Method.md and a nice implementation in C#: https://github.com/erikeidt/Draconum/blob/master/src/3.%20Expression%20Parser/Expression%20Parser%20Library/ExpressionParser.cs
Attempt №2 #614
This is a much smaller implementation based on the modified shunting yard from https://github.com/bourguet/operator_precedence_parsing/tree/master

Differences from the previous Pratt implementation:
- A `Vec` stack is now used, with one stack for operands and another for operators.
- `RefCell`s are removed.

Differences from https://en.wikipedia.org/wiki/Shunting_yard_algorithm:
- Braces are handled in `operand` using a recursive sub-expression, similar to Precedence parsing rust-bakery/nom#1362 (Wikipedia hardcodes braces into the algorithm itself).
- The `operator_precedence_parsing` repository introduces special `prefix_action` and `postfix_action` mutable closures for handling braces. This complicates the algorithm, so it’s out of scope for our PoC. However, we’ll keep it in mind if recursion for braces is undesirable.

This is extremely barebones for now, without the fancy UX we will agree on later. It is 3 parsers slapped into the function signature: one each for prefix, postfix, and infix. Prefix and postfix parsers should return `(power, &dyn Fn(O) -> O)`, and the infix should return the ugly `(left_power, right_power, &dyn Fn(O, O) -> O)`: just 2 powers for now, without a trick with an `Assoc` enum.

If this is the way to go I will apply your review suggestions from #614 where they’re still applicable.

Minor things to consider
- A user-provided stack similar to `Accumulate`
- Error kinds. Algorithm has `missing operand` and `value left on stack`
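As a rough illustration of the two-stack idea, here is a minimal standalone sketch for infix operators only. It is not the PR's implementation: it works on a plain `&str` of digits and operators without whitespace, has no prefix, postfix, or brace handling, and the operator table, `Fold`, `EvalError`, and `eval` are made up for the example. Only `waiting_operand`, `value_stack`, `operator_stack`, and the unwinding step follow the names used in this thread.

```rust
/// Fold function for an infix operator.
type Fold = fn(i64, i64) -> i64;

/// The two error kinds named in the description.
#[derive(Debug, PartialEq)]
enum EvalError {
    MissingOperand,
    ValueLeftOnStack,
}

/// Hypothetical operator table: `(left_power, right_power, fold)`.
fn infix(c: char) -> Option<(usize, usize, Fold)> {
    let op: (usize, usize, Fold) = match c {
        '+' => (2, 3, |a: i64, b: i64| a + b),
        '-' => (2, 3, |a: i64, b: i64| a - b),
        '*' => (4, 5, |a: i64, b: i64| a * b),
        // Right-associative: the left power is the larger of the two.
        '^' => (7, 6, |a: i64, b: i64| a.pow(b as u32)),
        _ => return None,
    };
    Some(op)
}

/// Folds stacked operators while their right power exceeds `power`.
fn unwind_operator_stack_to(
    power: usize,
    operator_stack: &mut Vec<(usize, Fold)>,
    value_stack: &mut Vec<i64>,
) -> Result<(), EvalError> {
    while let Some(&(right_power, fold)) = operator_stack.last() {
        if right_power <= power {
            break;
        }
        operator_stack.pop();
        let rhs = value_stack.pop().ok_or(EvalError::MissingOperand)?;
        let lhs = value_stack.pop().ok_or(EvalError::MissingOperand)?;
        value_stack.push(fold(lhs, rhs));
    }
    Ok(())
}

fn eval(input: &str) -> Result<i64, EvalError> {
    let mut value_stack: Vec<i64> = Vec::new();
    let mut operator_stack: Vec<(usize, Fold)> = Vec::new();
    let mut waiting_operand = true;
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        if waiting_operand {
            // Operand state: read one unsigned decimal number.
            let mut n: Option<i64> = None;
            while let Some(d) = chars.peek().and_then(|c| c.to_digit(10)) {
                n = Some(n.unwrap_or(0) * 10 + d as i64);
                chars.next();
            }
            value_stack.push(n.ok_or(EvalError::MissingOperand)?);
            waiting_operand = false;
        } else {
            // Operator state: stop at anything that is not a known operator.
            let Some((left, right, fold)) = infix(c) else { break };
            chars.next();
            unwind_operator_stack_to(left, &mut operator_stack, &mut value_stack)?;
            operator_stack.push((right, fold));
            waiting_operand = true;
        }
    }
    if waiting_operand {
        return Err(EvalError::MissingOperand);
    }
    unwind_operator_stack_to(0, &mut operator_stack, &mut value_stack)?;
    let result = value_stack.pop().ok_or(EvalError::MissingOperand)?;
    // In this sketch the check below acts as an internal invariant.
    if !value_stack.is_empty() {
        return Err(EvalError::ValueLeftOnStack);
    }
    Ok(result)
}
```

For example, `eval("1+2*3")` folds `*` before `+`, and `eval("1+")` reports `MissingOperand`. In this reduced version `ValueLeftOnStack` cannot be triggered by user input, which matches the reviewer's suspicion that it is closer to an assert.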