Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354
One thing I would find interesting is a combinator that takes multiple parsers and "merges" them. This would make it convenient to write grammars in really simple ways, without having to merge the parsers correctly by hand. The boring example would be …, while a far more interesting one is parsing decimal and hexadecimal numbers.
The or combinator is similar, but it behaves differently. Sadly, I am no expert on parsing, so it's entirely possible that there's a better way to get the same result.
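To make the difference from or concrete, here is a toy model (not chumsky's API) in which a parser is a boxed closure returning a value plus the number of bytes consumed. The `longest` combinator is a hypothetical reading of "merge": run every alternative from the same position and keep the longest match, whereas `or` keeps the first success. On input "0x1F", a decimal-first `or` stops after "0", while `longest` picks the hexadecimal reading. All names here are illustrative assumptions.

```rust
// Toy parser model (hypothetical): returns (value, bytes consumed) on success.
type Parser<T> = Box<dyn Fn(&str) -> Option<(T, usize)>>;

// Parses a run of ASCII decimal digits.
fn decimal() -> Parser<u64> {
    Box::new(|s: &str| -> Option<(u64, usize)> {
        let len = s.bytes().take_while(u8::is_ascii_digit).count();
        if len == 0 {
            return None;
        }
        s[..len].parse::<u64>().ok().map(|v| (v, len))
    })
}

// Parses "0x" followed by ASCII hex digits.
fn hexadecimal() -> Parser<u64> {
    Box::new(|s: &str| -> Option<(u64, usize)> {
        let rest = s.strip_prefix("0x")?;
        let len = rest.bytes().take_while(u8::is_ascii_hexdigit).count();
        if len == 0 {
            return None;
        }
        u64::from_str_radix(&rest[..len], 16).ok().map(|v| (v, len + 2))
    })
}

// First-success choice: the second parser only runs if the first fails.
fn or<T: 'static>(a: Parser<T>, b: Parser<T>) -> Parser<T> {
    Box::new(move |s: &str| a(s).or_else(|| b(s)))
}

// Hypothetical "merge": run every parser, keep the longest match (ties go to the earliest).
fn longest<T: 'static>(ps: Vec<Parser<T>>) -> Parser<T> {
    Box::new(move |s: &str| {
        let mut best: Option<(T, usize)> = None;
        for p in &ps {
            if let Some((v, n)) = p(s) {
                if best.as_ref().map_or(true, |(_, m)| n > *m) {
                    best = Some((v, n));
                }
            }
        }
        best
    })
}
```

Because `longest` always runs every alternative, it trades speed for order-independence; that cost is what makes nesting such combinators worrying.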
That's an interesting idea. We'd have to be careful to avoid exponential blowup (if you nest such parsers, even the happy path becomes exponential). Sort of like …. This is absolutely possible with the extension API I implemented today, and actually quite easy to implement.
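A rough way to see the blowup: model a grammar in which each level is a choice between two branches that both contain the next level down, and every leaf succeeds (the happy path). A first-success or only ever descends into the first branch, so depth d costs d + 1 calls, while a merge-style combinator that evaluates every branch to compare results costs 2^(d+1) - 1. The sketch below counts calls for both; the grammar shape and function names are illustrative assumptions, not chumsky code.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many times any level of the toy grammar is entered.
    static CALLS: Cell<u64> = Cell::new(0);
}

// First-success choice: `||` short-circuits, so the second branch never runs
// when the first succeeds.
fn or_level(d: u32) -> bool {
    CALLS.with(|c| c.set(c.get() + 1));
    if d == 0 {
        return true;
    }
    or_level(d - 1) || or_level(d - 1)
}

// Merge-style choice: both branches are always evaluated so their results can be compared.
fn merge_level(d: u32) -> bool {
    CALLS.with(|c| c.set(c.get() + 1));
    if d == 0 {
        return true;
    }
    let left = merge_level(d - 1);
    let right = merge_level(d - 1);
    left || right
}

// Runs `f` at depth `d` and returns the number of calls it made.
fn count_calls(f: fn(u32) -> bool, d: u32) -> u64 {
    CALLS.with(|c| c.set(0));
    f(d);
    CALLS.with(|c| c.get())
}
```

At depth 10 this gives 11 calls for the first-success version versus 2047 for the merge version.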
It would be really nice to have …
I too would like it to be public... in time. The reason the current extension API doesn't expose it is that I'm not yet convinced it's the way we should be going long-term: another formulation might pop up (maybe Rust implements HKTs, or we find a more expressive way to control parsing artefacts). This uncertainty is why I want to avoid making it public, at least for the initial release. Note that introducing it later would certainly be possible without breaking changes: an additional method on …

For what it's worth, you can (with a bit of extra work) already emulate it:

```rust
struct MyParser { ... }

impl MyParser {
    // A single generic implementation; `M` selects whether an output is produced.
    fn parse_inner<M: MyMode, E>(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<M::Output<O>, E::Error> {
        ...
    }
}

impl<'a, I, E> ExtParser for MyParser
where
    I: Input<'a>,
    E: ParserExtra<'a, I>,
{
    // Emit mode: `MyEmit::Output<O>` is `O`, so the real output is returned.
    fn parse(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<O, E::Error> {
        self.parse_inner::<MyEmit>(inp)
    }

    // Check mode: `MyCheck::Output<O>` is `()`, so no output is returned.
    fn check(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<(), E::Error> {
        self.parse_inner::<MyCheck>(inp)
    }
}

// Stand-in for chumsky's internal mode trait, using a generic associated type.
trait MyMode { type Output<T>; }

struct MyCheck;
impl MyMode for MyCheck { type Output<T> = (); }

struct MyEmit;
impl MyMode for MyEmit { type Output<T> = T; }
```

If you're implementing a lot of parsers, this 'extra work' isn't particularly large compared to everything else, and it can easily be removed if we do decide to make it public.

Apologies if this seems like an unsatisfactory answer from the perspective of a consumer of chumsky. The 1.0 release has already required an enormous amount of work to prepare, and right now my priority is minimising the API surface area in service of releasing a polished core API. This isn't the end of the story, though: successive releases will build upon this minimal central core (in particular, I'm looking forward to a visitor API that allows writing 'static analysis' passes for parsers, such as #295).
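The emulation above relies on generic associated types (stable since Rust 1.65). Below is a self-contained sketch of the same trick with no chumsky types at all. Mode, Emit, Check, make and Digits are hypothetical names, and the lazy make method is an assumption about how a mode could skip building outputs: one generic parse_inner serves both a parse entry point that produces a value and a check entry point that only validates.

```rust
/// Hypothetical mode trait: decides whether a parser builds its output.
trait Mode {
    /// The output type a parser produces in this mode.
    type Output<T>;
    /// Produces an output lazily; `Check` never calls `f`.
    fn make<T, F: FnOnce() -> T>(f: F) -> Self::Output<T>;
}

struct Emit;
struct Check;

impl Mode for Emit {
    type Output<T> = T;
    fn make<T, F: FnOnce() -> T>(f: F) -> Self::Output<T> {
        f()
    }
}

impl Mode for Check {
    type Output<T> = ();
    fn make<T, F: FnOnce() -> T>(_f: F) -> Self::Output<T> {}
}

/// Hypothetical parser for a run of ASCII digits.
struct Digits;

impl Digits {
    /// Shared implementation; returns the (possibly unit) output and bytes consumed.
    fn parse_inner<M: Mode>(&self, input: &str) -> Result<(M::Output<u64>, usize), String> {
        let len = input.bytes().take_while(u8::is_ascii_digit).count();
        if len == 0 {
            return Err(format!("expected a digit at {:?}", input));
        }
        let digits = &input[..len];
        // Only Emit pays for turning the digits into a number (saturating on overflow).
        let value = M::make(|| {
            digits
                .bytes()
                .fold(0u64, |acc, b| acc.saturating_mul(10).saturating_add(u64::from(b - b'0')))
        });
        Ok((value, len))
    }

    fn parse(&self, input: &str) -> Result<(u64, usize), String> {
        self.parse_inner::<Emit>(input)
    }

    fn check(&self, input: &str) -> Result<usize, String> {
        self.parse_inner::<Check>(input).map(|(_, len)| len)
    }
}
```

For example, Digits.parse("123abc") yields the value 123 after consuming 3 bytes, while Digits.check("42") only reports that 2 bytes matched.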
Any pointers on how I'd get started implementing this myself? I'm using the latest version straight from GitHub. I'm mainly using it for tasks where a regex-based lexer could also do the job, so I'll be able to avoid the exponential blowup for the most part.
If you look at the public API of …
I attempted this, but haven't quite found a good way of doing it. I can query InputRef.offset, however that struct doesn't implement …

P.S. As a workaround, I tried constraining the output type, but that makes it incompatible with the Pratt parser API for reasons that I don't quite understand yet.
Oh, we should definitely implement …
I implemented it; hopefully that saves you some time: #511
Some things we might want to hide:

- Mode / Check / Emit / etc.
- Parser::go / Parser::go_check / Parser::go_emit / etc.
- InputRef
- Input (Input::next_maybe, for example)

Hiding these details does not necessarily make the crate non-extensible! A viable route to recovering extensibility is to provide a simpler set of traits for common interfaces (like Parser, Input, etc.) that expose a more limited API whose long-term stability we can guarantee, and then provide blanket implementations of the more complex sealed traits for implementers of the simpler traits.

The goal here is not to hurt downstream developers who may want to build upon chumsky, but to protect them from changes to volatile implementation details that are likely to happen as we find more ways to aggressively optimise parsers. This has a twofold benefit:
- Developers can extend the trait without fear that the ground is suddenly going to shift under them, providing room for an ecosystem to build up around a stable core interface
- We, as developers of said core, get the chance to aggressively optimise the core and alter the way in which parsers work to provide users with better performance, without worrying that we're hurting downstream extensions with regular breaking changes to APIs
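A minimal sketch of the sealed-trait-plus-blanket-impl arrangement described above, using hypothetical names (engine, InternalParser, SimpleParser, go, parse_at, run, matches) rather than chumsky's real traits. Because the private module is unnameable outside engine, the volatile InternalParser can only be obtained by implementing the small SimpleParser trait, so engine remains free to change go's signature without breaking anyone who implemented SimpleParser.

```rust
mod engine {
    mod private {
        /// Unnameable outside `engine`, so only `engine` decides who implements it.
        pub trait Sealed {}
    }

    /// Volatile internal trait (hypothetical): may change between releases.
    pub trait InternalParser: private::Sealed {
        /// Parses at `pos`; returns the output only when `emit` is true, plus the new position.
        fn go(&self, input: &str, pos: usize, emit: bool) -> Option<(Option<String>, usize)>;
    }

    /// Small, stable extension trait (hypothetical) that downstream code implements.
    pub trait SimpleParser {
        fn parse_at(&self, input: &str, pos: usize) -> Option<(String, usize)>;
    }

    // Blanket impls: every SimpleParser automatically becomes an InternalParser.
    impl<P: SimpleParser> private::Sealed for P {}

    impl<P: SimpleParser> InternalParser for P {
        fn go(&self, input: &str, pos: usize, emit: bool) -> Option<(Option<String>, usize)> {
            let (out, end) = self.parse_at(input, pos)?;
            Some((if emit { Some(out) } else { None }, end))
        }
    }

    /// Public entry points only ever talk to the sealed trait.
    pub fn run<P: InternalParser>(p: &P, input: &str) -> Option<String> {
        p.go(input, 0, true).and_then(|(out, _)| out)
    }

    pub fn matches<P: InternalParser>(p: &P, input: &str) -> bool {
        p.go(input, 0, false).is_some()
    }
}

use engine::{matches, run, SimpleParser};

/// "Downstream" parser for a run of ASCII letters.
struct Word;

impl SimpleParser for Word {
    fn parse_at(&self, input: &str, pos: usize) -> Option<(String, usize)> {
        let rest = &input[pos..];
        let len = rest.bytes().take_while(u8::is_ascii_alphabetic).count();
        if len == 0 {
            None
        } else {
            Some((rest[..len].to_string(), pos + len))
        }
    }
}
```

The trade-off shows up in go: since SimpleParser always builds a String, check mode (emit = false) still pays for the output. A richer stable trait could close that gap, at the cost of committing to more of the internals.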
How you can help
If you're thinking about building on top of chumsky in the future - perhaps to add new parsers, combinators, errors, etc - then knowing what requirements you have would be enormously helpful! What interfaces would you like to see around backtracking, error generation, token processing, etc.?