Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354

zesterer · 2023-03-07T23:57:50Z

Some things me might want to hide:

Mode
Check/Emit/etc.
Parser::go/Parser::go_check/Parser::go_emit/etc.
InputRef
Most of Input (Input::next_maybe, for example)

Hiding these details does not necessarily make the crate non-extensible! A viable route to recovering extensibility is to have a more simplistic set of traits for common interfaces (like Parser, Input, etc.) that provide a more limited API for which we can guarantee long-term stability, and then provide blanket implementations of the more complex sealed traits for implementers of the simpler traits.

The goal here is not to hurt downstream developers that may want to build upon chumsky, but to protect them from changes to volatile implementation details that are likely to happen as we find more ways to aggressively optimise parsers. This has a twofold benefit:

Developers can extend the trait without fear that the ground is suddenly going to shift under them, providing room for an ecosystem to build up around a stable core interface
We, as developers of said core, get the change to aggressively optimise the core and alter the way in which parsers work to provide users with better performance without worrying that we're hurting downstream extensions with regular breaking changes to APIs

How you can help

If you're thinking about building on top of chumsky in the future - perhaps to add new parsers, combinators, errors, etc - then knowing what requirements you have would be enormously helpful! What interfaces would you like to see around backtracking, error generation, token processing, etc.?

The text was updated successfully, but these errors were encountered:

stefnotch · 2023-03-08T21:46:38Z

One thing that I would find interesting would be a combinator that takes multiple parsers, and "merges" them.
It would apply all parsers to an input, and return whichever one leads to the longest result.

This would be convenient for writing grammars in really simple ways and not having to manually merge them correctly. The boring example would be

Statement → 
  "if" Condition "{" Statement "}" |
  "if" Condition "{" Statement "}" else "{" Statement "}"

While the way more interesting example is parsing decimal and hexadecimal numbers.

Number → 
  ("0" | "1" | "2" | ...)+ |
  "0x" ("0" | "1" | ... | "a" | "b" ...)+

The or combinator is similar to that, but it has a different behavior.

I am, sadly, no expert on parsing, so it's entirely possible that there's a better way of getting the same results.

zesterer · 2023-03-08T22:05:44Z

That's an interesting idea. We'd have to be careful to avoid exponential blowup (if you nest such parsers, even the happy path becomes exponential). Sort of like choice, except the parser that left the input the furthest progressed wins.

This is absolutely possible with the extension API I implemented today, and actually quite easy to implement.

PizzaCrust · 2023-03-19T18:28:02Z

It would be really nice to have Mode publicly available instead of splitting it into two functions.
I don't really want to copy the same code without the output or otherwise leave potential performance on the table.

zesterer · 2023-03-19T20:05:54Z

It would be really nice to have Mode publicly available

I too would like it to be public... in time. The reason the current extension API doesn't reveal it is that I'm not even entirely convinced that it's the way we should be going long-term: it might be that another formulation pops up (maybe Rust implements HKTs, or we find a more expressive way to control parsing artefacts).

It's this uncertainty that makes me want to avoid making it public, at least for the initial 1.0 release: I really want to avoid introducing breaking changes after 1.0 if we can help it so that users have a stable core upon which to build.

Note that introducing it later would most definitely be possible without breaking changes: an additional method on ExtParser with a default implementation that calls out to either ExtParser::parse or ExtParser::check depending on the mode could easily be introduced.

For what it's worth, you can (with a bit of extra work) already emulate Mode yourself, like so:

struct MyParser { ... }

impl MyParser {
    fn parse_inner<M: MyMode, E>(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<M::Output<O>, E> {
        ...
    }
}

impl<'a, I, E> ExtParser for MyParser
where
    I: Input<'a>,
    E: ParserExtra<'a, I>,
{
    fn parse(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<O, E::Error> {
        self.parse_inner::<MyEmit>(inp)
    }

    fn check(&self, inp: &mut InputRef<'a, '_, I, E>) -> Result<(), E::Error> {
        self.parse_inner::<MyCheck>(inp)
    }
}

trait MyMode { type Output<T>; }

struct MyCheck;
impl MyMode for MyCheck { type Output<T> = (); }

struct MyEmit;
impl MyMode for MyEmit { type Output<T> = T; }

If you're implementing a lot of parsers, this 'extra work' isn't particularly enormous when compared to everything else, and can be easily be removed if we do decide to make Mode public in a future release.

Apologies if this seems like an unsatisfactory answer from the perspective of a consumer of chumsky. The 1.0 release has already required an enormous amount of work to prepare, and right now my priority is minimising the API surface area in service of releasing a polished core API. This isn't the end of the story though: successive releases will be building upon this minimal central core (in particular, I'm looking forward to a visitor API that allows writing 'static analysis' passes for parsers, such as #295).

stefnotch · 2023-08-20T17:26:43Z

That's an interesting idea. We'd have to be careful to avoid exponential blowup (if you nest such parsers, even the happy path becomes exponential). Sort of like choice, except the parser that left the input the furthest progressed wins.

This is absolutely possible with the extension API I implemented today, and actually quite easy to implement.

Any pointers on how I'd get started with implementing this for myself? I'm using the latest version straight from GitHub.
Edit: Ah, I found it https://docs.rs/chumsky/1.0.0-alpha.4/chumsky/extension/index.html

I'm mainly using it for tasks where a regex-based lexer could also do the job. So at I'll be able to avoid the exponential blowup for the most part.

zesterer · 2023-08-20T18:02:24Z

If you look at the public API of InputRef, you should have everything you need there. You can invoke parsers and do backtracking on each one, choosing the output that left the input offset in the furthest state.

stefnotch · 2023-08-25T12:19:49Z

choosing the output that left the input offset in the furthest state.

I attempted this, and haven't quite found a good way of doing that. I can query the InputRef.offset, however that struct doesn't implement Ord. So I have offsets that cannot be compared.

P.S. As a workaround, I tried constraining the output type, but that makes it incompatible with the pratt parser API for reasons that I don't quite understand yet.

zesterer · 2023-08-26T12:40:15Z

Oh, we should definitely implement Ord for it in that case. I'll try to get time to do this in the next few days.

stefnotch · 2023-08-26T18:01:04Z

I implemented it, hopefully that saves you some time #511

zesterer added 1.0 Features that should be implemented for the 1.0 release help wanted Extra attention is needed api A problem with the design of an API feature labels Mar 7, 2023

zesterer changed the title ~~Hide implementation details within sealed trait(s)~~ Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) Mar 8, 2023

zesterer added the interop An issue that affects interoperability with other crates label Mar 8, 2023

stefnotch mentioned this issue Aug 26, 2023

Implement Ord for Offset struct #511

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354

Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354

zesterer commented Mar 7, 2023 •

edited

Loading

stefnotch commented Mar 8, 2023 •

edited

Loading

zesterer commented Mar 8, 2023

PizzaCrust commented Mar 19, 2023 •

edited

Loading

zesterer commented Mar 19, 2023 •

edited

Loading

stefnotch commented Aug 20, 2023 •

edited

Loading

zesterer commented Aug 20, 2023

stefnotch commented Aug 25, 2023

zesterer commented Aug 26, 2023

stefnotch commented Aug 26, 2023 •

edited

Loading

Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354

Building a stable and extensible parser API (and hiding the rougher implementation details behind sealed traits) #354

Comments

zesterer commented Mar 7, 2023 • edited Loading

How you can help

stefnotch commented Mar 8, 2023 • edited Loading

zesterer commented Mar 8, 2023

PizzaCrust commented Mar 19, 2023 • edited Loading

zesterer commented Mar 19, 2023 • edited Loading

stefnotch commented Aug 20, 2023 • edited Loading

zesterer commented Aug 20, 2023

stefnotch commented Aug 25, 2023

zesterer commented Aug 26, 2023

stefnotch commented Aug 26, 2023 • edited Loading

zesterer commented Mar 7, 2023 •

edited

Loading

stefnotch commented Mar 8, 2023 •

edited

Loading

PizzaCrust commented Mar 19, 2023 •

edited

Loading

zesterer commented Mar 19, 2023 •

edited

Loading

stefnotch commented Aug 20, 2023 •

edited

Loading

stefnotch commented Aug 26, 2023 •

edited

Loading