update streaming input and input type columns #48
base: main
Conversation
| crate     | parser type | action code | integration | input type              | precedence climbing | parameterized rules | streaming input                                                         |
|-----------|-------------|-------------|-------------|-------------------------|---------------------|---------------------|-------------------------------------------------------------------------|
| [chumsky] | combinators | in source   | library     | `&str`, `&[T]`, custom  | ?                   | ?                   | [Yes](https://docs.rs/chumsky/latest/chumsky/stream/struct.Stream.html) |
| [yap]     | combinators | in source   | library     | `&str`, `&[T]`, custom  | No                  | Yes                 | ?                                                                       |
"streaming input" means "can it handle operating on a partial/incomplete input"
- For chumksy, that looks to be the equivalent of our
Stream
trait. I'm not seeing anything about partial / incomplete input - For combine, I think this is the more proper link but ... there is no documentation on the topic
- For nom, I guess that is the best that can be done? There really isn't a good resource on it
- For winnow, the link should go to https://docs.rs/winnow/latest/winnow/stream/struct.Partial.html
I think it is sufficient for a parser to be able to make progress with partial input to support "streaming input".
As an extreme example:
- Assume you get 1 new token / minute
- It takes 1 minute to process a token.
- You parse 100 tokens.
Then a streaming parser will be finished at minute 101.
A non-streaming parser will have to wait 100 minutes for all the tokens and then spend another 100 minutes on processing, resulting in 200 minutes to finish parsing.
It seems there are two ways of doing this:

- External partial state (what `nom` and `winnow` use with their `Incomplete` error variants), where a parser takes a partial input and maybe a partial state and returns the new partial or complete state.
- Internal partial state (what some of the others use), where a parser repeatedly takes partial input and holds a partial state, only returning once some part of the final state is generated.
If you can parse a token iterator, you can parse a stream: you write your input stream as an iterator. For `chumsky`, you can use an iterator over the stream of input via the `from_iter` method.
I'm not super familiar with `combine`, but the link you posted seems to be a newtype to signal a certain behavior when reaching the end of input. This is not necessary for parsing a stream, because you don't need to reach the end of the stream until you have all the tokens. Write your stream to block or await available tokens, and your parser doesn't need to know.
For example, in the `yap_streaming` fizzbuzz example, new tokens can take an unbounded amount of time to arrive, but the parser can process all tokens it has received so far without ever knowing that it waited for input.
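The "block inside the stream" idea above can be sketched with a plain channel-backed iterator (a generic illustration, not `yap_streaming`'s API): the parser only sees an `Iterator`, and any waiting happens inside the receiver.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<char>();

    // Producer thread: tokens could arrive here with arbitrary delays,
    // but the consumer never needs to know that.
    thread::spawn(move || {
        for c in "fizz".chars() {
            tx.send(c).unwrap();
        }
        // Dropping `tx` ends the stream; the iterator then yields `None`.
    });

    // Any iterator-driven parser could consume `rx.into_iter()` directly;
    // each call to `next()` blocks until the next token is available.
    let word: String = rx.into_iter().collect();
    assert_eq!(word, "fizz");
}
```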
I think the `combine` link is correct, because the options available at that page seem to be how one would handle different kinds of streams.

I have no problem changing the `winnow` link.
> For chumsky, you can use an iterator over the stream of input using the `from_iter` method.

While that does use an iterator, without seeing an example showing this use case, I question how it would work. For example, how do you handle the end span?

> I think the combine link is correct because the options available at that page seem to be how one would handle different kinds of streams.

This all seems like very hand-wavy guessing as to how it's supposed to work, and without verified examples, who knows whether all of the practical aspects are taken care of.

And examples only help in calling attention to it, not in fully resolving it. For example, I commented in the issue about IO error handling for `yap_streaming`, but blocking in the parser could also end up with serious ramifications for an application.

I am curious: how do you know when you can stop keeping state for backtracking? Is a marker made for the outermost backtrack point, and as you unwind past it, you free it, allowing the buffer to be reused?
> While that does use an iterator, without seeing an example showing this use case, I question how it would work. For example, how do you handle the end span?

If I have time, I'll see if I can set up an example. Maybe I'll prove myself wrong, though I don't see what would prevent parsing a `Reader::bytes()` iterator.

I'm not sure what you mean by handling the end span. Maybe it's related to how `chumsky` handles the `None` case here with the `eoi` span?
```rust
pub(crate) fn next(&mut self) -> (usize, S, Option<I>) {
    match self.pull_until(self.offset).cloned() {
        Some((out, span)) => {
            self.offset += 1;
            (self.offset - 1, span, Some(out))
        }
        None => (self.offset, self.eoi.clone(), None),
    }
}
```
> This all seems like very hand-wavy guessing as to how it's supposed to work, and without verified examples, who knows whether all of the practical aspects are taken care of.

I'll file an issue on `combine` after we agree on a definition for "streaming input". We might already? Just double-checking.
> And examples only help in calling attention to it, not in fully resolving it. For example, I commented in the issue about IO error handling for `yap_streaming`, but blocking in the parser could also end up with serious ramifications for an application.

I'll comment there on handling IO and blocking. I'm not sure what the comment about examples fully means.
> I am curious: how do you know when you can stop keeping state for backtracking? Is a marker made for the outermost backtrack point, and as you unwind past it, you free it, allowing the buffer to be reused?

I can't speak to how `chumsky` does it. In `yap_streaming`, backtracking can only occur with a `TokenLocation`, so creating one adds the current offset to a list, and it is removed from the list when dropped. Items are only copied to the buffer if a `TokenLocation` exists which might need them when a reset occurs. Items are only dropped from the buffer once the oldest `TokenLocation` in the list is younger than they are.
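A rough sketch of that bookkeeping, illustrative only (not `yap_streaming`'s actual internals; all names here are made up): checkpoints record a stream offset, and tokens are buffered only while a checkpoint that might reset past them is still alive.

```rust
use std::collections::VecDeque;

struct BufferedStream<I: Iterator<Item = char>> {
    source: I,
    buffer: VecDeque<char>,  // tokens retained for possible resets
    base: usize,             // stream offset of buffer[0]
    pos: usize,              // current read offset
    checkpoints: Vec<usize>, // offsets a caller may still reset to
}

impl<I: Iterator<Item = char>> BufferedStream<I> {
    fn new(source: I) -> Self {
        Self { source, buffer: VecDeque::new(), base: 0, pos: 0, checkpoints: Vec::new() }
    }

    fn next_token(&mut self) -> Option<char> {
        let tok = match self.pos.checked_sub(self.base) {
            // Replay a token already buffered, after an earlier reset.
            Some(i) if i < self.buffer.len() => self.buffer[i],
            _ => {
                let t = self.source.next()?;
                // Buffer the token only if someone could still rewind to it.
                if !self.checkpoints.is_empty() {
                    if self.buffer.is_empty() {
                        self.base = self.pos;
                    }
                    self.buffer.push_back(t);
                }
                t
            }
        };
        self.pos += 1;
        Some(tok)
    }

    /// Record the current offset so the caller can `reset` back to it.
    fn checkpoint(&mut self) -> usize {
        self.checkpoints.push(self.pos);
        self.pos
    }

    fn reset(&mut self, to: usize) {
        self.pos = to;
    }

    /// Forget a checkpoint. Buffered tokens older than the oldest surviving
    /// checkpoint can never be replayed again, so they are freed here.
    fn drop_checkpoint(&mut self, at: usize) {
        if let Some(i) = self.checkpoints.iter().position(|&c| c == at) {
            self.checkpoints.remove(i);
        }
        let keep_from = self.checkpoints.iter().copied().min().unwrap_or(self.pos);
        while self.base < keep_from && !self.buffer.is_empty() {
            self.buffer.pop_front();
            self.base += 1;
        }
    }
}

fn main() {
    let mut s = BufferedStream::new("abc".chars());
    assert_eq!(s.next_token(), Some('a')); // not buffered: no checkpoint alive
    let cp = s.checkpoint();
    assert_eq!(s.next_token(), Some('b')); // buffered while `cp` is alive
    s.reset(cp);
    assert_eq!(s.next_token(), Some('b')); // replayed from the buffer
    s.drop_checkpoint(cp);
    assert_eq!(s.next_token(), Some('c')); // buffer freed, back to pass-through
    assert!(s.buffer.is_empty());
}
```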
Fixed some missing and incorrect information in the table.