-
Notifications
You must be signed in to change notification settings - Fork 806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Challenges with using nom #1506
Comments
I agree that there are a lot of opportunities to improve the documentation - I think pull requests in this area are always welcome! |
And I would certainly be happy to help write them except ... the barrier here is that without the documentation I don't understand all the modules so I ... can't write the documentation. I would be more than happy to be a docs reviewer though! |
A good example of this is actually the problem I'm hitting right now. I'm trying to parse the following tagged value from a byte array into a u32 So I can use tag combined with take_until to get:
And here is where it becomes a mess. I can use tag to get the "$" as the preceeding char, but trying to wrap that in map res as demonstrated in the recipes: https://docs.rs/nom/latest/nom/recipes/index.html#hexadecimal Well, u32::from_str_radix only takes a str, not u8, meaning we have to do some unsafe.
Ahh but that won't work because map_res into out only gives what matched on the tag not the remainder of the input, so it's trying to convert the "$" we matched. Nothing in combinator lets you say "if you matched on A, then map the remainder. Try to break that out and we get:
This also won't work, because whatever magic map_res was doing to convert ParseIntError to a nom error isn't there. Trying to read https://docs.rs/nom/latest/nom/error/index.html and I can't work out how to map this error into something nom will be happy with. I could try to use this:
But once again, you can not because
Perhaps what I need is https://docs.rs/nom/latest/nom/sequence/fn.preceded.html but then I'm back to being unable to work out the magic to make the ParseIntError into something that nom is happy with. This is such a perfect example of the kind of challenges that are commonly encountered with nom. Something that looks like it should be a trivial and simple task, ends up turning into hours of frustration as you can't work out which exact set of pieces you need to assemble together to make it work :( EDIT: When I did try and make my own error:
And I have zero clue how to resolve this :) |
I'm having a hard time following your comment, and github issues aren't really the best medium to figure things like this out anyway - but if you want to join the gitter room (also possible through matrix) maybe we can figure this out a bit more interactively. |
@Xiretza as much as I appreciate the offer to assist with this specific technical problem, this issue report is meant to show a broader and larger problem - that nom, while being a really powerful, well built library, has extreme challenges with adoption and accessibility for new users. If I, a senior software engineer, with more than 10 years of experience, struggle to use this library from it's documentation, then how do many other people feel every day when they try? How many people have given up in silence and moved on to something else? So again, my point here is "how can nom be improved so that a user can learn how to use it, and how can they find the pieces they need to be effective"? |
I don't see how this ticket could prove to be helpful to anyone. The ticket doesn't present a concrete problem that can be fixed. In the opening comment, the author complains about not understanding the library due to lack of documentation. They then proceed to give a fairly in-depth walk-through of exactly how they would help improve the documentation. When advised that they could open a PR, they declined. Even after that (and posting a huge wall of text and code), they were offered direct help. They declined that as well. |
@dark-fusion I can't write documentation for something when I don't know how it works. I can't "magically" create that knowledge from nothing. It is completely valid for me to provide suggestions like this that can assist the authors who DO have the subject matter knowledge to that they can resolve the problem both for myself and others. |
use nom::{
bytes::complete::tag,
// Use this instead of nom::number::u8 because you're going from ASCII characters
// to numbers and not bytes to numbers (e.g. \u{0005}). IMO the documentation doesn't
// make this distinction readily apparent. Giving functions the same name as integral types
// is unnecessarily confusing.
character::complete::u8,
multi::length_data,
sequence::{delimited, terminated},
IResult,
};
fn parze(input: &[u8]) -> IResult<&[u8], &[u8]> {
// Use terminated to elegantly consume the trailing \r\n
terminated(
// length_data will return N bytes where the first argument is N
length_data(
// Use delimited for the same reasons as terminated. Ultimately
// what's passed into length_data is the result of the u8 combinator.
delimited(
// Our prefix. This sh/could be put into its own function for ease of reuse.
tag(b"$"),
// This is the value we want, replace u8 with whatever size
// e.g. nom::character::complete::u32 if you want to handle large numbers.
// for now u8 returns the length that length_data will use.
u8,
// This consumes the trailing "\r\n". If any newline-esque
// set of characters is appropriate there is a newline combinator.
tag(b"\r\n"),
)
),
tag("\r\n")
)(input)
}
#[cfg(test)]
mod test {
use super::parze;
#[test]
fn test_parse() {
let input = b"$5\r\nvalue\r\n";
// Reassign input if you want to continue parsing
let (_, value) = parze(input).expect("Don't use unwrap");
assert_eq!(b"value", value);
// Convert to a &str if you're dealing with UTF-8 and want to use
// string/str methods
let value = std::str::from_utf8(value).expect("Handle your UTF-8 errors here");
assert_eq!("value", value);
}
} Alternatively: fn parze(input: &[u8]) -> IResult<&[u8], &[u8]> {
// Return the error with the ? operator here because we want to keep processing
let (input, _) = tag(b"$")(input)?;
let (input, length) = u8(input)?;
let (input, _) = tag(b"\r\n")(input)?;
// Don't use the ? operator because this is our last parser
// Alternatively reassign input as above for take and then tag
// and then return the produced value for take
terminated(
nom::bytes::complete::take(length),
tag(b"\r\n"),
)(input)
} First of all don't use unsafe when parsing UTF-8 data. Either work with Mostly it seems like what's not being communicated effectively is how to deal with the remaining input. It takes some practice to wrap your head around that for sure, but hopefully the second example demonstrates how to do this. Error handling is tricky with nom, but since most/all of what you're doing should be covered by the stock combinators you shouldn't have to dig into that. Insofar as // The first type for IResult is the input, the second the output
fn parze(input: &[u8]) -> IResult<&[u8], &str> {
map_res(
terminated(
length_data(
delimited(
tag(b"$"),
u8,
tag(b"\r\n"),
)
),
tag("\r\n")
),
// from_utf8 returns a Result. Using nom's map_err will transform the error case into a nom error
// and leave the resulting &str in the okay case.
std::str::from_utf8
)(input)
} This is fine-ish: let strln = map_res(
nom::character::complete::digit1,
|out: &str| {
debug!("{:?}", out);
u32::from_str_radix(out, 10)
}
)(a)?; The non-obvious thing here is that you've not destructured the tuple so strln will be (&str, u32). The problem is that there's not enough context to make the appropriate type inferences (and this will rear its ugly head with all sorts of rust code). The solution is to package things up a bit more cleanly. For example this compiles (however unless you need only one digit I'd just use the appropriate u8/u16/u32 combinator): fn parze_u32(input: &str) -> IResult<&str, u32> {
map_res(
nom::character::complete::digit1,
|out: &str| u32::from_str_radix(out, 10)
)(input)
} |
imo its a fairly high quality experience report that both gives us a mindset for failure points, reflects on the failure points, and provides suggestions for each, even if they aren't what should necessarily be done.
That isn't the point, the point is the experience of the user. |
Some quick thoughts on the original issue
This shows people assume See #1416 for a similar issue helping to make other combinators more discoverable.
I'm not sold on completely flattening but there are areas to improve here.
|
Agreed. The list of combinators went a long way towards figuring out where I should be looking, however it's incomplete and outdated in some places. More recently I went looking for the regexp combinators and couldn't figure out where they went. It looks like there's a nom-regex crate, but no repo for it? |
Another one that gets me: parsing a number from text. If I want to parse "12.34" to an f64 I would use Edit: OH yeah and the list, while great, is incomplete. For e.g. it's missing the |
- integers are parsed in `character` with their rust type names - floats are parsed in `number` with their c typ names This discrepancy was pointed out in rust-bakery/nom#1506. This moves floats to be in `character` and to use their rust type names
- integers are parsed in `character` with their rust type names - floats are parsed in `number` with their c typ names This discrepancy was pointed out in rust-bakery/nom#1506. This moves floats to be in `character` and to use their rust type names
Hey mate,
Nom is a really powerful library and when it works well, it can do amazing things. But sometimes it can be extremely hard to find which of the many pieces of nom you need to use, and how to combine them. Even I have sat staring at the docs for hours, feeling ultimately very stupid, wondering at what point did my life choices go so wrong.
Anyway, generally when I get to this point, I always ask "what was the interaction failure that led to this feeling". Here in nom, I think there is a combination of things, but the docs are a really large part.
For example, I was wondering "how do I assert there are no bytes left in an input". So, since this is "bytes" and I have all the input, I went to:
https://docs.rs/nom/latest/nom/bytes/complete/index.html
But there is nothing here that does what I needed. Searching for "complete", "end", "size", "limit", did not find any results.
At this point I gave up and did something else, and when working on a different part of the code with a seperate nom parser I found: https://docs.rs/nom/latest/nom/combinator/fn.eof.html
So what was the "failure" here?
The primary issue is that it was hard to find and discover what it is that I needed. Often it's extremely difficult to locate what piece of nom you need, because the library being fragmented over so many modules, means that you effectively have ~9 different modules you need to look through. This makes it very difficult to locate combinators, and then effectively identify which one you may wish to use.
A possible solution is to "flatten" all the combinators to a single module, so that they can all be listed on a single page with the first line of their docs. This also has the benefit that every combinator needs to then be uniquely named, which helps to potentially resolve conflicts and confusion such as complete and streaming bytes. Alternatelly, this "flattened" module, could be made from pub-use of the other modules with aliases. Another benefit to this could be aliases to some combinators to help them be discovered easier.
Another barrier is that the docs of any parser can be extremely terse. Here's flat map for example:
https://docs.rs/nom/latest/nom/combinator/fn.flat_map.html
It took me multiple attempts to read this to understand what this combinator did, and how or why I would use it. The type signature is huge and has 8 generics. Even reading that it's pretty hard to see what was going on. The doc is a single line, that doesn't really explain the use case. And the example has no comments to indicate "what" that combinator is doing.
In otherwords, the current docs has basically a single line of (terse) human text to explain the type signature, and a single example that you are expected to reverse engineer and interpret. Especially for myself who failed mathematics and formal types and all that jazz, this makes it extremely hard for me to understand "how" or "why" I might care about flat_map!
I think a constructive improvement to this is that any combinators docs should have:
An example where this is done pretty well is "alt" https://docs.rs/nom/latest/nom/branch/fn.alt.html which has a better example and does a lot of what I just suggested.
An example of where this is done poorly is https://docs.rs/nom/latest/nom/error/fn.make_error.html where there are no examples of how to make an error, and I tried for about 45 minutes to work out what to do here, before giving up.
So I hope some of these thoughts help improve the docs and nom in general. It's a powerful library but I think that it needs to show that through it's documentation to make it more accessible, with different types of language for people to use :)
The text was updated successfully, but these errors were encountered: