one_of, none_of, etc. only work with strings #1510

Open
LoganDark opened this issue Mar 14, 2022 · 2 comments
@LoganDark
Contributor

I'm trying to parse Lua source code, which is not UTF-8 (strings are just bags of bytes). nom seems to be mostly compatible with `&[u8]`, but I keep finding parts of nom that unexpectedly fail to typecheck on it.

For example, `one_of(b"\\'\"")` compiles fine on its own, but you can't use the returned parser on a `&[u8]` because it expects the element type to be `char`. This is especially confusing if you're using it as part of an `alt`, whose error doesn't point at the `one_of(...)` as the source of the problem.
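A possible workaround, sketched with the standard library only: judging by the example later in this thread, `one_of` on `&[u8]` input has output type `char`, so the byte has to be recovered from that `char`. Assuming the `char` was produced by a plain widening cast from `u8` (an assumption about nom 7's conversion, worth checking against the `AsChar` impl of the version in use), the round trip is lossless. The helper names `byte_to_char` and `char_to_byte` are hypothetical and not part of nom.

```rust
/// Widens a byte to a `char` in the range U+0000..=U+00FF, as an `as` cast does.
fn byte_to_char(b: u8) -> char {
    b as char
}

/// Narrows a `char` back to a byte.
/// Returns `None` for code points above U+00FF, which cannot come from a byte.
fn char_to_byte(c: char) -> Option<u8> {
    if (c as u32) <= 0xFF {
        Some(c as u8)
    } else {
        None
    }
}
```

In nom terms this would presumably amount to wrapping the parser in something like `map(one_of(...), |c| c as u8)`, which is not tested here.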

nom is quite an arcane library. I appreciate the clearly huge amount of work that has been put into the documentation and implementation, but along with what was noted in #1506 by @Firstyear, a lot of things (like `recognize`) are still hard to find, and it's not obvious that certain helpers, while they could be generalized, only work on strings, with no alternative for bytes. This is detrimental and makes it very challenging to implement the parser that I want.

Almost everything that has been implemented for `str`s and `char`s could be implemented for bytes instead, but there is no separation between the modules and therefore no effort to bring them up to parity.
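To illustrate what "generalized" could mean here, a minimal std-only sketch of a `one_of`-style helper that is generic over the element type, so the same code serves both bytes and chars. The name `one_of_generic` and its signature are hypothetical; this is not nom's API and it uses `Option` instead of nom's error types.

```rust
/// Hypothetical, nom-independent helper: succeeds if the first element of
/// `input` is contained in `set`, returning the remaining input and the
/// matched element. Works for any copyable, comparable element type.
fn one_of_generic<'i, T: PartialEq + Copy>(
    set: &[T],
    input: &'i [T],
) -> Option<(&'i [T], T)> {
    // `split_first` yields `None` on empty input, which `?` propagates.
    let (&first, rest) = input.split_first()?;
    if set.contains(&first) {
        Some((rest, first))
    } else {
        None
    }
}
```

With byte input the matched element comes back as a `u8`, and with `&[char]` input as a `char`, without any conversion in between.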

@Xiretza
Contributor

Xiretza commented Mar 14, 2022

Please provide a minimal reproducer of your problem when reporting an issue. There's a reason this is a part of the issue template, which you seem to have ignored entirely.

This works as expected for me on nom 7.1.0:

```rust
use nom::{character::complete::one_of, IResult};

fn abc(i: &[u8]) -> IResult<&[u8], char> {
    one_of(b"abc".as_slice())(i)
}

fn main() {
    assert_eq!(abc(b"axe"), Ok((b"xe".as_slice(), 'a')));
    assert_eq!(abc(b"baba"), Ok((b"aba".as_slice(), 'b')));
    assert!(abc(b"foo").is_err());
}
```

@LoganDark
Contributor Author

LoganDark commented Mar 14, 2022

> Please provide a minimal reproducer of your problem when reporting an issue. There's a reason this is a part of the issue template, which you seem to have ignored entirely.

Sure buddy

```rust
use nom::{Err, IResult};
use nom::branch::alt;
use nom::bytes::complete::{is_not, tag, take_while_m_n};
use nom::character::complete::{alpha1, alphanumeric1, one_of};
use nom::character::is_digit;
use nom::combinator::{map, map_opt, opt, recognize, success, value, verify};
use nom::error::{ErrorKind, ParseError};
use nom::multi::{count, many0, many0_count};
use nom::sequence::{delimited, pair, preceded, terminated};

pub fn short_string_escape<'a, E: ParseError<&'a [u8]>>(input: &'a [u8]) -> IResult<&'a [u8], Option<u8>, E> {
	preceded(tag(b"\\"),
		alt((
			//map(short_string_ordinal, Some),
			value(Some(b'\x07'), tag(b"a")), // \a: bell
			value(Some(b'\x08'), tag(b"b")), // \b: backspace
			value(Some(b'\x0C'), tag(b"f")), // \f: form feed
			value(Some(b'\n'), tag(b"n")),
			value(Some(b'\r'), tag(b"r")),
			value(Some(b'\t'), tag(b"t")),
			value(Some(b'\x0b'), tag(b"v")), // \v: vertical tab
			// one_of yields a char here, while every other branch yields Option<u8>
			map(one_of(b"\\'\"\n".as_slice()), Some),
			success(None)
		))
	)(input)
}
```

doesn't compile

```rust
use nom::{Err, IResult};
use nom::branch::alt;
use nom::bytes::complete::{is_not, tag, take_while_m_n};
use nom::character::complete::{alpha1, alphanumeric1};
use nom::character::is_digit;
use nom::combinator::{map, map_opt, opt, recognize, success, value, verify};
use nom::error::{ErrorKind, ParseError};
use nom::multi::{count, many0, many0_count};
use nom::sequence::{delimited, pair, preceded, terminated};

// Byte-oriented replacement for one_of: matches one byte from `bytes` and returns it as a u8.
pub fn one_of_bytes<'a, E: ParseError<&'a [u8]>>(bytes: &'a [u8]) -> impl Fn(&'a [u8]) -> IResult<&'a [u8], u8, E> {
	move |input: &'a [u8]| {
		if let Some(byte) = input.first() {
			if bytes.contains(byte) {
				return Ok((&input[1..], *byte))
			}
		}

		// Empty input or no match: recoverable error, so alt can try the next branch
		Err(Err::Error(E::from_error_kind(input, ErrorKind::OneOf)))
	}
}

pub fn short_string_escape<'a, E: ParseError<&'a [u8]>>(input: &'a [u8]) -> IResult<&'a [u8], Option<u8>, E> {
	preceded(tag(b"\\"),
		alt((
			//map(short_string_ordinal, Some),
			value(Some(b'\x07'), tag(b"a")), // \a: bell
			value(Some(b'\x08'), tag(b"b")), // \b: backspace
			value(Some(b'\x0C'), tag(b"f")), // \f: form feed
			value(Some(b'\n'), tag(b"n")),
			value(Some(b'\r'), tag(b"r")),
			value(Some(b'\t'), tag(b"t")),
			value(Some(b'\x0b'), tag(b"v")), // \v: vertical tab
			map(one_of_bytes(b"\\'\"\n"), Some),
			success(None)
		))
	)(input)
}
```

compiles

(Sorry for the lack of imports, there are a LOT of them in this file and sorting through them would take a while)
Nah just added all of them since it should be fine
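For readers without nom at hand, the intended behaviour of `short_string_escape` can be approximated with a std-only sketch. The function `decode_short_escape` is hypothetical, returns `Option` instead of nom's `IResult`, and assumes Lua's meaning of `\b` (backspace, 0x08). Like the `success(None)` fallback above, it consumes only the backslash when no known escape follows.

```rust
/// Std-only approximation of the escape parser (hypothetical helper, not nom).
/// `input` must start at the backslash. Returns the remaining input plus the
/// decoded byte, or `None` in the second slot when the backslash is not
/// followed by a recognised escape. Returns `None` overall if there is no
/// leading backslash.
fn decode_short_escape(input: &[u8]) -> Option<(&[u8], Option<u8>)> {
    let rest = input.strip_prefix(b"\\")?;
    let decoded = match rest.first().copied() {
        Some(b'a') => 0x07, // bell
        Some(b'b') => 0x08, // backspace
        Some(b'f') => 0x0C, // form feed
        Some(b'n') => b'\n',
        Some(b'r') => b'\r',
        Some(b't') => b'\t',
        Some(b'v') => 0x0B, // vertical tab
        // Backslash, quotes and a literal newline stand for themselves.
        Some(c @ (b'\\' | b'\'' | b'"' | b'\n')) => c,
        // Unknown escape or end of input: consume only the backslash.
        _ => return Some((rest, None)),
    };
    Some((&rest[1..], Some(decoded)))
}
```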

@Geal added this to the 8.0 milestone on Mar 14, 2022