Inconsistencies in floating point parsing #370
Comments
Thank you for the report @d86leader! I've done a bit of investigation to see why this is happening:
float = "inf" | "-inf" | "NaN" | float_num;
float_num = ["+" | "-"], (float_std | float_frac | float_int);
float_std = digit, { digit }, ".", {digit}, [float_exp];
float_frac = ".", digit, {digit}, [float_exp];
float_int = digit, { digit }, [float_exp];
float_exp = ("e" | "E"), ["+" | "-"], digit, {digit};
It seems like at the moment the best strategy would be to (a) fix the grammar and (b) add some tests to ensure that the grammar represents the implemented behaviour. To improve the errors, we could also allow the parser to eat up underscores when checking for a float but then add a descriptive error message saying that we cannot parse them at the moment. Another possibility would of course be to parse and ignore them, which would allow us to parse every valid float literal in Eventually, it would of course be great if
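The two underscore strategies mentioned above (reject with a descriptive error, or parse and ignore) could look roughly like the sketch below. This is only an illustration built on the standard library's `str::parse::<f64>`, not RON's parser; `FloatError`, `parse_float_strict` and `parse_float_lenient` are hypothetical names invented for this example.

```rust
/// Hypothetical error type for this sketch; not RON's actual error enum.
#[derive(Debug, PartialEq)]
enum FloatError {
    /// The literal contains `_`, which the parser cannot handle yet.
    UnderscoreNotSupported,
    /// Any other malformed input.
    Invalid,
}

/// Strategy 1: notice underscores and report them with a descriptive error.
fn parse_float_strict(s: &str) -> Result<f64, FloatError> {
    if s.contains('_') {
        return Err(FloatError::UnderscoreNotSupported);
    }
    s.parse::<f64>().map_err(|_| FloatError::Invalid)
}

/// Strategy 2: drop every underscore, then parse what is left.
fn parse_float_lenient(s: &str) -> Result<f64, FloatError> {
    let cleaned: String = s.chars().filter(|&c| c != '_').collect();
    cleaned.parse::<f64>().map_err(|_| FloatError::Invalid)
}

fn main() {
    println!("{:?}", parse_float_strict("1_000.5"));
    println!("{:?}", parse_float_lenient("1_000.5"));
}
```

Note that the lenient variant is deliberately naive: it also accepts inputs such as `_1`, which Rust itself would not treat as a float literal, so a real implementation would need to restrict where underscores may appear.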
Ok, here is the EBNF from
Float ::= Sign? ( 'inf' | 'NaN' | Number )
Number ::= ( Digit+ |
Digit+ '.' Digit* |
Digit* '.' Digit+ ) Exp?
Exp ::= [eE] Sign? Digit+
Sign ::= [+-]
Digit ::= [0-9]
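Assuming the grammar above is the one documented for the standard library's `f64::from_str` (the source name is missing from this comment, so that is an assumption), a quick way to probe it is to feed a few strings to `str::parse::<f64>`. The helper `std_accepts` is a hypothetical name used only for this sketch.

```rust
/// Returns true if Rust's standard `f64` parser accepts the string.
fn std_accepts(s: &str) -> bool {
    s.parse::<f64>().is_ok()
}

fn main() {
    // Exponent forms, a trailing dot, a leading dot, special values,
    // and a few inputs the grammar does not cover.
    let samples = [
        "1e1", "1.0e+1", "1.0e-1", "5.", ".5", "-inf", "NaN", "1_000.0", ".", "e1",
    ];
    for s in samples {
        println!("{:>8} -> {}", s, std_accepts(s));
    }
}
```

The exponent forms from the original report are accepted with or without a sign, while underscores are rejected, which matches the grammar having no `_` anywhere.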
And here is
FLOAT_LITERAL ::=
DEC_LITERAL .
| DEC_LITERAL FLOAT_EXPONENT
| DEC_LITERAL . DEC_LITERAL FLOAT_EXPONENT?
FLOAT_EXPONENT ::=
(e|E) (+|-)? (DEC_DIGIT|_)* DEC_DIGIT (DEC_DIGIT|_)*
DEC_DIGIT ::= [0-9]
DEC_LITERAL ::= DEC_DIGIT (DEC_DIGIT|_)*
Rust doesn't support a leading
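For comparison, here is a small sketch of float literals in Rust source code that exercise each alternative of the `FLOAT_LITERAL` grammar quoted above, including underscores. The function name `float_literals` is made up for this example. Since every production starts with `DEC_LITERAL`, a literal such as `.5` is not covered by this grammar.

```rust
/// Returns one literal per form allowed by the quoted FLOAT_LITERAL grammar.
fn float_literals() -> [f64; 4] {
    [
        1_000.5,  // DEC_LITERAL . DEC_LITERAL, with `_` as a digit separator
        1e1,      // DEC_LITERAL FLOAT_EXPONENT, no fractional part
        1.0e+1_0, // signed exponent that itself contains an underscore
        2.,       // DEC_LITERAL followed by a bare dot
    ]
}

fn main() {
    println!("{:?}", float_literals());
}
```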
Ok, my proposal would now be to (1) adapt
So this would be the new EBNF:
float = ["+" | "-"], ("inf" | "NaN" | float_num);
float_num = (float_int | float_std | float_frac), [float_exp];
float_int = digit, { digit };
float_std = digit, { digit }, ".", {digit};
float_frac = ".", digit, {digit};
float_exp = ("e" | "E"), ["+" | "-"], digit, {digit};
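To make tests against this grammar easier to write, one option is a small hand-written recognizer that mirrors the proposed EBNF rule by rule. The sketch below is not RON's parser; `digits` and `matches_float` are hypothetical helpers, and the code only answers whether a string matches the grammar, without computing a value.

```rust
/// Consumes a run of ASCII digits starting at `i`; returns the index after it.
fn digits(b: &[u8], mut i: usize) -> usize {
    while i < b.len() && b[i].is_ascii_digit() {
        i += 1;
    }
    i
}

/// Returns true if the whole of `s` matches the `float` rule of the proposed EBNF.
fn matches_float(s: &str) -> bool {
    let b = s.as_bytes();
    let mut i = 0;

    // float = ["+" | "-"], ("inf" | "NaN" | float_num);
    if i < b.len() && (b[i] == b'+' || b[i] == b'-') {
        i += 1;
    }
    let rest = &s[i..];
    if rest == "inf" || rest == "NaN" {
        return true;
    }

    // float_int | float_std | float_frac
    let int_end = digits(b, i);
    let has_int = int_end > i;
    i = int_end;
    if i < b.len() && b[i] == b'.' {
        let frac_end = digits(b, i + 1);
        let has_frac = frac_end > i + 1;
        // float_frac requires at least one digit after the dot.
        if !has_int && !has_frac {
            return false;
        }
        i = frac_end;
    } else if !has_int {
        return false;
    }

    // [float_exp] = ("e" | "E"), ["+" | "-"], digit, {digit}
    if i < b.len() && (b[i] == b'e' || b[i] == b'E') {
        i += 1;
        if i < b.len() && (b[i] == b'+' || b[i] == b'-') {
            i += 1;
        }
        let exp_end = digits(b, i);
        if exp_end == i {
            return false;
        }
        i = exp_end;
    }

    // Anything left over (for example an underscore) is a mismatch.
    i == b.len()
}

fn main() {
    for s in ["1e1", "1.0e+1", ".5", "5.", "-inf", "1_000.0", "."] {
        println!("{:>8} -> {}", s, matches_float(s));
    }
}
```

A recognizer like this could be compared against the real deserializer over a list of inputs, so that any future divergence between the documented grammar and the implementation shows up as a failing test.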
Looking at this EBNF, I thought of another funny edge case: the ambiguity of NaN and inf between floats and keywords. The parser seems to resolve this ambiguity by seeing what the type itself requires:
type R<T> = ron::Result<T>;

#[derive(Debug, serde::Deserialize, serde::Serialize)]
struct NaN;

fn main() {
    let s = "NaN";
    let x1: R<NaN> = ron::from_str(s);
    let x2: R<f64> = ron::from_str(s);
    println!("{:?}\n{:?}", x1, x2);
    // >>> Ok(NaN)
    //     Ok(NaN)
}
Yes, that is correct. Since RON is not really self-describing, it will trust you in this case. Can we confuse it? Yes, we can:
#[derive(serde::Deserialize)]
struct NaN;

#[derive(serde::Deserialize)]
#[serde(untagged)]
enum Trolol {
    Lol(NaN),
    F64(f64),
}

#[test]
fn test() {
    assert!(matches!(ron::from_str("NaN"), Ok(Trolol::F64(f)) if f.is_nan()));
}
@d86leader Could you check if the PR would resolve your issues? Thanks!
Thanks! The new grammar looks good and seems to reflect the implementation |
Grammar taken from current master. There are a couple of inconsistencies between the grammar, the current implementation, and how Rust does it.
Exponential notation. The grammar allows 1.0e1, but disallows 1e1 and, bizarrely, disallows a sign: 1.0e+1 and 1.0e-1. But the implementation accepts all those forms, and so does Rust.
Underscores are disallowed in floats, and this is consistent in both the grammar and the implementation; but it differs from Rust and is a bit unexpected.
Here's a small repro for everything: