Improve type safety of serialize::json::Encoder #19249

barosl · 2014-11-23T19:04:31Z

This pull request tries to improve type safety of serialize::json::Encoder.

Looking at #18319, I decided to test some JSON implementations in other languages. The results are as follows:

Encoding to JSON

Language	111111111111111111	1.0
JavaScript™	"111111111111111100"	"1"
Python	"111111111111111111"	"1.0"
Go	"111111111111111111"	"1"
Haskell	"111111111111111111"	"1"
Rust	"111111111111111104"	"1"

Decoding from JSON

Language	"1"	"1.0"	"1.6"
JavaScript™	1 (Number)	1 (Number)	1.6 (Number)
Python	1 (int)	1.0 (float)	1.6 (float)
Go	1 (float64)	1 (float64)	1.6 (float64)
Go (expecting `int`)	1 (int)	error	error
Haskell (with `:: Int`)	1 (Int)	1 (Int)	2 (Int)
Haskell (with `:: Double`)	1.0 (Double)	1.0 (Double)	1.6 (Double)
Rust (with `::<int>`)	1 (int)	1 (Int)	1 (Int)
Rust (with `::<f64>`)	1 (f64)	1 (f64)	1.6 (f64)

The tests on Haskell were done using the json package.
The error message printed by Go was: cannot unmarshal number 1.0 into Go value of type int

As you see, there is no uniform behavior. Every implementation follows its own principle. So I think it is reasonable to find a desirable set of behaviors for Rust.

Firstly, every implementation except the one for JavaScript is capable of handling i64 values. It is even practical, because Twitter API uses an i64 number to represent a tweet ID, although it is recommended to use the string version of the ID.

Secondly, looking into the Go's behavior, implicit type conversion is not allowed in their decoder. If the user expects an integer value to follow, decoding a float value will raise an error. This behavior is desirable in Rust, because we are pleased to follow the principles of strong typing.

Thirdly, Python's JSON module forces a decimal point to be printed even if the fractional part does not exist. This eases the distinction of a float value from an integer value in JSON, because by the spec there is only one type to represent numbers, Number.

So, I suggest the following three breaking changes:

Remove float preprocessing in serialize::json::Encoder

serialize::json::Encoder currently uses f64 to emit any integral type. This is possibly due to the behavior of JavaScript, which uses f64 to represent any numeric value.

This leads to a problem that only the integers in the range of [-2^53+1, 2^53-1] can be encoded. Therefore, i64 and u64 cannot be used reliably in the current implementation.

RFC 7159 suggests that good interoperability can be achieved if the range is respected by implementations. However, it also says that implementations are allowed to set the range of number accepted. And it seems that the JSON encoders outside of the JavaScript world usually make use of i64 values.

This commit removes the float preprocessing done in the emit_* methods. It also increases performance, because transforming f64 into String costs more than that of an integral type.

Fixes JSON encoder silently corrupts some 64-bit ints #18319
Do not coerce to integer when decoding a float value

When an integral value is expected by the user but a fractional value is found, the current implementation uses std::num::cast() to coerce to an integer type, losing the fractional part. This behavior is not desirable because the number loses precision without notice.

This commit makes it raise ExpectedError when such a situation arises.
Always use a decimal point when emitting a float value

JSON doesn't distinguish between integer and float. They are just numbers. Also, in the current implementation, a fractional number without the fractional part is encoded without a decimal point.

Thereforce, when the value is decoded, it is first rendered as Json, either I64 or U64. This reduces type safety, because while the original intention was to cast the value to float, it can also be casted to integer.

As a workaround of this problem, this commit makes the encoder always emit a decimal point even if it is not necessary. If the fractional part of a float number is zero, ".0" is padded to the end of the result.

huonw · 2014-11-25T09:01:24Z

cc @erickt, I vaguely recall you had done something like this previously?

alexcrichton · 2014-11-30T06:35:52Z

This all seems reasonable to me, could you rebase this @barosl? Thanks!

barosl · 2014-12-02T06:05:38Z

Rebased to the current master.

@alexcrichton r?

alexcrichton · 2014-12-02T07:53:26Z

src/libserialize/json.rs

@@ -178,7 +178,7 @@
 //!     // Serialize using `ToJson`
 //!     let input_data = TestStruct {
 //!         data_int: 1,
-//!         data_str: "toto".to_string(),
+//!         data_str: "madoka".into_string(),


For example code, could these actually stay as to_string? The call to to_string is generally more idiomatic and especially for a string literal there's no difference between to_string and into_string.

Uhm, but I saw that "".to_string() is slower than String::new() at #18404 and #16415. Was this fixed?

Oh, it seems we got a better formatter. Now the difference is negligible. Great, I'll remove the commit.

let a = "hello".to_string().capacity(); let b = "hello".into_string().capacity(); println!("{} {}", a, b); 8 5

serialize::json::Encoder currently uses f64 to emit any integral type. This is possibly due to the behavior of JavaScript, which uses f64 to represent any numeric value. This leads to a problem that only the integers in the range of [-2^53+1, 2^53-1] can be encoded. Therefore, i64 and u64 cannot be used reliably in the current implementation. RFC 7159 suggests that good interoperability can be achieved if the range is respected by implementations. However, it also says that implementations are allowed to set the range of number accepted. And it seems that the JSON encoders outside of the JavaScript world usually make use of i64 values. This commit removes the float preprocessing done in the emit_* methods. It also increases performance, because transforming f64 into String costs more than that of an integral type. Fixes rust-lang#18319 [breaking-change]

When an integral value is expected by the user but a fractional value is found, the current implementation uses std::num::cast() to coerce to an integer type, losing the fractional part. This behavior is not desirable because the number loses precision without notice. This commit makes it raise ExpectedError when such a situation arises. [breaking-change]

JSON doesn't distinguish between integer and float. They are just numbers. Also, in the current implementation, a fractional number without the fractional part is encoded without a decimal point. Thereforce, when the value is decoded, it is first rendered as Json, either I64 or U64. This reduces type safety, because while the original intention was to cast the value to float, it can also be casted to integer. As a workaround of this problem, this commit makes the encoder always emit a decimal point even if it is not necessary. If the fractional part of a float number is zero, ".0" is padded to the end of the result. [breaking-change]

Except for the example code!

alexcrichton · 2014-12-08T19:40:16Z

@barosl this seems to still reference into_string, can that be removed?

barosl · 2014-12-08T23:29:08Z

@alexcrichton I thought into_string() should be only prohibited from the example code, shouldn't it? If I should remove the to_string() -> into_string() commit entirely, I will follow it. In that case, I also have to modify some earlier commits that use into_string() to make a string literal.

barosl · 2014-12-08T23:40:23Z

And also note that into_string() is currently used in the existing json.rs code. Should these be removed together?

alexcrichton · 2014-12-09T04:00:19Z

Oops I need to read more closely next time, thanks @barosl!

This pull request tries to improve type safety of `serialize::json::Encoder`. Looking at #18319, I decided to test some JSON implementations in other languages. The results are as follows: * Encoding to JSON | Language | 111111111111111111 | 1.0 | | --- | --- | --- | | JavaScript™ | "111111111111111100" | "1" | | Python | "111111111111111111" | **"1.0"** | | Go | "111111111111111111" | "1" | | Haskell | "111111111111111111" | "1" | | Rust | **"111111111111111104"** | "1" | * Decoding from JSON | Language | "1" | "1.0" | "1.6" | | --- | --- | --- | --- | | JavaScript™ | 1 (Number) | 1 (Number) | 1.6 (Number) | | Python | 1 (int) | 1.0 (float) | 1.6 (float) | | Go | **1 (float64)** | 1 (float64) | 1.6 (float64) | | Go (expecting `int`) | 1 (int) | **error** | error | | Haskell (with `:: Int`) | 1 (Int) | 1 (Int) | **2 (Int)** | | Haskell (with `:: Double`) | 1.0 (Double) | 1.0 (Double) | 1.6 (Double) | | Rust (with `::<int>`) | 1 (int) | 1 (Int) | **1 (Int)** | | Rust (with `::<f64>`) | 1 (f64) | 1 (f64) | 1.6 (f64) | * The tests on Haskell were done using the [json](http://hackage.haskell.org/package/json) package. * The error message printed by Go was: `cannot unmarshal number 1.0 into Go value of type int` As you see, there is no uniform behavior. Every implementation follows its own principle. So I think it is reasonable to find a desirable set of behaviors for Rust. Firstly, every implementation except the one for JavaScript is capable of handling `i64` values. It is even practical, because [Twitter API uses an i64 number to represent a tweet ID](https://dev.twitter.com/overview/api/twitter-ids-json-and-snowflake), although it is recommended to use the string version of the ID. Secondly, looking into the Go's behavior, implicit type conversion is not allowed in their decoder. If the user expects an integer value to follow, decoding a float value will raise an error. This behavior is desirable in Rust, because we are pleased to follow the principles of strong typing. Thirdly, Python's JSON module forces a decimal point to be printed even if the fractional part does not exist. This eases the distinction of a float value from an integer value in JSON, because by the spec there is only one type to represent numbers, `Number`. So, I suggest the following three breaking changes: 1. Remove float preprocessing in serialize::json::Encoder `serialize::json::Encoder` currently uses `f64` to emit any integral type. This is possibly due to the behavior of JavaScript, which uses `f64` to represent any numeric value. This leads to a problem that only the integers in the range of [-2^53+1, 2^53-1] can be encoded. Therefore, `i64` and `u64` cannot be used reliably in the current implementation. [RFC 7159](http://tools.ietf.org/html/rfc7159) suggests that good interoperability can be achieved if the range is respected by implementations. However, it also says that implementations are allowed to set the range of number accepted. And it seems that the JSON encoders outside of the JavaScript world usually make use of `i64` values. This commit removes the float preprocessing done in the `emit_*` methods. It also increases performance, because transforming `f64` into String costs more than that of an integral type. Fixes #18319 2. Do not coerce to integer when decoding a float value When an integral value is expected by the user but a fractional value is found, the current implementation uses `std::num::cast()` to coerce to an integer type, losing the fractional part. This behavior is not desirable because the number loses precision without notice. This commit makes it raise `ExpectedError` when such a situation arises. 3. Always use a decimal point when emitting a float value JSON doesn't distinguish between integer and float. They are just numbers. Also, in the current implementation, a fractional number without the fractional part is encoded without a decimal point. Thereforce, when the value is decoded, it is first rendered as `Json`, either `I64` or `U64`. This reduces type safety, because while the original intention was to cast the value to float, it can also be casted to integer. As a workaround of this problem, this commit makes the encoder always emit a decimal point even if it is not necessary. If the fractional part of a float number is zero, ".0" is padded to the end of the result.

barosl force-pushed the json-type-safety branch from 2fd82c5 to 78c3553 Compare November 23, 2014 20:58

barosl force-pushed the json-type-safety branch 3 times, most recently from dee3a10 to c9fa164 Compare December 2, 2014 06:01

alexcrichton reviewed Dec 2, 2014
View reviewed changes

barosl force-pushed the json-type-safety branch from c9fa164 to a746e88 Compare December 8, 2014 08:56

barosl added 3 commits December 8, 2014 18:02

barosl force-pushed the json-type-safety branch from a746e88 to db6946f Compare December 8, 2014 09:07

barosl added 2 commits December 8, 2014 18:19

libserialize: Code cleanup

c32286d

libserialize: Prefer into_string() to to_string() wherever possible

7176dd1

Except for the example code!

barosl force-pushed the json-type-safety branch from db6946f to 7176dd1 Compare December 8, 2014 09:19

bors closed this Dec 9, 2014

bors merged commit 7176dd1 into rust-lang:master Dec 9, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve type safety of serialize::json::Encoder #19249

Improve type safety of serialize::json::Encoder #19249

barosl commented Nov 23, 2014

huonw commented Nov 25, 2014

alexcrichton commented Nov 30, 2014

barosl commented Dec 2, 2014

alexcrichton Dec 2, 2014

barosl Dec 2, 2014

barosl Dec 2, 2014

alexcrichton commented Dec 8, 2014

barosl commented Dec 8, 2014

barosl commented Dec 8, 2014

alexcrichton commented Dec 9, 2014

Improve type safety of serialize::json::Encoder #19249

Improve type safety of serialize::json::Encoder #19249

Conversation

barosl commented Nov 23, 2014

huonw commented Nov 25, 2014

alexcrichton commented Nov 30, 2014

barosl commented Dec 2, 2014

alexcrichton Dec 2, 2014

Choose a reason for hiding this comment

barosl Dec 2, 2014

Choose a reason for hiding this comment

barosl Dec 2, 2014

Choose a reason for hiding this comment

alexcrichton commented Dec 8, 2014

barosl commented Dec 8, 2014

barosl commented Dec 8, 2014

alexcrichton commented Dec 9, 2014