Rusty byte strings in RON, deprecate base64 (byte) strings #438

Merged
merged 25 commits into from
Sep 1, 2023
Commits
54d8d73
Switch from base64 to rusty byte strings, deprecate base64 support
juntyr Feb 6, 2023
9cbdb9e
Document the deprecated_base64_byte_string extension and add the Valu…
juntyr Feb 7, 2023
b88dcc1
Extend Value tests for Value::String and Value::Bytes
juntyr Feb 7, 2023
4ed6186
Include byte strings in the RON grammar
juntyr Feb 7, 2023
9f52a7f
Fix ASCII escape decoding for strings and byte strings
juntyr Feb 7, 2023
7fb5f68
Fix byte string error display for #462 test
juntyr Aug 18, 2023
e2f3c4a
Simplify the deprecated API surface: only Rusty ser, allow base64 de …
juntyr Aug 18, 2023
e1182df
Fix byte string error test
juntyr Aug 18, 2023
0e372f6
Add a CHANGELOG entry
juntyr Aug 18, 2023
f9a059d
Added a deprecation error test for v0.10
juntyr Aug 18, 2023
29558e5
Add tests for v0.9 optional base64 byte string support
juntyr Aug 18, 2023
48be462
Add an example for using base64-encoded bytes with ron
juntyr Aug 18, 2023
d082334
Fix formatting in README
juntyr Aug 18, 2023
be96356
Remove outdated extension docs
juntyr Aug 18, 2023
d09a1dd
Add tests for unescaped and raw byte strings
juntyr Aug 18, 2023
a9367e2
Fix fuzzer-found issue with serialising invalid UTF-8 byte strings
juntyr Aug 20, 2023
c8411f5
Fix fuzzer found issue with `br#` being parsed as the identifier `br`
juntyr Aug 20, 2023
094f7ce
Fix parsing of byte escapes in UTF-8 strings to produce proper Unicod…
juntyr Aug 24, 2023
8cbfeb5
Fix fuzzer-found interaction with unwrap_variant_newtypes
juntyr Aug 24, 2023
500dd04
Add support for strongly typed byte literals
juntyr Aug 25, 2023
9d96262
Add missing Value serialising tests
juntyr Aug 25, 2023
a073fe4
Add test to show that #436 is solved with strongly typed base64 user-…
juntyr Aug 25, 2023
50c6a97
Small progress in increasing patch coverage, four uncovered missing
juntyr Sep 1, 2023
9e10d96
Add more coverage tests
juntyr Sep 1, 2023
fbe71e1
Fix final missing patch coverage
juntyr Sep 1, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -27,6 +27,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Allow `ron::value::RawValue` to capture any whitespace to the left and right of a ron value ([#487](https://github.com/ron-rs/ron/pull/487))
- Fix serialising reserved identifiers `true`, `false`, `Some`, `None`, `inf`[`f32`|`f64`], and `Nan`[`f32`|`f64`] ([#487](https://github.com/ron-rs/ron/pull/487))
- Disallow unclosed line comments at the end of `ron::value::RawValue` ([#489](https://github.com/ron-rs/ron/pull/489))
- **Format-Breaking:** Switch from base64-encoded to Rusty byte strings, still allow base64 deserialising for now ([#438](https://github.com/ron-rs/ron/pull/438))
- Add support for byte literals as strongly typed unsigned 8-bit integers ([#438](https://github.com/ron-rs/ron/pull/438))

## [0.8.1] - 2023-08-17

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -24,6 +24,7 @@ default = []
integer128 = []

[dependencies]
# FIXME @juntyr remove base64 once old byte strings are fully deprecated
base64 = "0.21"
bitflags = { version = "2.0", features = ["serde"] }
indexmap = { version = "2.0", features = ["serde"], optional = true }
@@ -37,3 +38,4 @@ serde_bytes = "0.11"
serde_json = "1.0"
option_set = "0.2"
typetag = "0.2"
bytes = { version = "1.3", features = ["serde"] }
1 change: 1 addition & 0 deletions README.md
@@ -120,6 +120,7 @@ While data structures with any of these attributes should roundtrip through RON,

* Numbers: `42`, `3.14`, `0xFF`, `0b0110`
* Strings: `"Hello"`, `"with\\escapes\n"`, `r#"raw string, great for regex\."#`
* Byte Strings: `b"Hello"`, `b"with \x65\x73\x63\x61\x70\x65\x73\n"`, `br#"raw, too"#`
* Booleans: `true`, `false`
* Chars: `'e'`, `'\n'`
* Optionals: `Some("string")`, `Some(Some(1.34))`, `None`
30 changes: 27 additions & 3 deletions docs/grammar.md
@@ -40,7 +40,7 @@ For the extension names see the [`extensions.md`][exts] document.
## Value

```ebnf
-value = integer | float | string | char | bool | option | list | map | tuple | struct | enum_variant;
+value = integer | byte | float | string | byte_string | char | bool | option | list | map | tuple | struct | enum_variant;
```

## Numbers
@@ -60,6 +60,8 @@ unsigned_octal = "0o", digit_octal, { digit_octal | "_" };
unsigned_hexadecimal = "0x", digit_hexadecimal, { digit_hexadecimal | "_" };
unsigned_decimal = digit, { digit | "_" };

byte = ascii | ("\\", (escape_ascii | escape_byte));

float = ["+" | "-"], ("inf" | "NaN" | float_num), [float_suffix];
float_num = (float_int | float_std | float_frac), [float_exp];
float_int = digit, { digit | "_" };
@@ -74,9 +76,13 @@ float_suffix = "f", ("32", "64");
```ebnf
string = string_std | string_raw;
string_std = "\"", { no_double_quotation_marks | string_escape }, "\"";
-string_escape = "\\", ("\"" | "\\" | "b" | "f" | "n" | "r" | "t" | ("u", unicode_hex));
-string_raw = "r" string_raw_content;
+string_escape = "\\", (escape_ascii | escape_byte | escape_unicode);
+string_raw = "r", string_raw_content;
string_raw_content = ("#", string_raw_content, "#") | "\"", { unicode_non_greedy }, "\"";
+
+escape_ascii = "'" | "\"" | "\\" | "n" | "r" | "t" | "0";
+escape_byte = "x", digit_hexadecimal, digit_hexadecimal;
+escape_unicode = "u", digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal, [digit_hexadecimal]]]]];
```
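
As a rough illustration of the `escape_ascii` and `escape_byte` rules in this hunk, the sketch below decodes the body of a standard byte string using only the standard library. `decode_byte_string_body` is a hypothetical helper written for this note, not ron's parser, and it deliberately leaves out `escape_unicode`.

```rust
/// Hypothetical helper (not ron's parser): decodes the text between `b"` and
/// the closing `"` of a standard byte string. Only the `escape_ascii` and
/// `escape_byte` rules are handled; `escape_unicode` is out of scope here.
fn decode_byte_string_body(body: &str) -> Result<Vec<u8>, String> {
    let mut out = Vec::with_capacity(body.len());
    let mut chars = body.chars();

    while let Some(c) = chars.next() {
        if c != '\\' {
            // Unescaped characters contribute their UTF-8 encoding.
            let mut buf = [0u8; 4];
            out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            continue;
        }

        match chars.next() {
            // escape_ascii
            Some('\'') => out.push(b'\''),
            Some('"') => out.push(b'"'),
            Some('\\') => out.push(b'\\'),
            Some('n') => out.push(b'\n'),
            Some('r') => out.push(b'\r'),
            Some('t') => out.push(b'\t'),
            Some('0') => out.push(0),
            // escape_byte: exactly two hexadecimal digits
            Some('x') => {
                let hi = chars.next().and_then(|d| d.to_digit(16));
                let lo = chars.next().and_then(|d| d.to_digit(16));
                match (hi, lo) {
                    (Some(hi), Some(lo)) => out.push((hi * 16 + lo) as u8),
                    _ => return Err("expected two hex digits after \\x".to_string()),
                }
            }
            Some(other) => return Err(format!("unsupported escape \\{}", other)),
            None => return Err("unfinished escape at end of input".to_string()),
        }
    }

    Ok(out)
}
```

With this helper, the README example body `with \x65\x73\x63\x61\x70\x65\x73\n` decodes to the bytes of `with escapes` followed by a newline.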

> Note: Raw strings start with an `r`, followed by n `#`s and a quotation mark
@@ -93,6 +99,24 @@ Also see [the Rust document] about context-sensitivity of raw strings.

[the Rust document]: https://github.com/rust-lang/rust/blob/d046ffddc4bd50e04ffc3ff9f766e2ac71f74d50/src/grammar/raw-string-literal-ambiguity.md

## Byte String

```ebnf
byte_string = byte_string_std | byte_string_raw;
byte_string_std = "b\"", { no_double_quotation_marks | string_escape }, "\"";
byte_string_raw = "br", string_raw_content;
```

> Note: Byte strings are similar to normal strings but are not required to
contain only valid UTF-8 text. RON's byte strings follow the updated Rust
byte string literal rules as proposed in [RFC #3349], i.e. byte strings
allow the exact same characters and escape codes as normal strings.

[RFC #3349]: https://github.com/rust-lang/rfcs/pull/3349

> Note: Raw byte strings start with a `br` prefix and follow the same rules
as raw strings, which are outlined above.

## Char

```ebnf
146 changes: 146 additions & 0 deletions examples/base64.rs
@@ -0,0 +1,146 @@
//! ron initially encoded byte-slices and byte-bufs as base64-encoded strings.
//! However, since v0.9, ron uses Rusty byte string literals instead.
//!
//! This example shows how the previous behaviour can be restored by serialising
//! bytes with strongly-typed base64-encoded strings, or accepting both Rusty
//! byte strings and the legacy base64-encoded string syntax.

use base64::engine::{general_purpose::STANDARD as BASE64, Engine};
use serde::{de::Visitor, Deserialize, Deserializer, Serialize, Serializer};

#[derive(Debug, PartialEq, Serialize, Deserialize)]
struct Config {
    #[serde(with = "ByteStr")]
    bytes: Vec<u8>,
    #[serde(with = "Base64")]
    base64: Vec<u8>,
    #[serde(with = "ByteStrOrBase64")]
    bytes_or_base64: Vec<u8>,
}

enum ByteStr {}

impl ByteStr {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_bytes(data)
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        struct ByteStrVisitor;

        impl<'de> Visitor<'de> for ByteStrVisitor {
            type Value = Vec<u8>;

            fn expecting(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
                fmt.write_str("a Rusty byte string")
            }

            fn visit_bytes<E: serde::de::Error>(self, bytes: &[u8]) -> Result<Self::Value, E> {
                Ok(bytes.to_vec())
            }

            fn visit_byte_buf<E: serde::de::Error>(self, bytes: Vec<u8>) -> Result<Self::Value, E> {
                Ok(bytes)
            }
        }

        deserializer.deserialize_byte_buf(ByteStrVisitor)
    }
}

enum Base64 {}

impl Base64 {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        serializer.serialize_str(&BASE64.encode(data))
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        let base64_str = <&str>::deserialize(deserializer)?;
        BASE64.decode(base64_str).map_err(serde::de::Error::custom)
    }
}

enum ByteStrOrBase64 {}

impl ByteStrOrBase64 {
    fn serialize<S: Serializer>(data: &[u8], serializer: S) -> Result<S::Ok, S::Error> {
        if cfg!(all()) {
            // either of these would work
            serializer.serialize_str(&BASE64.encode(data))
        } else {
            serializer.serialize_bytes(data)
        }
    }

    fn deserialize<'de, D: Deserializer<'de>>(deserializer: D) -> Result<Vec<u8>, D::Error> {
        struct ByteStrOrBase64Visitor;

        impl<'de> Visitor<'de> for ByteStrOrBase64Visitor {
            type Value = Vec<u8>;

            fn expecting(&self, fmt: &mut std::fmt::Formatter) -> std::fmt::Result {
                fmt.write_str("a Rusty byte string or a base64-encoded string")
            }

            fn visit_str<E: serde::de::Error>(self, base64_str: &str) -> Result<Self::Value, E> {
                BASE64.decode(base64_str).map_err(serde::de::Error::custom)
            }

            fn visit_bytes<E: serde::de::Error>(self, bytes: &[u8]) -> Result<Self::Value, E> {
                Ok(bytes.to_vec())
            }

            fn visit_byte_buf<E: serde::de::Error>(self, bytes: Vec<u8>) -> Result<Self::Value, E> {
                Ok(bytes)
            }
        }

        deserializer.deserialize_any(ByteStrOrBase64Visitor)
    }
}

fn main() {
    let ron = r#"Config(
        bytes: b"only byte strings are allowed",
        base64: "b25seSBiYXNlNjQtZW5jb2RlZCBzdHJpbmdzIGFyZSBhbGxvd2Vk",
        bytes_or_base64: b"both byte strings and base64-encoded strings work",
    )"#;

    assert_eq!(
        ron::from_str::<Config>(ron).unwrap(),
        Config {
            bytes: b"only byte strings are allowed".to_vec(),
            base64: b"only base64-encoded strings are allowed".to_vec(),
            bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
        }
    );

    let ron = r#"Config(
        bytes: b"only byte strings are allowed",
        base64: "b25seSBiYXNlNjQtZW5jb2RlZCBzdHJpbmdzIGFyZSBhbGxvd2Vk",
        bytes_or_base64: "Ym90aCBieXRlIHN0cmluZ3MgYW5kIGJhc2U2NC1lbmNvZGVkIHN0cmluZ3Mgd29yaw==",
    )"#;

    assert_eq!(
        ron::from_str::<Config>(ron).unwrap(),
        Config {
            bytes: b"only byte strings are allowed".to_vec(),
            base64: b"only base64-encoded strings are allowed".to_vec(),
            bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
        }
    );

    println!(
        "{}",
        ron::ser::to_string_pretty(
            &Config {
                bytes: b"only byte strings are allowed".to_vec(),
                base64: b"only base64-encoded strings are allowed".to_vec(),
                bytes_or_base64: b"both byte strings and base64-encoded strings work".to_vec()
            },
            ron::ser::PrettyConfig::default().struct_names(true)
        )
        .unwrap()
    );
}
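
To get a feel for what a Rusty byte string looks like on the output side, the following sketch renders bytes with the standard library's `escape_ascii`. `rusty_byte_string` is a hypothetical helper for illustration only; ron's own serializer may pick different escapes, so treat the exact output as an assumption about std, not about ron.

```rust
/// Hypothetical helper: renders bytes as a Rust-style byte string literal
/// using the standard library's `escape_ascii`. This only illustrates the
/// general shape of the syntax; ron's serializer may escape differently.
fn rusty_byte_string(bytes: &[u8]) -> String {
    format!("b\"{}\"", bytes.escape_ascii())
}
```

Printable ASCII passes through unchanged, while quotes, newlines and non-printable bytes are escaped, so arbitrary binary data stays representable in a text file without a base64 round trip.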
24 changes: 9 additions & 15 deletions src/de/mod.rs
@@ -5,7 +5,6 @@ use std::{
     str,
 };

-use base64::Engine;
 use serde::{
     de::{self, DeserializeSeed, Deserializer as _, Visitor},
     Deserialize,
@@ -17,7 +16,7 @@ use crate::{
     error::{Result, SpannedResult},
     extensions::Extensions,
     options::Options,
-    parse::{Bytes, NewtypeMode, ParsedStr, StructType, TupleMode, BASE64_ENGINE},
+    parse::{Bytes, NewtypeMode, ParsedByteStr, ParsedStr, StructType, TupleMode},
 };

mod id;
@@ -322,8 +321,12 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
             b'{' => self.deserialize_map(visitor),
             b'0'..=b'9' | b'+' | b'-' | b'.' => self.bytes.any_number()?.visit(visitor),
             b'"' | b'r' => self.deserialize_string(visitor),
+            b'b' if matches!(self.bytes.bytes().get(1), Some(b'\'')) => {
+                self.bytes.any_number()?.visit(visitor)
+            }
+            b'b' => self.deserialize_byte_buf(visitor),
             b'\'' => self.deserialize_char(visitor),
-            other => Err(Error::UnexpectedByte(other as char)),
+            other => Err(Error::UnexpectedByte(other)),
         }
     }
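
The order of the two new `b'b'` arms matters: the byte literal case (`b'`) has to be tested before the general byte string case. A standalone sketch of that dispatch, with a hypothetical `Token` enum and `classify` function that are not part of ron, might look like this. Unlike the hunk above, it is stricter about what follows `b`, which is an assumption made to show why a bare `br` prefix (compare commit c8411f5) should not be mistaken for a byte string.

```rust
/// Hypothetical token kinds for this sketch; ron has no such enum.
#[derive(Debug, PartialEq)]
enum Token {
    ByteLiteral,
    ByteString,
    Other,
}

/// Hypothetical classifier mirroring the order of the match arms in the
/// hunk: `b'` is checked first, then the byte string prefixes.
fn classify(input: &[u8]) -> Token {
    match input {
        // b'a' is a strongly typed unsigned 8-bit integer.
        [b'b', b'\'', ..] => Token::ByteLiteral,
        // b"..." or br"..." / br#"..."# is a byte string.
        [b'b', b'"', ..] | [b'b', b'r', b'"' | b'#', ..] => Token::ByteString,
        // Anything else, including identifiers that merely start with `br`.
        _ => Token::Other,
    }
}
```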

@@ -460,18 +463,9 @@ impl<'de, 'a> de::Deserializer<'de> for &'a mut Deserializer<'de> {
             return visitor.visit_byte_buf(bytes);
         }

-        let res = {
-            let string = self.bytes.string()?;
-            let base64_str = match string {
-                ParsedStr::Allocated(ref s) => s.as_str(),
-                ParsedStr::Slice(s) => s,
-            };
-            BASE64_ENGINE.decode(base64_str)
-        };
-
-        match res {
-            Ok(byte_buf) => visitor.visit_byte_buf(byte_buf),
-            Err(err) => Err(Error::Base64Error(err)),
+        match self.bytes.byte_string()? {
+            ParsedByteStr::Allocated(byte_buf) => visitor.visit_byte_buf(byte_buf),
+            ParsedByteStr::Slice(bytes) => visitor.visit_borrowed_bytes(bytes),
         }
     }
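
The `Allocated` / `Slice` split suggests that a byte string without escapes can be handed to the visitor as a borrow of the input, while one with escapes needs a fresh buffer. The sketch below shows that idea with `std::borrow::Cow` standing in for `ParsedByteStr`; `parse_body` is hypothetical and only decodes `\\` and `\"` to stay short.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the borrowed/allocated split: a body without
/// backslashes is returned as a slice of the input, otherwise a new buffer
/// is built. Only `\\` and `\"` are decoded here to keep the sketch short.
fn parse_body(body: &str) -> Option<Cow<'_, [u8]>> {
    if !body.contains('\\') {
        // Fast path: nothing to decode, borrow straight from the input.
        return Some(Cow::Borrowed(body.as_bytes()));
    }

    let mut out = Vec::with_capacity(body.len());
    let mut bytes = body.bytes();
    while let Some(b) = bytes.next() {
        if b == b'\\' {
            match bytes.next()? {
                escaped @ (b'\\' | b'"') => out.push(escaped),
                _ => return None, // other escapes are out of scope here
            }
        } else {
            out.push(b);
        }
    }
    Some(Cow::Owned(out))
}
```

The borrowed case is what lets a zero-copy consumer (one implementing `visit_borrowed_bytes`) avoid an allocation, which the old base64 path could never offer because decoding always produced a new buffer.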

2 changes: 1 addition & 1 deletion src/de/tests.rs
@@ -334,7 +334,7 @@ fn test_byte_stream() {
             small: vec![1, 2],
             large: vec![1, 2, 3, 4]
         }),
-        from_str("BytesStruct( small:[1, 2], large:\"AQIDBA==\" )"),
+        from_str("BytesStruct( small:[1, 2], large:b\"\\x01\\x02\\x03\\x04\" )"),
     );
 }
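
The old and new test inputs describe the same four bytes. To make that visible without pulling in the `base64` crate, here is a minimal standard-alphabet encoder; `base64_encode` is a hypothetical helper written for this note and is not how ron or the `base64` crate implement it.

```rust
/// Standard base64 alphabet (RFC 4648), 64 symbols.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical minimal encoder with `=` padding, for illustration only.
fn base64_encode(bytes: &[u8]) -> String {
    let mut out = String::new();
    for chunk in bytes.chunks(3) {
        // Pack up to three bytes into the top 24 bits of a u32.
        let mut n = 0u32;
        for (i, &b) in chunk.iter().enumerate() {
            n |= (b as u32) << (16 - 8 * i);
        }
        // A chunk of k bytes yields k + 1 significant characters,
        // the remainder of the 4-character group is padding.
        for i in 0..4 {
            if i <= chunk.len() {
                let idx = (n >> (18 - 6 * i)) & 0x3f;
                out.push(ALPHABET[idx as usize] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}
```

Encoding `[1, 2, 3, 4]` with this helper gives the legacy form `AQIDBA==`, whereas the Rusty form in the updated test spells the same bytes out as `b"\x01\x02\x03\x04"`.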

2 changes: 1 addition & 1 deletion src/de/value.rs
@@ -167,7 +167,7 @@ impl<'de> Visitor<'de> for ValueVisitor {
     where
         E: Error,
     {
-        self.visit_string(String::from_utf8(v).map_err(|e| Error::custom(format!("{}", e)))?)
+        Ok(Value::Bytes(v))
     }

     fn visit_none<E>(self) -> Result<Self::Value, E>
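
The practical effect of this one-line change is that byte buffers no longer have to be valid UTF-8 to end up in a value. A miniature model, with a hypothetical `MiniValue` enum and two free functions standing in for the old and new visitor bodies (none of which exist in ron), shows the difference:

```rust
/// Hypothetical miniature of a value type with separate string and bytes
/// variants; ron's real `Value` has many more variants.
#[derive(Debug, PartialEq)]
enum MiniValue {
    String(String),
    Bytes(Vec<u8>),
}

/// Old behaviour (sketch): a byte buffer must be valid UTF-8 to be accepted.
fn visit_byte_buf_as_string(v: Vec<u8>) -> Result<MiniValue, String> {
    String::from_utf8(v)
        .map(MiniValue::String)
        .map_err(|e| e.to_string())
}

/// New behaviour (sketch): any byte buffer is kept as-is.
fn visit_byte_buf_as_bytes(v: Vec<u8>) -> Result<MiniValue, String> {
    Ok(MiniValue::Bytes(v))
}
```

A buffer such as `[0xff, 0xfe]` is rejected by the first function and preserved by the second, which is the case the fuzzer-driven commits in this PR appear to target.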