Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing strings in place. #79

Closed
wants to merge 0 commits into from

Conversation

sammhicks
Copy link
Contributor

De-escaping JSON strings will always produce a shorter (or equal length) string, so it's safe to de-escape strings in place, thus allowing types to borrow plaintext (not escaped) strings from the buffer after deserialization.

This is a semver breaking change as it requires the JSON input to be passed in mutably to allow for the de-escaping in place.
This is safe because once the serialization has passed the string, it never reads that part of the buffer again.

@sammhicks sammhicks force-pushed the master branch 2 times, most recently from 9a06f6b to 1cafd85 Compare January 1, 2024 15:49
@sammhicks
Copy link
Contributor Author

Not to nag, but I've now fixed the tests, they run successfully on my fork, and thus it should now be ready to test and merge.
Sorry for the repeated sync, and absolutely no rush :)

Copy link
Member

@ryan-summers ryan-summers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this notification got buried in my Github notifications so I didn't see it until now.

In general, I'm not currently inclined to support merging the changes in this PR as implemented for the following reasons:

  1. Changing the API can have a very large ripple for downstream users, where this dependency may be injected in the middle of their dependency stacks. I recognize this would be a semver breaking change so it wouldn't break peoples code, but I suspect adoption would be low of the newer version because users may not have control of the mutability of their input data. (I know I have a number of dependencies like this personally)
  2. I have concerns about maintability and reliability of allowing the deserialization input to be mutated. We open ourselves up to a whole class of defects that werent previously possible by mutating the data that we're processing. For example, we could introduce bugs where the first serialization works, but the second doesn't. Or, we could cause the first deserialization result to differ from a second pass.

I don't believe that the benefit of supporting string escapes outweighs the downsides above.


That being said, I'd be more than happy to review a change where the input was copied into some buffer and then deserialized (i.e. the input is not mutable, but we do induce a copy of the data), but the user should have to specifically opt-in to this mechanism.

@sammhicks
Copy link
Contributor Author

sammhicks commented Apr 13, 2024

How about the following design?:

  • The deserializer takes a shared &str, as per the design before this pull request.
  • When deserializing JSON strings, the deserializer scans for escape sequences
    • If there are no escape sequences, it calls visitor.visit_borrowed_str(v), which will allow zero-copy deserialization
    • If there are escape sequences, it decodes the escape sequence using a provided buffer, and calls visitor.visit_str(v)
  • serde_json_core has a EscapedString newtype struct which contains an escaped &str, with utility methods to iterator over it, where the iterator returns either a &str of characters with no escape sequences, or an unescaped char
    • EscapedString is the only structure that is allowed to borrow escaped string data, and uses a special constant to signal to the deserializer that it's special.

I believe that this design will also solve #74

@ryan-summers
Copy link
Member

ryan-summers commented Apr 13, 2024

That definitely sounds like a nice approach to me! It may be useful to try and look at some cases where string escaping is useful to see if this design is onerous for the end user. Do you have a sample use case in mind? That may be helpful in figuring out if this is a good approach.

@sammhicks
Copy link
Contributor Author

I've written a bare-metal HTTP server framework, and would like it to be able to deserialize JSON encoded POST bodies into arbitary data structures, and on soundness grounds and to have separation of concerns, would like all decoding and unescaping to happen before the data is passed to the request handler.

@ryan-summers
Copy link
Member

Is this open-sourced and/or something I could look at to get an idea of how you're interested in using it? What I'm really trying to figure out is what this API would look like in a real-world example with actual code

@sammhicks
Copy link
Contributor Author

In case the message got lost in the post (hurrah for email), I've closed this pull request and opened a new one at #83 with my new design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants