Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a custom Serializer/Deserializer to fix Nan and Infinity float #888

Closed
wants to merge 2 commits into from

Conversation

JeanArhancet
Copy link
Contributor

@JeanArhancet JeanArhancet commented Aug 16, 2023

Change Summary

Add a custom Serializer to fix Nan and Infinity float

Related issue number

closes #872

Checklist

  • Unit tests for the changes exist
  • Documentation reflects the changes where applicable
  • Pydantic tests pass with this pydantic-core (except for expected changes)
  • My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

@JeanArhancet JeanArhancet force-pushed the fix/nan-inf-float branch 3 times, most recently from e306f56 to 47e9a61 Compare August 16, 2023 08:14
@codecov
Copy link

codecov bot commented Aug 16, 2023

Codecov Report

Merging #888 (6b0063c) into main (2e59f25) will decrease coverage by 9.73%.
Report is 28 commits behind head on main.
The diff coverage is 29.85%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #888      +/-   ##
==========================================
- Coverage   93.79%   84.06%   -9.73%     
==========================================
  Files         105      110       +5     
  Lines       15252    18052    +2800     
  Branches       25       25              
==========================================
+ Hits        14305    15176     +871     
- Misses        941     2870    +1929     
  Partials        6        6              
Files Changed Coverage Δ
src/input/parse_json.rs 97.54% <ø> (+1.20%) ⬆️
src/lib.rs 100.00% <ø> (ø)
src/serializers/type_serializers/simple.rs 96.15% <ø> (ø)
src/input/shared.rs 84.76% <7.14%> (-11.98%) ⬇️
src/serde/ser.rs 24.85% <24.85%> (ø)
src/serde/de.rs 26.53% <26.53%> (ø)
src/serde/error.rs 28.22% <28.22%> (ø)
src/serde/read.rs 40.51% <40.51%> (ø)
src/serializers/type_serializers/float.rs 98.14% <98.14%> (ø)
src/errors/validation_exception.rs 93.83% <100.00%> (ø)
... and 4 more

... and 7 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e59f25...6b0063c. Read the comment docs.

@codspeed-hq
Copy link

codspeed-hq bot commented Aug 16, 2023

CodSpeed Performance Report

Merging #888 will degrade performances by 13.48%

Comparing JeanArhancet:fix/nan-inf-float (6b0063c) with main (a4015ab)

Summary

❌ 1 regressions
✅ 137 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main JeanArhancet:fix/nan-inf-float Change
test_validate_literal[python-few_str_enum] 27.7 µs 32 µs -13.48%

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This is looking pretty good to me, just a thought on high-level design at this point.

Also, I wonder whether we want to merge this before we can support deserializing Infinity and NaN? It would break round-trips.

Comment on lines +84 to +86
if (v.is_nan() || v.is_infinite()) && !self.allow_inf_nan {
return serializer.serialize_none();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going back and forth whether allow_inf_nan is the right config option here. I wonder if allow_inf_nan == false should actually lead to an error when serializing, like Python's json.dump will error with allow_nan=False.

Maybe instead we need float_inf_nan_mode similar to timedelta_mode and bytes_mode, with two string options? ("constants" or "null"?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the difference? "constants" triggers an error?

@JeanArhancet
Copy link
Contributor Author

Maybe I can create a new folder in the src directory (serde for the name?) to group ser.rs and de.rs and errors. WDYT?

@davidhewitt
Copy link
Contributor

Can you give me a sense of which code you're wanting to move in there?

@JeanArhancet
Copy link
Contributor Author

JeanArhancet commented Aug 17, 2023

Can you give me a sense of which code you're wanting to move in there?

I was thinking of creating a module in src called serde to group together the PythonSerializer and PythonDeserializer:

  • serde/
    - ser.rs
    - de.rs
    - read.rs
    - mod.rs
    - read.rs

Adapted from the ser.rs, de.rs and read.rs of https://github.com/serde-rs/json/tree/master/src WDYT? @davidhewitt

@davidhewitt
Copy link
Contributor

Seems reasonable to me, want to go ahead and add that as a commit on top so that we can always back out if it doesn't sit well?

@JeanArhancet
Copy link
Contributor Author

JeanArhancet commented Aug 18, 2023

Seems reasonable to me, want to go ahead and add that as a commit on top so that we can always back out if it doesn't sit well?

Done in this commit 6b0063c, that's a lot of code...
I've added support for NaN, -Infinity and Infinity in the Deserialiser

@JeanArhancet JeanArhancet changed the title Add a custom Serializer to fix Nan and Infinity float Add a custom Serializer/Deserializer to fix Nan and Infinity float Aug 18, 2023
@davidhewitt
Copy link
Contributor

Crikey, that's a lot of code. If I understand correctly, it's a copy-paste from serde_json and then modified to support the additional functionality we want?

I think probably this is necessary, though I wonder if we can simplify it at all. I'll try and read through it 🤔

@JeanArhancet
Copy link
Contributor Author

JeanArhancet commented Aug 28, 2023

Crikey, that's a lot of code. If I understand correctly, it's a copy-paste from serde_json and then modified to support the additional functionality we want?

I think probably this is necessary, though I wonder if we can simplify it at all. I'll try and read through it 🤔

I looked at this repository(https://github.com/getsentry/rust-json-forensics/) that adds a transformation to serde_json, which could be an interesting approach. WDYT @davidhewitt ?

@JeanArhancet
Copy link
Contributor Author

@davidhewitt Sorry to ping you, but what should I do about this MR?
I've seen the MR on Jitter, can I go back to just the Serializer implementation? WDYT?

@davidhewitt
Copy link
Contributor

Yes sorry for the long time of inactivity here from me. A first implementation of jiter seems likely to make it in within a few weeks. I think that going back to the Serializer implementation only makes sense, it was also a lot less code.

I'm going back and forth whether allow_inf_nan is the right config option here. I wonder if allow_inf_nan == false should actually lead to an error when serializing, like Python's json.dump will error with allow_nan=False.

Maybe instead we need float_inf_nan_mode similar to timedelta_mode and bytes_mode, with two string options? ("constants" or "null"?)

What would be the difference? "constants" triggers an error?

I think we want the following options:

  • allow_inf_nan: bool
    • True - serializer will accept these values. What it outputs depends on ser_json_inf_nan
    • False - serializer will error on encountering inf or NaN.
  • ser_json_inf_nan: Literal['constants', 'null']
    • 'constants' - emits Infinity, -Infinity and NaN
    • 'null' - emits null

I suppose the default of ser_json_inf_nan will need to be null for compatibility with Pydantic v2 but we should consider changing it to constants in v3 as I think that's what most users will expect.

@davidhewitt davidhewitt mentioned this pull request Nov 6, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fix the Float Serializer with Infinite and NaN Number Values
2 participants