Add System.Text fuzzing for encoders #103968

Merged Jun 27, 2024 (6 commits)

steveharter
Member

This fuzzes all of the standard encoders: Latin1 (ISO-8859-1), ASCIIEncoding, UnicodeEncoding, UTF32Encoding, UTF7Encoding, and UTF8Encoding.

Notes:

  • Substitutions are fuzzed. The substitution approach varies by encoder: unknown input is either replaced with well-known characters or, in the case of UTF7Encoding, unknown and non-ASCII characters are encoded as Base64.
  • Non-substitution is fuzzed for the cases that support it. In this mode, invalid input throws DecoderFallbackException.
  • Decoder.Convert() is fuzzed for the cases that support it. This verifies decoding a large input in a streaming-like manner.
  • .NET Framework was also fuzzed. See the comments in the .cs file.
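
To make the three modes in the notes above concrete, here is a minimal sketch using UTF8Encoding only. It is not the code from this PR: the class name `EncodingFuzzSketch` and its method names are hypothetical, chosen for illustration, and the real fuzz target covers more encoders and checks.

```csharp
using System;
using System.Text;

// Hypothetical helper, not the PR's fuzz target. Illustrates substitution,
// non-substitution (throwing), and chunked Decoder.Convert() for UTF-8 only.
public static class EncodingFuzzSketch
{
    // Substitution mode: invalid bytes are replaced (U+FFFD for UTF-8) instead of throwing.
    public static string DecodeWithSubstitution(byte[] input)
    {
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: false);
        return utf8.GetString(input);
    }

    // Non-substitution mode: invalid bytes raise DecoderFallbackException,
    // which is reported here as a false return value.
    public static bool TryDecodeStrict(byte[] input, out string result)
    {
        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            result = utf8.GetString(input);
            return true;
        }
        catch (DecoderFallbackException)
        {
            result = string.Empty;
            return false;
        }
    }

    // Streaming mode: feed the input to a stateful Decoder in fixed-size chunks.
    public static string DecodeInChunks(byte[] input, int chunkSize)
    {
        if (chunkSize < 1) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var utf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: false);
        Decoder decoder = utf8.GetDecoder();
        // Worst-case output size for one chunk, so each Convert call can consume the whole chunk.
        char[] chars = new char[utf8.GetMaxCharCount(chunkSize)];
        var sb = new StringBuilder();

        int offset = 0;
        while (offset < input.Length)
        {
            int count = Math.Min(chunkSize, input.Length - offset);
            bool flush = offset + count == input.Length; // flush decoder state on the last chunk
            decoder.Convert(input, offset, count, chars, 0, chars.Length, flush,
                out int bytesUsed, out int charsUsed, out _);
            sb.Append(chars, 0, charsUsed);
            offset += bytesUsed;
        }
        return sb.ToString();
    }
}
```

One property a harness along these lines could check is that, for valid input, `DecodeInChunks` returns the same string as a one-shot `GetString` regardless of chunk size, since the decoder carries partial multi-byte sequences across calls.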

The fuzz tests were run locally for ~24 hours without any failures. This included .NET Framework. FWIW, here are the last few lines of output from the tests:

#7813388        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 200/4052 MS: 4 ChangeBinInt-ChangeASCIIInt-InsertByte-EraseBytes-
#7819019        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 69/4052 MS: 1 EraseBytes-
#7827657        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 509/4052 MS: 3 InsertByte-InsertByte-EraseBytes-
#7831994        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 622/4052 MS: 2 InsertByte-EraseBytes-
#7832339        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 94/4052 MS: 5 ShuffleBytes-ChangeByte-InsertByte-ShuffleBytes-EraseBytes-
#7833160        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 16/4052 MS: 1 EraseBytes-
#7836155        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 217/4052 MS: 5 CopyPart-ChangeASCIIInt-EraseBytes-CopyPart-EraseBytes-
#7841859        REDUCE cov: 5 ft: 11386 corp: 1927/334Kb lim: 4096 exec/s: 97 rss: 34Mb L: 974/4052 MS: 4 EraseBytes-CrossOver-ShuffleBytes-ChangeBinInt-

Contributor

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

// However, this test class, while running under .NET Core, was used to forward the fuzzing
// input to a .NET Framework console app. That app had the same test semantics as the tests
// here, although it used slightly different supporting APIs since not all supporting library
// and language features are present in .NET Framework.
Member

How does this work? I am not seeing any forwarding to a console app here. Is it done somewhere else when you do #define FORWARD_TO_NETFRAMEWORK?

Member Author

The code to do this was removed from this PR for a few reasons, but exists in the commit history. See 94010d8#diff-1249360cc8e4b2057cb1111a705788d87911917d770743eae1779847159f9790R66-R67

Member

@tarekgh tarekgh left a comment

LGTM.

Wondering whether we are fuzzing only this limited set of encodings, or are we going to extend that to other SBCP/DBCP encodings?

@steveharter
Member Author

Wondering whether we are fuzzing only this limited set of encodings, or are we going to extend that to other SBCP/DBCP encodings?

The original ask for fuzzing was just to cover the basic encodings. We could of course expand on this pending feedback.

@steveharter steveharter merged commit 1bce1f7 into dotnet:main Jun 27, 2024
74 of 83 checks passed
@steveharter steveharter deleted the SystemTextFuzzing branch June 27, 2024 14:00
@github-actions github-actions bot locked and limited conversation to collaborators Jul 28, 2024