Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Xxhash128 from sha256 #70723

Merged
merged 11 commits into from
Nov 16, 2023
Merged

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Nov 7, 2023

Fixes AB#1880936

xxhash128 is smaller, and much faster (around 40x so) while still retaining the collision resistance we need for non-cryptographic content hashing. The API is also much easier to use, resulting in a ton of complex netfx vs netcore code being deleted.

@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners November 7, 2023 22:42
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 7, 2023
#pragma warning disable CS9191 // The 'ref' modifier for an argument corresponding to 'in' parameter is equivalent to 'in'. Consider using 'in' instead.
Contract.ThrowIfFalse(MemoryMarshal.TryWrite(span, ref Unsafe.AsRef(in this)));
#pragma warning restore CS9191
}

public static unsafe HashData FromPointer(HashData* hash)
=> new(hash->Data1, hash->Data2, hash->Data3);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used.

@@ -21,135 +19,41 @@ namespace Microsoft.CodeAnalysis
internal partial record class Checksum
{
// https://github.com/dotnet/runtime/blob/f2db6d6093c54e5eeb9db2d8dcbe15b2db92ad8c/src/libraries/System.Security.Cryptography.Algorithms/src/System/Security/Cryptography/SHA256.cs#L18-L19
private const int SHA256HashSizeBytes = 256 / 8;
private const int XXHash128SizeBytes = 128 / 8;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why we need this constant and the HashSize constant. But was keeping this roughly the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you follow up with cleanup-refactoring?
Other than having this duplicated I'm not sure about the value of having the Create methods in a separate file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. i'm also intending to measure why we need Checksum and HashData when it seems like we could just have the latter. The former is technically good if we expect a few actual distinct checksums, and tons of pointers to them. The latter would be good if we expect less pointers and most of the values to just be needed in the place where we're pointing at them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have been a 32-bit process the last time this was evaluated, but the concern last time we looked was the number of pointers could exceed the number of instances.


// Note: number of elements here should be computed based on what we need from our various collection-with-children objects.
private static readonly ObjectPool<byte[]>[] s_checksumByteArrayPool =
Enumerable.Range(0, 11).Select(i => new ObjectPool<byte[]>(() => new byte[HashSize * i])).ToArray();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no longer needed. the xxhash128 has all the functionality we need.

using var pooledHash = s_incrementalHashPool.GetPooledObject();
Span<byte> buffer = stackalloc byte[SharedPools.ByteBufferSize];
pooledHash.Object.Append(stream);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xxhash128 already has support for appending a stream to it. woohoo.

@@ -19,7 +19,7 @@ public void ValidateChecksumFromSpanSameAsChecksumFromBytes1()
var checksumA = Checksum.Create(checksum1, checksum2);

// Running this test on multiple target frameworks with the same expectation ensures the results match
Assert.Equal(Checksum.FromBase64String("N30m5jwVeMZzWpy9cbQbtSYHoXU="), checksumA);
Assert.Equal(Checksum.FromBase64String("RRbspG+E4ziBC5hOWyrfCQ=="), checksumA);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected change. we are legitimately using a different hash.

the tests do check though that we get the same result on netfx vs netcore (that way we can have roslyn communicating across process while still agreeing on the results).

@sharwell
Copy link
Member

sharwell commented Nov 8, 2023

@DustinCampbell We should probably switch Razor at the same time if we are making this change

@@ -119,29 +120,24 @@ public override int GetHashCode()
[DataContract, StructLayout(LayoutKind.Explicit, Size = HashSize)]
public readonly record struct HashData(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashData

Can we use InlineArray feature instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. this has to run on netfx and core.

@@ -245,6 +245,7 @@
<SystemDrawingCommonVersion>7.0.0</SystemDrawingCommonVersion>
<SystemIOFileSystemVersion>4.3.0</SystemIOFileSystemVersion>
<SystemIOFileSystemPrimitivesVersion>4.3.0</SystemIOFileSystemPrimitivesVersion>
<SystemIOHashingVersion>8.0.0-rc.2.23479.6</SystemIOHashingVersion>
Copy link
Contributor

@AArnott AArnott Nov 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably won't want to ship a prerelease of this dependency with VS. Can you use the latest stable version instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we cannot. the latest stable does not include xxhash128 (the actual hashing function we're trying to use).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. In that case, we'll just ask that you help maintain the version on the VS repo side to switch to the stable 8.0 version when it's available.

@CyrusNajmabadi
Copy link
Member Author

Going to hold off on this till next tuesday when 8.0 releases.

eng/Versions.props Outdated Show resolved Hide resolved
@CyrusNajmabadi
Copy link
Member Author

CyrusNajmabadi commented Nov 14, 2023

@ToddGrun @jasonmalinowski ptal. Source-Build update is here: dotnet/source-build-reference-packages#822

@CyrusNajmabadi
Copy link
Member Author

Note: this will need changes to VS build to pull in System.IO.Hashing. @jasonmalinowski are you a good person to coordinate that with?

@CyrusNajmabadi
Copy link
Member Author

@jaredpar @jasonmalinowski the source-build change went in with: dotnet/source-build-reference-packages#822

Do you know why this is still having issues building on our side? Thanks! :)

@lbussell
Copy link
Contributor

You may need to pick up the latest version of SBRP - similar to #69901.

@MichaelSimons
Copy link
Member

You may need to pick up the latest version of SBRP - similar to #69901.

Ideally long term a darc subscription would be setup to automatically flow new versions in.

@CyrusNajmabadi CyrusNajmabadi merged commit ff7bdfa into dotnet:main Nov 16, 2023
20 of 27 checks passed
@ghost ghost added this to the Next milestone Nov 16, 2023
@CyrusNajmabadi CyrusNajmabadi deleted the xxhash128 branch November 16, 2023 19:33
@RikkiGibson RikkiGibson modified the milestones: Next, 17.9 P2 Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants