Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize AWSSDKUtils.ToHex() for speed and memory #3293

Merged
merged 7 commits into from
Jun 5, 2024

Conversation

stevenaw
Copy link
Contributor

@stevenaw stevenaw commented Apr 22, 2024

Description

Optimize uppercase and lowercase versions of ToHex() across all runtimes. Use the built-in option for uppercase on .NET8. Improved speed + memory, particularly for uppercase on .NET8. There's still more room for improvement for the other scenarios but I decided to keep it simple and follow the existing bit twiddling patterns I saw in UrlEncode in the file

Motivation and Context

Testing

Added unit tests for upper + lower case prior to making change. Ran them at the end to validate the new implementation passed. Ran benchmarks against netcoreapp3.1 and net8.0

Screenshots (if appropriate)

image
image

Benchmark Code
    [MemoryDiagnoser]
  public class ToHexString
  {
      // 11 = "Hello World" :)
      [Params(11, 128)]
      public int N { get; set; }

      // 11 = "Hello World" :)
      [Params(true, false)]
      public bool Lowercase { get; set; }

      public byte[] Payload { get; set; }

      [GlobalSetup]
      public void Setup()
      {
          Payload = Encoding.UTF8.GetBytes(new string('A', N));
      }

      [Benchmark(Baseline = true)]
      public string Original()
      {
          var data = Payload;
          var lowercase = Lowercase;

          StringBuilder sb = new StringBuilder();

          for (int i = 0; i < data.Length; i++)
          {
              sb.Append(data[i].ToString(lowercase ? "x2" : "X2", CultureInfo.InvariantCulture));
          }

          return sb.ToString();
      }

      [Benchmark]
      public string Optimized()
      {
          var data = Payload;
          var lowercase = Lowercase;

#if NET8_0_OR_GREATER
          if (!lowercase)
          {
              return Convert.ToHexString(data);
          }
#endif

          char[] chars = ArrayPool<char>.Shared.Rent(data.Length * 2);

          try
          {
              Func<int, char> converter = lowercase ? (Func<int, char>)ToLowerHex : (Func<int, char>)ToUpperHex;

              for (int i = 0; i < data.Length; i++)
              {
                  // Break apart the byte into two four-bit components and
                  // then convert each into their hexadecimal equivalent.
                  byte b = data[i];
                  int hiNibble = b >> 4;
                  int loNibble = b & 0xF;

                  chars[i * 2] = converter(hiNibble);
                  chars[i * 2 + 1] = converter(loNibble);
              }

              return new string(chars, 0, data.Length * 2);
          }
          finally
          {
              ArrayPool<char>.Shared.Return(chars);
          }
      }
      private static char ToUpperHex(int value)
      {
          // Maps 0-9 to the Unicode range of '0' - '9' (0x30 - 0x39).
          if (value <= 9)
          {
              return (char)(value + '0');
          }
          // Maps 10-15 to the Unicode range of 'A' - 'F' (0x41 - 0x46).
          return (char)(value - 10 + 'A');
      }

      private static char ToLowerHex(int value)
      {
          // Maps 0-9 to the Unicode range of '0' - '9' (0x30 - 0x39).
          if (value <= 9)
          {
              return (char)(value + '0');
          }
          // Maps 10-15 to the Unicode range of 'A' - 'F' (0x41 - 0x46).
          return (char)(value - 10 + 'a');
      }
  }

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project
  • My change requires a change to the documentation
  • I have updated the documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed

License

  • I confirm that this pull request can be released under the Apache 2 license

@dscpinheiro dscpinheiro added the v4 label Apr 22, 2024
@ashovlin ashovlin requested a review from normj April 23, 2024 17:33
@danielmarbach
Copy link
Contributor

danielmarbach commented Apr 28, 2024

FYI I also added a similar unit test in this draft PR that optimizes UrlEncode that you mention here #3307

There I also removed Netstandard from the unit test since that is not a valid target for unit tests.

Copy link
Contributor

@danielmarbach danielmarbach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just stumbled into this because I was tweaking UrlEncode. Great stuff 😍

@@ -39,6 +39,7 @@
#if NETSTANDARD
using System.Net.Http;
using System.Runtime.InteropServices;
using System.Buffers;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably move outside the #if

{
Func<int, char> converter = lowercase ? (Func<int, char>)ToLowerHex : (Func<int, char>)ToUpperHex;

for (int i = 0; i < data.Length; i++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity have you tried reversing the loop to elide bound checks?

Also could this be a candidate for string.Create since that would take care of the pooling already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestions @danielmarbach ! Both were very helpful.

#if NETCOREAPP3_1_OR_GREATER
Func<int, char> converter = lowercase ? (Func<int, char>)ToLowerHex : (Func<int, char>)ToUpperHex;

return string.Create(data.Length * 2, (data, converter), (chars, state) =>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: I had to keep this lambda non-static in order to be compatible with <LangVersion>8</LangVersion> in the csproj.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be changed in my opinion. A reasonable version should at least be 9 even when targeting Netstandard 2.0. That is a good language subset without exposing yourself to weird edge cases. That's how also the Azure .NET SDK treated it until they changed it to an even later version as far as I recall.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #3316

byte[] data = state.data;
Func<int, char> converter = state.converter;

for (int i = data.Length - 1; i >= 0; i--)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: I found two copies of the loops here were required to get the best performance as they each operate on different types. Trying to funnel both paths through a Span<byte> to consolidate the code ended up ~10% slower when consumed by .NET Framework.

I'm not sure if any reviewers knew of tricks to get around this without hitting "slow span" paths there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My experience is that even if you end up with slow downs in the microbenchmarks without a reasonable range due to the massive allocation reductions the overall beneficial effects of getting rid of the allocations especially in a real system using some form of concurrency will outweigh the small latency hits. It is not worth the complexity of maintaining different paths for accommodate for the slow span path. This also takes into account that more modern .NET versions are becoming more and more mainstream and people that are on .NET Framework know what they are not getting.

Copy link
Contributor Author

@stevenaw stevenaw May 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the prompt feedback and advice here @danielmarbach . That makes perfect sense to me.
I've pushed another changeset. My laptop is on battery at the moment so benchmarking will be less reliable, but I can try to run and share them later tonight.

EDIT: It seems the csproj changes mean I now have conflicts. Looks like a rebase is ahead.

@stevenaw
Copy link
Contributor Author

stevenaw commented May 7, 2024

I've pushed a new commit to incorporate the first review feedback. I've rerun the benchmarks, now slightly modified to focus on 128 length. Eliding the bounds check seems to have saved around 20ns.
image

Copy link
Member

@normj normj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delayed response to the PR but impressive performance numbers! I had just a couple comments and discussing with others on the team about the LangVersion change.

namespace UnitTests.NetStandard.Core
{
[Trait("Category", "Core")]
public class AWSSDKUtilsTests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the #else block in the ToHex method is really just for .NET Framework can you copy the tests to the .NET Framework unit tests in the sdk\test\UnitTests\Custom folder. Hopefully as part of V4 we can do some test restructure cleanup so we don't have to copy the test between the .NET Framework and .NET Standard+.

hex = hex.Replace("-", string.Empty);
return hex;
}
public static string BytesToHexString(byte[] value) => ToHex(value, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why we had both ToHex and BytesToHexString. Going with old codebase. The BytesToHexString is only called in AmazonS3ResponseHandler.cs. Since this is v4 I think we can just get rid of this method and update AmazonS3ReponseHandler.

@stevenaw
Copy link
Contributor Author

Thanks for the initial feedback @normj . I've found an existing "sdk/test/UnitTests/Custom/Util/AWSSDKUtilsTests.cs" file to add the netfx-side tests to rather than creating a new file but please let me know if you'd instead like it outside of that "Util" folder instead. I look forward to hearing your decision about the <LangVersion>.

As an aside:
In case anyone else were to hit this issue and search for it, I had some issues building AWSSDK.Extensions.CrtIntegration and some downstream dependants due to a path length issue on Windows. This was causing the file AWSSDK.Extensions.CrtIntegration.Netstandard.GeneratedMSBuildEditorConfig.editorconfig to not be found. This was resolved by enabling long paths in Windows: https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry#enable-long-paths-in-windows-10-version-1607-and-later

@normj
Copy link
Member

normj commented May 28, 2024

@stevenaw That is fine putting the tests in AWSSDKUtilsTests. Lets go ahead with the change with LangVersion to 9.0.

@stevenaw
Copy link
Contributor Author

Thanks for the confirmation @normj and great to hear about LangVersion too. I think all the requested changes are now done on my end unless you wanted to have all the project files' LangVersion updated in this PR. Presently it's just AWSSDK.Core.NetStandard.csproj and its unit tests which target the newer version.

@normj
Copy link
Member

normj commented May 30, 2024

@stevenaw Lets not expand your PR with generator changes for LangVersion. The change overall looks good to me. As soon as I can I'll do a more thorough review.

@normj normj requested a review from boblodgett June 5, 2024 00:57
@normj
Copy link
Member

normj commented Jun 5, 2024

PR looks good. Getting another team member to do another pass before merging.

@normj normj merged commit 59f087d into aws:v4-development Jun 5, 2024
@normj
Copy link
Member

normj commented Jun 5, 2024

Thanks for the perf PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants