Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ConcurrentDictionary performance, in particular for strings #81557

Merged
merged 6 commits into from
Feb 9, 2023

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Feb 2, 2023

  • By default, string hash codes are randomized. This is an important defense-in-depth security measure, but it also adds some overhead when strings are used as keys in dictionaries. Dictionary<> addresses that overhead by starting out with using non-randomized comparers, and then upgrades to randomized comparers only once enough collisions have been detected. This PR updates ConcurrentDictionary<> with similar tricks. The comparer is moved from being stored on the dictionary itself to instead be stored on the Tables object that's atomically swapped when the table grows; that way, the comparer always remains in sync with the hashcodes stored in the nodes in that table. When we enumerate a bucket looking for an existing key as part of an add, we count how many items we traverse, and if that resulting number is larger than the allowed threshold and we're currently using a non-randomized comparer, we force a rehash; that rehash will replace the non-randomized comparer with the equivalent randomized one.
  • The ConcurrentDictionary<> ctor is improved to presize based on the size of a collection being passed in; otherwise, it might resize multiple times as it's populating the dictionary. The sizing logic is also changed to use the same prime bucket selection size as does Dictionary<>.
  • The method we were using to compute the bucket for a key wasn't being inlined for reference type keys due to the generic context; that method has been moved to the outer type as a static to avoid the non-inlined call and extra generic dictionary lookup.
  • For all key types, we were also paying for a non-inlined ldelema helper call when reading the head node of a bucket; that's been addressed via a struct wrapper with a volatile node field, rather than using Volatile.Read to access the array element.
  • We were inconsistent in whether checked math was used in computing the size of the table. In some situations it would be possible to overflow without it being detected, or for it to be detected and manifest in various ways. This simplifies to just always use checked for computing the counts.
  • Remove unnecessary try/finally blocks that are leftover from CERs and thread abort protection.
  • Deduped some code with calls to helper functions.
Type Method Toolchain CustomComparer IgnoreCase Mean Error StdDev Ratio
GuidKeys TryGetValue \main ? ? 778.9 ns 13.28 ns 12.42 ns 1.00
GuidKeys TryGetValue \pr ? ? 565.5 ns 8.36 ns 7.82 ns 0.73
ObjectKeys TryGetValue \main False ? 1,139.3 ns 19.87 ns 17.61 ns 1.00
ObjectKeys TryGetValue \pr False ? 1,103.1 ns 11.62 ns 10.87 ns 0.97
ObjectKeys TryGetValue \main True ? 1,122.4 ns 9.74 ns 9.11 ns 1.00
ObjectKeys TryGetValue \pr True ? 1,050.3 ns 11.08 ns 10.36 ns 0.94
StringKeys TryGetValue \main ? True 2,331.2 ns 25.81 ns 22.88 ns 1.00
StringKeys TryGetValue \pr ? True 1,359.4 ns 24.38 ns 22.81 ns 0.58
StringKeys TryGetValue \main ? False 2,616.4 ns 41.60 ns 36.88 ns 1.00
StringKeys TryGetValue \pr ? False 1,826.4 ns 18.59 ns 15.52 ns 0.70
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

[MemoryDiagnoser(false)]
public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}

[MemoryDiagnoser(false)]
public class StringKeys
{
    private KeyValuePair<string, string>[] _pairs;
    private ConcurrentDictionary<string, string> _cd;

    [Params(false, true)]
    public bool IgnoreCase { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _pairs =
            // from https://github.com/dotnet/runtime/blob/a30de6d40f69ef612b514344a5ec83fffd10b957/src/libraries/System.Formats.Asn1/src/System/Formats/Asn1/WellKnownOids.cs#L317-L419
            new[]
            {
                "1.2.840.10040.4.1", "1.2.840.10040.4.3", "1.2.840.10045.2.1", "1.2.840.10045.1.1", "1.2.840.10045.1.2", "1.2.840.10045.3.1.7", "1.2.840.10045.4.1", "1.2.840.10045.4.3.2", "1.2.840.10045.4.3.3", "1.2.840.10045.4.3.4",
                "1.2.840.113549.1.1.1", "1.2.840.113549.1.1.5", "1.2.840.113549.1.1.7", "1.2.840.113549.1.1.8", "1.2.840.113549.1.1.9", "1.2.840.113549.1.1.10", "1.2.840.113549.1.1.11", "1.2.840.113549.1.1.12", "1.2.840.113549.1.1.13",
                "1.2.840.113549.1.5.3", "1.2.840.113549.1.5.10", "1.2.840.113549.1.5.11", "1.2.840.113549.1.5.12", "1.2.840.113549.1.5.13", "1.2.840.113549.1.7.1", "1.2.840.113549.1.7.2", "1.2.840.113549.1.7.3", "1.2.840.113549.1.7.6",
                "1.2.840.113549.1.9.1", "1.2.840.113549.1.9.3", "1.2.840.113549.1.9.4", "1.2.840.113549.1.9.5", "1.2.840.113549.1.9.6", "1.2.840.113549.1.9.7", "1.2.840.113549.1.9.14", "1.2.840.113549.1.9.15", "1.2.840.113549.1.9.16.1.4",
                "1.2.840.113549.1.9.16.2.12", "1.2.840.113549.1.9.16.2.14", "1.2.840.113549.1.9.16.2.47", "1.2.840.113549.1.9.20", "1.2.840.113549.1.9.21", "1.2.840.113549.1.9.22.1", "1.2.840.113549.1.12.1.3", "1.2.840.113549.1.12.1.5",
                "1.2.840.113549.1.12.1.6", "1.2.840.113549.1.12.10.1.1", "1.2.840.113549.1.12.10.1.2", "1.2.840.113549.1.12.10.1.3", "1.2.840.113549.1.12.10.1.5", "1.2.840.113549.1.12.10.1.6", "1.2.840.113549.2.5", "1.2.840.113549.2.7",
                "1.2.840.113549.2.9", "1.2.840.113549.2.10", "1.2.840.113549.2.11", "1.2.840.113549.3.2", "1.2.840.113549.3.7", "1.3.6.1.4.1.311.17.1", "1.3.6.1.4.1.311.17.3.20", "1.3.6.1.4.1.311.20.2.3", "1.3.6.1.4.1.311.88.2.1",
                "1.3.6.1.4.1.311.88.2.2", "1.3.6.1.5.5.7.3.1", "1.3.6.1.5.5.7.3.2", "1.3.6.1.5.5.7.3.3", "1.3.6.1.5.5.7.3.4", "1.3.6.1.5.5.7.3.8", "1.3.6.1.5.5.7.3.9", "1.3.6.1.5.5.7.6.2", "1.3.6.1.5.5.7.48.1", "1.3.6.1.5.5.7.48.1.2",
                "1.3.6.1.5.5.7.48.2", "1.3.14.3.2.26", "1.3.14.3.2.7", "1.3.132.0.34", "1.3.132.0.35", "2.5.4.3", "2.5.4.5", "2.5.4.6", "2.5.4.7", "2.5.4.8", "2.5.4.10", "2.5.4.11", "2.5.4.97", "2.5.29.14", "2.5.29.15", "2.5.29.17", "2.5.29.19",
                "2.5.29.20", "2.5.29.35", "2.16.840.1.101.3.4.1.2", "2.16.840.1.101.3.4.1.22", "2.16.840.1.101.3.4.1.42", "2.16.840.1.101.3.4.2.1", "2.16.840.1.101.3.4.2.2", "2.16.840.1.101.3.4.2.3", "2.23.140.1.2.1", "2.23.140.1.2.2",
            }.Select(s => new KeyValuePair<string, string>(s, s)).ToArray();
        _cd = new ConcurrentDictionary<string, string>(_pairs, IgnoreCase ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<string, string> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

[MemoryDiagnoser(false)]
public class GuidKeys
{
    private KeyValuePair<Guid, int>[] _pairs;
    private ConcurrentDictionary<Guid, int> _cd;

    [GlobalSetup]
    public void Setup()
    {
        _pairs = Enumerable.Range(0, 99).Select(i => new KeyValuePair<Guid, int>(Guid.NewGuid(), i)).ToArray();
        _cd = new ConcurrentDictionary<Guid, int>(_pairs);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<Guid, int> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

[DisassemblyDiagnoser(maxDepth: 2)]
[MemoryDiagnoser(false)]
public class ObjectKeys
{
    private KeyValuePair<MyObject, int>[] _pairs;
    private ConcurrentDictionary<MyObject, int> _cd;

    [Params(false, true)]
    public bool CustomComparer { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _pairs = Enumerable.Range(0, 99).Select(i => new KeyValuePair<MyObject, int>(new MyObject(), i)).ToArray();
        _cd = new ConcurrentDictionary<MyObject, int>(_pairs, CustomComparer ? ReferenceEqualityComparer.Instance : EqualityComparer<MyObject>.Default);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<MyObject, int> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

public class MyObject { }

@stephentoub stephentoub added this to the 8.0.0 milestone Feb 2, 2023
@ghost ghost assigned stephentoub Feb 2, 2023
@ghost
Copy link

ghost commented Feb 2, 2023

Tagging subscribers to this area: @dotnet/area-system-collections
See info in area-owners.md if you want to be subscribed.

Issue Details
  • By default, string hash codes are randomized. This is an important defense-in-depth security measure, but it also adds some overhead when strings are used as keys in dictionaries. Dictionary<> addresses that overhead by starting out with using non-randomized comparers, and then upgrades to randomized comparers only once enough collisions have been detected. This PR updates ConcurrentDictionary<> with similar tricks. The comparer is moved from being stored on the dictionary itself to instead be stored on the Tables object that's atomically swapped when the table grows; that way, the comparer always remains in sync with the hashcodes stored in the nodes in that table. When we enumerate a bucket looking for an existing key as part of an add, we count how many items we traverse, and if that resulting number is larger than the allowed threshold and we're currently using a non-randomized comparer, we force a rehash; that rehash will replace the non-randomized comparer with the equivalent randomized one.
  • The ConcurrentDictionary<> ctor is improved to presize based on the size of a collection being passed in; otherwise, it might resize multiple times as it's populating the dictionary. The sizing logic is also changed to use the same prime bucket selection size as does Dictionary<>.
  • The method we were using to compute the bucket for a key wasn't being inlined for reference type keys due to the generic context; that method has been moved to the outer type as a static to avoid the non-inlined call and extra generic dictionary lookup.
  • For all key types, we were also paying for a non-inlined ldelema helper call when reading the head node of a bucket; that's been addressed via a struct wrapper with a volatile node field, rather than using Volatile.Read to access the array element.
  • We were inconsistent in whether checked math was used in computing the size of the table. In some situations it would be possible to overflow without it being detected, or for it to be detected and manifest in various ways. This simplifies to just always use checked for computing the counts.
  • Remove unnecessary try/finally blocks that are leftover from CERs and thread abort protection.
  • Deduped some code with calls to helper functions.
Type Method Toolchain CustomComparer IgnoreCase Mean Error StdDev Ratio
GuidKeys TryGetValue \main ? ? 778.9 ns 13.28 ns 12.42 ns 1.00
GuidKeys TryGetValue \pr ? ? 565.5 ns 8.36 ns 7.82 ns 0.73
ObjectKeys TryGetValue \main False ? 1,139.3 ns 19.87 ns 17.61 ns 1.00
ObjectKeys TryGetValue \pr False ? 1,103.1 ns 11.62 ns 10.87 ns 0.97
StringKeys TryGetValue \main ? False 2,616.4 ns 41.60 ns 36.88 ns 1.00
StringKeys TryGetValue \pr ? False 1,826.4 ns 18.59 ns 15.52 ns 0.70
ObjectKeys TryGetValue \main True ? 1,122.4 ns 9.74 ns 9.11 ns 1.00
ObjectKeys TryGetValue \pr True ? 1,050.3 ns 11.08 ns 10.36 ns 0.94
StringKeys TryGetValue \main ? True 2,331.2 ns 25.81 ns 22.88 ns 1.00
StringKeys TryGetValue \pr ? True 1,359.4 ns 24.38 ns 22.81 ns 0.58
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

[MemoryDiagnoser(false)]
public partial class Program
{
    static void Main(string[] args) => BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}

[MemoryDiagnoser(false)]
public class StringKeys
{
    private KeyValuePair<string, string>[] _pairs;
    private ConcurrentDictionary<string, string> _cd;

    [Params(false, true)]
    public bool IgnoreCase { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _pairs =
            // from https://github.com/dotnet/runtime/blob/a30de6d40f69ef612b514344a5ec83fffd10b957/src/libraries/System.Formats.Asn1/src/System/Formats/Asn1/WellKnownOids.cs#L317-L419
            new[]
            {
                "1.2.840.10040.4.1", "1.2.840.10040.4.3", "1.2.840.10045.2.1", "1.2.840.10045.1.1", "1.2.840.10045.1.2", "1.2.840.10045.3.1.7", "1.2.840.10045.4.1", "1.2.840.10045.4.3.2", "1.2.840.10045.4.3.3", "1.2.840.10045.4.3.4",
                "1.2.840.113549.1.1.1", "1.2.840.113549.1.1.5", "1.2.840.113549.1.1.7", "1.2.840.113549.1.1.8", "1.2.840.113549.1.1.9", "1.2.840.113549.1.1.10", "1.2.840.113549.1.1.11", "1.2.840.113549.1.1.12", "1.2.840.113549.1.1.13",
                "1.2.840.113549.1.5.3", "1.2.840.113549.1.5.10", "1.2.840.113549.1.5.11", "1.2.840.113549.1.5.12", "1.2.840.113549.1.5.13", "1.2.840.113549.1.7.1", "1.2.840.113549.1.7.2", "1.2.840.113549.1.7.3", "1.2.840.113549.1.7.6",
                "1.2.840.113549.1.9.1", "1.2.840.113549.1.9.3", "1.2.840.113549.1.9.4", "1.2.840.113549.1.9.5", "1.2.840.113549.1.9.6", "1.2.840.113549.1.9.7", "1.2.840.113549.1.9.14", "1.2.840.113549.1.9.15", "1.2.840.113549.1.9.16.1.4",
                "1.2.840.113549.1.9.16.2.12", "1.2.840.113549.1.9.16.2.14", "1.2.840.113549.1.9.16.2.47", "1.2.840.113549.1.9.20", "1.2.840.113549.1.9.21", "1.2.840.113549.1.9.22.1", "1.2.840.113549.1.12.1.3", "1.2.840.113549.1.12.1.5",
                "1.2.840.113549.1.12.1.6", "1.2.840.113549.1.12.10.1.1", "1.2.840.113549.1.12.10.1.2", "1.2.840.113549.1.12.10.1.3", "1.2.840.113549.1.12.10.1.5", "1.2.840.113549.1.12.10.1.6", "1.2.840.113549.2.5", "1.2.840.113549.2.7",
                "1.2.840.113549.2.9", "1.2.840.113549.2.10", "1.2.840.113549.2.11", "1.2.840.113549.3.2", "1.2.840.113549.3.7", "1.3.6.1.4.1.311.17.1", "1.3.6.1.4.1.311.17.3.20", "1.3.6.1.4.1.311.20.2.3", "1.3.6.1.4.1.311.88.2.1",
                "1.3.6.1.4.1.311.88.2.2", "1.3.6.1.5.5.7.3.1", "1.3.6.1.5.5.7.3.2", "1.3.6.1.5.5.7.3.3", "1.3.6.1.5.5.7.3.4", "1.3.6.1.5.5.7.3.8", "1.3.6.1.5.5.7.3.9", "1.3.6.1.5.5.7.6.2", "1.3.6.1.5.5.7.48.1", "1.3.6.1.5.5.7.48.1.2",
                "1.3.6.1.5.5.7.48.2", "1.3.14.3.2.26", "1.3.14.3.2.7", "1.3.132.0.34", "1.3.132.0.35", "2.5.4.3", "2.5.4.5", "2.5.4.6", "2.5.4.7", "2.5.4.8", "2.5.4.10", "2.5.4.11", "2.5.4.97", "2.5.29.14", "2.5.29.15", "2.5.29.17", "2.5.29.19",
                "2.5.29.20", "2.5.29.35", "2.16.840.1.101.3.4.1.2", "2.16.840.1.101.3.4.1.22", "2.16.840.1.101.3.4.1.42", "2.16.840.1.101.3.4.2.1", "2.16.840.1.101.3.4.2.2", "2.16.840.1.101.3.4.2.3", "2.23.140.1.2.1", "2.23.140.1.2.2",
            }.Select(s => new KeyValuePair<string, string>(s, s)).ToArray();
        _cd = new ConcurrentDictionary<string, string>(_pairs, IgnoreCase ? StringComparer.Ordinal : StringComparer.OrdinalIgnoreCase);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<string, string> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

[MemoryDiagnoser(false)]
public class GuidKeys
{
    private KeyValuePair<Guid, int>[] _pairs;
    private ConcurrentDictionary<Guid, int> _cd;

    [GlobalSetup]
    public void Setup()
    {
        _pairs = Enumerable.Range(0, 99).Select(i => new KeyValuePair<Guid, int>(Guid.NewGuid(), i)).ToArray();
        _cd = new ConcurrentDictionary<Guid, int>(_pairs);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<Guid, int> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

[DisassemblyDiagnoser(maxDepth: 2)]
[MemoryDiagnoser(false)]
public class ObjectKeys
{
    private KeyValuePair<MyObject, int>[] _pairs;
    private ConcurrentDictionary<MyObject, int> _cd;

    [Params(false, true)]
    public bool CustomComparer { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _pairs = Enumerable.Range(0, 99).Select(i => new KeyValuePair<MyObject, int>(new MyObject(), i)).ToArray();
        _cd = new ConcurrentDictionary<MyObject, int>(_pairs, CustomComparer ? ReferenceEqualityComparer.Instance : EqualityComparer<MyObject>.Default);
    }

    [Benchmark]
    public int TryGetValue()
    {
        int count = 0;
        foreach (KeyValuePair<MyObject, int> pair in _pairs)
            if (_cd.TryGetValue(pair.Key, out _)) 
                count++;
        return count;
    }
}

public class MyObject { }
Author: stephentoub
Assignees: -
Labels:

area-System.Collections, tenet-performance

Milestone: 8.0.0

- By default, string hash codes are randomized.  This is an important defense-in-depth security measure, but it also adds some overhead when strings are used as keys in dictionaries.  `Dictionary<>` addresses that overhead by starting out with using non-randomized comparers, and then upgrades to randomized comparers only once enough collisions have been detected. This PR updates `ConcurrentDictionary<>` with similar tricks.  The comparer is moved from being stored on the dictionary itself to instead be stored on the Tables object that's atomically swapped when the table grows; that way, the comparer always remains in sync with the hashcodes stored in the nodes in that table.  When we enumerate a bucket looking for an existing key as part of an add, we count how many items we traverse, and if that resulting number is larger than the allowed threshold and we're currently using a non-randomized comparer, we force a rehash; that rehash will replace the non-randomized comparer with the equivalent randomized one.
- The `ConcurrentDictionary<>` ctor is improved to presize based on the size of a collection being passed in; otherwise, it might resize multiple times as it's populating the dictionary.  The sizing logic is also changed to use the same prime bucket selection size as does `Dictionary<>`.
- The method we were using to compute the bucket for a key wasn't being inlined for reference type keys due to the generic context; that method has been moved to the outer type as a static to avoid the non-inlined call and extra generic dictionary lookup.
- For all key types, we were also paying for a non-inlined ldelema helper call when reading the head node of a bucket; that's been addressed via a struct wrapper with a volatile node field, rather than using Volatile.Read to access the array element.
- We were inconsistent in whether checked math was used in computing the size of the table.  In some situations it would be possible to overflow without it being detected, or for it to be detected and manifest in various ways.  This simplifies to just always use checked for computing the counts.
- Remove unnecessary try/finally blocks that are leftover from CERs and thread abort protection.
- Deduped some code with calls to helper functions.
Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good to me, but I need a confirmation that we have good test coverage for forceRehashIfNonRandomized scenario before I approve.

I was expecting to see some nice perf gains on ARM so I've built your fork and run the TechEmpower caching benchmark:

git clone https://github.com/stephentoub/runtime.git --branch cdcomparer stoub_runtime
cd .\stoub_runtime\
.\build.cmd -c Release -subset clr+libs+libs.tests
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --scenario caching --profile aspnet-citrine-ampere --application.framework net8.0
crank --config https://raw.githubusercontent.com/aspnet/Benchmarks/main/scenarios/platform.benchmarks.yml --scenario caching --profile aspnet-citrine-ampere --application.framework net8.0 --application.options.outputFiles .\artifacts\bin\runtime\net8.0-windows-Release-x64\System.Collections.Concurrent.dll

Without your changes the RPS is quite stable: 859k, 856k, 861k, 864k, 852k, 863k
With your changes the differences are much bigger: 816k, 867k, 892k, 827k, 873k, 860k, 879k

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @stephentoub !

@AndyAyersMS
Copy link
Member

@EgorBo
Copy link
Member

EgorBo commented Feb 21, 2023

Improvements on x64:

@ghost ghost locked as resolved and limited conversation to collaborators Mar 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants