
Enabling console logger queue length and mode to be configurable #70186

Merged: 21 commits into dotnet:main on Jun 21, 2022

Conversation

maryamariyan (Member) commented Jun 2, 2022:

Description

Adds configuration for the console logger's queue length and queue-full mode (a usage sketch follows the list below).

  • Adds new APIs for the logging console queue processor
  • Replaces BlockingCollection with Queue implementation
  • Adds code coverage for ConsoleLoggerProcessor (except for the ThreadStateException case in Dispose and the GetQueueSize metrics logic)
  • Adds queue name and metrics
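
A usage sketch, assuming the option names this PR exposes on ConsoleLoggerOptions (MaxQueueLength and QueueFullMode, with a ConsoleLoggerQueueFullMode enum); treat this as illustrative rather than final documentation:

    using Microsoft.Extensions.Logging;
    using Microsoft.Extensions.Logging.Console;

    using ILoggerFactory loggerFactory = LoggerFactory.Create(builder =>
        builder.AddConsole(options =>
        {
            // Bound the background queue and choose what happens when it is full:
            // Wait blocks the logging thread; DropWrite discards the new message.
            options.MaxQueueLength = 2500;
            options.QueueFullMode = ConsoleLoggerQueueFullMode.DropWrite;
        }));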

Performance testing

Comparing the original BlockingCollection with the new Queue implementation.

Using a custom log formatter that writes only the message-template string, with no padding or prefix, and the benchmark snippet below (a sketch of such a formatter follows the snippet):

        // Requires: using System.Diagnostics; using Microsoft.Extensions.Logging;
        // _1000Chars is a 1,000-character test string; loggerFactory is created beforehand.
        var howManyBytes = _1000Chars.Length * sizeof(char);
        var logger = loggerFactory.CreateLogger<Program>();
        var sw = new Stopwatch();
        sw.Start();
        for (int i = 0; i < 100000; i++)
        {
            logger.LogInformation(_1000Chars);
        }
        sw.Stop();
        var elapsedForLoggingConsole = sw.ElapsedMilliseconds;

        Console.Error.WriteLine($"Bytes written {howManyBytes * 100000 / (1024 * 1024)} MB");
        Console.Error.WriteLine($"elapsedForLoggingConsole: {elapsedForLoggingConsole} ms");
        Console.Error.WriteLine($"Speed: {(howManyBytes * 100000 / (1024 * 1024)) / (elapsedForLoggingConsole / 1000.0f)} MB/sec");

The table below shows the perf numbers comparing the BlockingCollection vs. the new Queue implementation at different max length settings, running:

> dotnet run -c Release >> output-to-file.txt
|           Queue type | Max length |  Time elapsed |            Speed |   Bytes written |      Notes        |
|----------------------|------------|---------------|------------------|-----------------|-------------------|
|   BlockingCollection |         1  |      4319 ms  |  43.99167 MB/sec |          190 MB | Fastest (10 runs) |
|   BlockingCollection |         1  |      4546 ms  | 41.794983 MB/sec |          190 MB | Slowest (10 runs) |
|                Queue |         1  |      4309 ms  | 44.093758 MB/sec |          190 MB | Fastest (10 runs) |
|                Queue |         1  |      5078 ms  | 37.416306 MB/sec |          190 MB | Slowest (10 runs) |
|   BlockingCollection |        10  |      4269 ms  |  44.50691 MB/sec |          190 MB | Fastest (10 runs) |
|   BlockingCollection |        10  |      4482 ms  | 42.391792 MB/sec |          190 MB | Slowest (10 runs) |
|                Queue |        10  |      4374 ms  |   43.4385 MB/sec |          190 MB | Fastest (10 runs) |
|                Queue |        10  |      4496 ms  |  42.25979 MB/sec |          190 MB | Slowest (10 runs) |
|   BlockingCollection |      1024  |      4295 ms  |  44.50691 MB/sec |          190 MB | Fastest (10 runs) |
|   BlockingCollection |      1024  |      4574 ms  | 41.539135 MB/sec |          190 MB | Slowest (10 runs) |
|                Queue |      1024  |      4223 ms  |  44.99171 MB/sec |          190 MB | Fastest (10 runs) |
|                Queue |      1024  |      4675 ms  |  40.64171 MB/sec |          190 MB | Slowest (10 runs) |

This shows there's no time regression caused by the queue approach. All of the cases above hit the max queue length limit, and they all take a performance hit from hitting that limit.

The other test sets the max queue length to a value that the logging load will not hit.
At this higher max (chosen so that looping 100_000 logs, 190 MB total, won't reach it), both the BlockingCollection and Queue approaches get a large performance boost when logging the same amount of logs:

Bytes written: 190 MB

Time elapsed using BlockingCollection (fastest after 10 reruns): 170 ms
Speed (fastest after 10 reruns): 1117.6471 MB/sec

Time elapsed using BlockingCollection (slowest after 10 reruns): 203 ms
Speed (slowest after 10 reruns): 935.96063 MB/sec

vs.

Time elapsed using Queue (fastest after 10 reruns): 169 ms
Speed (fastest after 10 reruns): 1124.2604 MB/sec

Time elapsed using Queue (slowest after 10 reruns): 201 ms
Speed (slowest after 10 reruns): 945.2736 MB/sec

vs.

System.Console
Time elapsed using System.Console: 4741 ms
Speed: 40.07593 MB/sec

Note: because these numbers come from manual Stopwatch measurements, they don't capture the standard-deviation error between the BlockingCollection and Queue runs. Overall, both approaches perform in the same range.

Fixes #65264

dotnet-issue-labeler (bot):

Note regarding the new-api-needs-documentation label:

This serves as a reminder: when your PR modifies a ref *.cs file and adds or modifies public APIs, please make sure the API implementation in the src *.cs file is documented with triple-slash comments so the PR reviewers can sign off on that change.
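
For example, a triple-slash doc comment on one of the new options might look like this (illustrative only; the exact wording and default are up to the PR):

    /// <summary>
    /// Gets or sets the maximum number of messages that may be buffered by the
    /// console logger's background queue before the queue-full policy applies.
    /// </summary>
    public int MaxQueueLength { get; set; }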

ghost assigned maryamariyan on Jun 2, 2022
ghost commented Jun 2, 2022:

Tagging subscribers to this area: @dotnet/area-extensions-logging
See info in area-owners.md if you want to be subscribed.



maryamariyan (Member Author):

A few pieces of feedback are still left to apply. I marked as resolved the ones addressed in the new commit.

    }
    }

    internal const int DefaultMaxQueueLengthValue = 1024 * 1024;
Member:

I think this is way too big. Previously we were discussing a 1MB limit on the buffer with the assumption that an average message might be 400 bytes and the goal was only to do a modest increase in buffer size. I proposed 400KB (~1000 messages) and Fowler counter-proposed 1MB (~2500 messages). I'm OK if we want to bump from 1000 messages to 2500 as a default, but I don't think we should jump to 1 million. As-is I expect this will create OOM or huge memory increases for apps that log aggressively.

maryamariyan (Member Author):

2500 is not enough for a default value.

e.g. with

    for (int i = 0; i < 10000; i++)
    {
        logger.LogInformation(_1000Chars);
    }

When testing the loop above to pick a max size, increasing the max to 2500 still isn't enough, but for the same use case an (arbitrary) setting of 10,000 ensures we don't hit the lock contention, and the speed jump we want happens too. (At 1,000 two-byte chars per message, a 10,000-entry queue holds roughly 19 MB of message payload.)

Member:

I'm not sure I follow your rationale. If that 10,000 comes from some of our perf tests, it was picked arbitrarily as a "big enough" number to amortize the influence of outliers and of the start/finish code in the benchmark. It was never intended as an estimate of how many log calls a bursty logging scenario is likely to make, so I wouldn't use it to tune default buffer sizes.

maryamariyan (Member Author):

I have no strong opinion on the default value; just sharing my observations in case they help decide on the best number. Changed the default from 1024 * 1024 to 2500 in the latest commit.

cc @davidfowl

maryamariyan (Member Author):

Thanks for the reviews. The PR is ready for another look.

    // Assert
    if (okToDrop)
    {
        Assert.True(errorConsole.TimesWriteCalled > 0);
Member:

If the thread dequeuing messages runs fast enough, nothing will be dropped and the test will fail. To make the test deterministic regardless of dequeue-thread speed, you would need to block at some point in the test console or console sink.
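
One way to do that, sketched with a hypothetical test double (not the repo's actual test types): make the sink's Write block on an event so the queue fills deterministically, then release it after asserting.

    using System.Threading;

    // Hypothetical test console whose writes block until released,
    // forcing the processor's queue to back up deterministically.
    internal sealed class BlockingTestConsole
    {
        private readonly ManualResetEventSlim _unblock = new(false);
        private int _timesWriteCalled;

        public int TimesWriteCalled => _timesWriteCalled;

        public void Write(string message)
        {
            Interlocked.Increment(ref _timesWriteCalled);
            _unblock.Wait(); // hold the dequeue thread here
        }

        public void Release() => _unblock.Set();
    }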

noahfalk (Member):

Looking pretty good, just a few more things I spotted above.

maryamariyan (Member Author) left a comment:

@noahfalk @davidfowl I'll switch the default to some number in the middle. I'd like this to merge for the upcoming preview so contributors get a chance to try it out and tell us how it fits their solutions; that feedback might point to a better default before our final previews.

maryamariyan (Member Author):

The PR is ready for review again.

noahfalk (Member) left a comment:

LGTM in terms of diagnostic functionality : )

    {
        // _fullMode is used inside the lock, so it is safer to guard the setter with the lock as well;
        // this setter is not expected to be called frequently.
        _fullMode = value;
Member:

Why don't we need a Monitor.PulseAll(_messageQueue); here?

maryamariyan (Member Author):

No thread would be waiting on the full mode to change.

Member:

What about my comment here: #70186 (comment)?

maryamariyan (Member Author) commented Jun 21, 2022:

Oh, it seems this comment is related: #70186 (comment). Updated.

eerhardt (Member) left a comment:

LGTM. Thanks for the great work here, @maryamariyan!

maryamariyan merged commit bb17fb5 into dotnet:main on Jun 21, 2022
ghost locked the conversation as resolved and limited it to collaborators on Jul 21, 2022
        Monitor.Wait(_messageQueue);
    }

    if (_messageQueue.Count > 0 && !_isAddingCompleted)
Member:

So after Dispose is called, we won't log any messages that are still in the buffer?
@vlada-shubina as FYI

Member:

To elaborate:

  • Dispose calls CompleteAdding, which takes the _messageQueue lock and then sets _isAddingCompleted to true.
  • ProcessLogQueue calls TryDequeue in a loop, one item at a time. Each call must take the _messageQueue lock to return an item for ProcessLogQueue to log.
  • If _isAddingCompleted is true, TryDequeue returns false, even if _messageQueue is non-empty.

Those facts raise the following concerns (a code sketch of this interplay follows below):

  • CompleteAdding and TryDequeue race for the same lock. Once CompleteAdding wins, nothing more is logged, even if items are still queued; this looks like a bug.
  • TryDequeue acquires the lock for every single item being logged. I'm surprised this ends up faster than the low-lock BlockingCollection; if it really is, should BlockingCollection itself be revised?
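
In code terms, the behavior described above would look roughly like the sketch below (simplified; string stands in for the internal message-entry type, and names follow the excerpt quoted earlier, not the exact product source):

    using System.Collections.Generic;
    using System.Threading;

    internal sealed class QueueSketch
    {
        private readonly Queue<string> _messageQueue = new();
        private bool _isAddingCompleted;

        public bool TryDequeue(out string? item)
        {
            lock (_messageQueue)
            {
                while (_messageQueue.Count == 0 && !_isAddingCompleted)
                {
                    Monitor.Wait(_messageQueue); // wait for a producer to enqueue
                }

                // Per the report: once _isAddingCompleted is set, this returns
                // false even when messages remain queued, so they are never written.
                if (_messageQueue.Count > 0 && !_isAddingCompleted)
                {
                    item = _messageQueue.Dequeue();
                    return true;
                }

                item = null;
                return false;
            }
        }

        public void CompleteAdding()
        {
            lock (_messageQueue)
            {
                _isAddingCompleted = true;
                Monitor.PulseAll(_messageQueue); // wake any waiting dequeuer
            }
        }
    }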

Member:

@JanKrivanek, if you can provide sample code demonstrating the concerns, please open a new issue and we can track the investigation there. Thanks for your feedback.

Member:

I see you already logged #79812. We'll try to look at it at some point.

Member:

Sample repro: https://github.com/JanKrivanek/ConsoleLogging

It shows that messages are being dropped in the 7.0 version of logging.


Successfully merging this pull request may close: ".NET console logging uses a blocking collection which is not optimal for cloud-native applications" (#65264)