Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicate Log Attribute Keys #4324

Open
c4brei opened this issue Mar 22, 2023 · 6 comments
Open

Duplicate Log Attribute Keys #4324

c4brei opened this issue Mar 22, 2023 · 6 comments
Labels
logs Logging signal related question Further information is requested

Comments

@c4brei
Copy link

c4brei commented Mar 22, 2023

Question

Use Github Discussions.

Currently when using Logging attributes in Scope and LogMessage I observed, that the dotnet exporter exports every attribute not just the latest occurrence.

using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") }))    // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") }))    // scope 1
{
    logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}

In this example fooAttr is added 3 times to the attribute list

Output of ConsoleLogExporter:

LogRecord.Timestamp:               2023-03-22T12:44:02.5096056Z
LogRecord.CategoryName:            Program
LogRecord.LogLevel:                Information
LogRecord.FormattedMessage:        Hello Log Message: 'ghi - log msg' 'barAttrValue'
LogRecord.StateValues (Key:Value):
    fooAttr: ghi - log msg
    barAttr: barAttrValue
    OriginalFormat (a.k.a Body): Hello Log Message: '{fooAttr}' '{barAttr}'
LogRecord.ScopeValues (Key:Value):
[Scope.0]:fooAttr: abc - scope0
[Scope.1]:fooAttr: def - scope1

OpenTelemetry Collector receives list with duplicate attribute keys:

Timestamp: 2023-03-22 12:44:02.5096056 +0000 UTC
SeverityText: Information
SeverityNumber: Info(9)
Body: Str(Hello Log Message: 'ghi - log msg' 'barAttrValue')
Attributes:
     -> dotnet.ilogger.category: Str(Program)
     -> fooAttr: Str(ghi - log msg)
     -> barAttr: Str(barAttrValue)
     -> {OriginalFormat}: Str(Hello Log Message: '{fooAttr}' '{barAttr}')
     -> fooAttr: Str(abc - scope0)
     -> fooAttr: Str(def - scope1)
Trace ID:
Span ID:
Flags: 0

This could be an unintentional side effect from the removal of the scope depth prefix (#3843).
It seems that during serialization, first all attributes from the log message then all attributes from the scopes are serialized (beginning with the oldest scope). But no deduplication is done during this process.

The specification describes Log-Attributes to be a 'map<string, any>' where attribute keys are unique (data-model.md). Currently the actual output seems to be only a serialized list.

Is this behavior intended and the specification incomplete or is this behavior a Bug?

Additional Context

Depending on a receiving backend tool the attribute list may be interpreted differently. I.e. in Loki only the last occurrence of an attribute is parsed -> fooAttr is def - scope1 instead of ghi - log msg
image

Reproduce

Using net6.0 and the following OpenTelemetry NuGet-Packages

    <PackageReference Include="Microsoft.Extensions.Logging" Version="7.0.0" />
    <PackageReference Include="OpenTelemetry.Exporter.Console" Version="1.4.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol" Version="1.4.0" />
    <PackageReference Include="OpenTelemetry.Exporter.OpenTelemetryProtocol.Logs" Version="1.4.0-rc.4" />
// See https://aka.ms/new-console-template for more information

using Microsoft.Extensions.Logging;
using OpenTelemetry.Exporter;
using OpenTelemetry.Logs;

using var loggerFactory = LoggerFactory.Create(builder =>
{
    builder.AddOpenTelemetry(options =>
    {
        options.AddConsoleExporter();
        options.AddOtlpExporter(o =>
        {
            o.Protocol = OtlpExportProtocol.Grpc;
            o.HttpClientFactory = () =>
            {
                var client = new HttpClient();

                return client;
            };
            o.Endpoint = new Uri("http://localhost:4317/v1/logs");
        });
        options.IncludeScopes = true;
        options.ParseStateValues = true;
        options.IncludeFormattedMessage = true;
    });
});

var logger = loggerFactory.CreateLogger<Program>();

using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "abc - scope0") }))    // scope 0
using (logger.BeginScope(new[] { new KeyValuePair<string, object>("fooAttr", "def - scope1") }))    // scope 1
{
    logger.LogInformation("Hello Log Message: '{fooAttr}' '{barAttr}'", "ghi - log msg", "barAttrValue");
}
@c4brei c4brei added the question Further information is requested label Mar 22, 2023
@cijothomas cijothomas added the logs Logging signal related label Mar 22, 2023
@cijothomas
Copy link
Member

Deduplication is going to be expensive. The OTLP proto does not validate (its basically List and not a Map), so the onus is on the upstream to avoid duplicates.

  1. As a user prevent duplicates.
  2. OTel sdk OR OTLP Exporter can likely provide an opt-in mechanism to dedup for users who cannot ensure 1 and can accept the perf penality.

It is not an oversight, this was considered in the PR itself https://github.com/open-telemetry/opentelemetry-dotnet/pull/3843/files#diff-e28eaa16d5ac47130740a2783c474f316e9a1699aa3ada3a537951246e7cf517R172

Also, scopes is not the only way one can have duplicates. Logger.LogInfo("hello from {fruit} {fruit} {fruit}", "apple", "banana", "mango"); can also cause duplicates.

@cijothomas
Copy link
Member

@c4brei btw, excellent write up with clear description and repro steps!

@cijothomas
Copy link
Member

Tagged it for the OTLP Log Exporter stable release milestone. We could end up with not doing anything (and just ask users to not produce duplicate attributes), or provide opt-in mechanism to dedup.

@c4brei
Copy link
Author

c4brei commented Mar 23, 2023

Thank you for the quick answer :)

The mentioned option to deduplicate Attributes would be very helpful.
The resulting Mapping errors from duplicates in backend tools (i.e. the mentioned Loki case) can be hard to notice when the attribute value itself is difficult to read (e.g. guid).

In a larger project it will be difficult to enforce or remind everyone to use attributes with care.
Especially since attribute creation can be hidden and retroactively altered just by adding a scope, with the same attribute name, earlier in the call chain, thus it is easy to introduce a mistake.
You would have to go up/down the call chain to check for duplicate attribute names.

@c4brei
Copy link
Author

c4brei commented Mar 23, 2023

Some additional ideas that might be helpful when this topic comes around for the stable release

Adjust attribute serialization order

  • first serialize scopes (oldest to youngest)
  • then attributes from LogMessage
  • so the last attribute in the list should always be the most recent one that was defined

Option to disable export of Attributes defined in a LogMessage

  • so that logger.LogInformation("Some fruit '{fruit}'", "banana"); behaves like logger.LogInformation($"Some fruit '{fruitObj.name}'");
  • this could prevent accidental creation of duplicate attributes
  • and prevent some performance decrease of string interpolation

@cijothomas
Copy link
Member

so that logger.LogInformation("Some fruit '{fruit}'", "banana"); behaves like logger.LogInformation($"Some fruit '{fruitObj.name}'");

This defeats the purpose of structured logging by putting everything into a string, so this should be discouraged.!

@utpilla utpilla modified the milestones: 1.6.0, 1.7.0 Aug 8, 2023
@utpilla utpilla removed this from the 1.7.0 milestone Oct 17, 2023
tigrannajaryan pushed a commit to open-telemetry/opentelemetry-specification that referenced this issue Apr 26, 2024
…map as opt-in (#3987)

Fixes #3931

Per agreement: #3931 (comment)

> The SDKs should handle the key-value deduplication by default. It is acceptable to add an option to disable deduplication.

Previous PR: #3938


> I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.

The main purpose of this PR is to have an agreement for following questions (and update the specification to to make it more clear):
1. Is the deduplication required for all log exporters or only OTLP log exporters? Answer: It is required for all exporters.
2. Can the key-value deduplication for log records be opt-in? Answer: Yes, it is OK as long as it is documented that it can cause problems in case maps duplicated keys are exported.

Related to:
- open-telemetry/opentelemetry-go#5086
- open-telemetry/opentelemetry-dotnet#4324
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logs Logging signal related question Further information is requested
Projects
None yet
Development

No branches or pull requests

4 participants