Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE REQ] Very difficult to know what went wrong when posts to Azure Monitor fail #34483

Closed
samsp-msft opened this issue Feb 22, 2023 · 12 comments
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@samsp-msft
Copy link

samsp-msft commented Feb 22, 2023

Library name

Azure.Monitor.OpenTelemetry.Exporter

Please describe the feature.

Diagnosing why the OpenTelemetry SDK is not working is too hard and un-obvious to non-team members.

Currently errors in reporting are logged as EventSource events, and unbeknownst to me, in the Visual Studio output window. This needs to be made much more obvious to developers. I was running dotnet-monitor and having to do a post to configure the event collection as it's not a standard source, and having to break in the debugger to enable the collection before some of the events are fired.

curl -X 'POST' \
  'https://localhost:52323/trace?name=loadtest&durationSeconds=30' \
  -H 'accept: application/octet-stream' \
  -H 'Content-Type: application/json' \
  -d '{
  "providers": [
    {"name": "OpenTelemetry-Sdk","eventLevel": "5"},
    {"name": "OpenTelemetry-AzureMonitor-Exporter","eventLevel": "5"},
    {"name": "Azure-Core","5": "Debug"}
  ],
  "requestRundown": true,
  "bufferSizeInMB": 1024
}'

When running ASP.NET Core Apps, typically I look to the console output to tell me what is going wrong. If Open Telemetry is failing, it should probably be using ILogger to write output to the console, and use appropriate levels so only error information is included by default (that can be changed through config).

Similarly, when I have problems with the deployed app, the first place to look is the console. Azure App-Service, Functions and Container Apps all have support for streaming the console output. So reporting to the console is the first place to look for explanations of issues.

I tried using OTEL_DIAGNOSTICS.json - and the file just contained Successfully opened file.

@ghost ghost added needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. customer-reported Issues that are reported by GitHub users external to the Azure organization. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that labels Feb 22, 2023
@samsp-msft
Copy link
Author

@reyang @bradygaster

@samsp-msft
Copy link
Author

@cijothomas

@cijothomas
Copy link
Contributor

https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/README.md#self-diagnostics

If you enabled "Error" level, and nothing of Error severity has occurred then that file will be basically empty (apart from the opened file message..TODO: I think this message should be improved to call out that...)

Could you make it Verbose?

Anyway, if nothing of Error severity has occurred, then its a good indication everything is working well. What was the issue you were having that you attempted to troubleshoot?

@samsp-msft
Copy link
Author

I put a bad connection string, so I am getting a 400 when logs are posted.

@jsquire jsquire added Service Attention Workflow: This issue is responsible by Azure service team. Client This issue points to a problem in the data-plane of the library. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team Monitor - Exporter Monitor OpenTelemetry Exporter and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 23, 2023
@ghost
Copy link

ghost commented Feb 23, 2023

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @cijothomas, @reyang, @rajkumar-rangaraj, @TimothyMothra, @vishweshbankwar.

Issue Details

Library name

Azure.Monitor.OpenTelemetry.Exporter

Please describe the feature.

Diagnosing why the OpenTelemetry SDK is not working is too hard and un-obvious to non-team members.

Currently errors in reporting are logged as EventSource events, and unbeknownst to me, in the Visual Studio output window. This needs to be made much more obvious to developers. I was running dotnet-monitor and having to do a post to configure the event collection as it's not a standard source, and having to break in the debugger to enable the collection before some of the events are fired.

curl -X 'POST' \
  'https://localhost:52323/trace?name=loadtest&durationSeconds=30' \
  -H 'accept: application/octet-stream' \
  -H 'Content-Type: application/json' \
  -d '{
  "providers": [
    {"name": "OpenTelemetry-Sdk","eventLevel": "5"},
    {"name": "OpenTelemetry-AzureMonitor-Exporter","eventLevel": "5"},
    {"name": "Azure-Core","5": "Debug"}
  ],
  "requestRundown": true,
  "bufferSizeInMB": 1024
}'

When running ASP.NET Core Apps, typically I look to the console output to tell me what is going wrong. If Open Telemetry is failing, it should probably be using ILogger to write output to the console, and use appropriate levels so only error information is included by default (that can be changed through config).

Similarly, when I have problems with the deployed app, the first place to look is the console. Azure App-Service, Functions and Container Apps all have support for streaming the console output. So reporting to the console is the first place to look for explanations of issues.

I tried using OTEL_DIAGNOSTICS.json - and the file just contained Successfully opened file.

Author: samsp-msft
Assignees: -
Labels:

Service Attention, Client, customer-reported, question, needs-team-attention, Monitor - Exporter

Milestone: -

@cijothomas
Copy link
Contributor

I put a bad connection string, so I am getting a 400 when logs are posted.

If you mean the connectionstring was malformatted/unparseable, I'd expect AzMon exporter to throw and let app crash on startup.
@vishweshbankwar - Please confirm if the above is correct.

If you mean your connectionstring was correct, but it had wrong ikey or wrong endpoint, then I'd expect AzMon Exporter to log the fact that it received failure (400) and optionally return "ExportFailed" back to the OTel SDK (which would log it).

@vishweshbankwar - Please confirm if Exporter is doing this logging or not.

@TimothyMothra
Copy link
Contributor

TimothyMothra commented Feb 23, 2023

Multiple points to respond to :)

  1. If you mean the connectionstring was malformatted/unparseable, I'd expect AzMon exporter to throw and let app crash on startup.

    This is correct. If the connectionstring is malformed or if the InstrumentationKey is not present, the exporter will throw/crash with a detailed message.

  2. If you mean your connectionstring was correct, but it had wrong ikey or wrong endpoint, then I'd expect AzMon Exporter to log the fact that it received failure (400)

    This is correct, we log these to Event Source which is also visible in Visual Studio's Debug Output:
    image

    The ingestion sometimes responds with a detailed error message. We should double check that we're logging those messages!

  3. and optionally return "ExportFailed" back to the OTel SDK (which would log it).

    This is correct. Our Transmitter will return ExportResult.Failed.
    Regarding OTel's Self Diagnostic behavior, I confirmed this is capturing logs from our EventSource.
    image
    image

  4. When running ASP.NET Core Apps, typically I look to the console output to tell me what is going wrong. If Open Telemetry is failing, it should probably be using ILogger to write output to the console, and use appropriate levels so only error information is included by default (that can be changed through config).

    This is a good idea but is a major change from what we and the community are doing today.
    I've seen similar asks in the https://github.com/open-telemetry/opentelemetry-dotnet repo but I haven't seen any consensus to move away from EventSource.
    Perhaps we can pilot this as a feature in the AzureMonitorExporter and share learnings with the community.

@reyang
Copy link
Member

reyang commented Feb 23, 2023

When running ASP.NET Core Apps, typically I look to the console output to tell me what is going wrong. If Open Telemetry is failing, it should probably be using ILogger to write output to the console, and use appropriate levels so only error information is included by default (that can be changed through config).

OpenTelemetry .NET as a general purpose library, it cannot write to console by default since this would break the user's expectation (especially if the user is expecting certain console output format). It is possible to allow the internal error logs to be configured to output to the console (e.g. it'll be even better if the internal logs are exposed via ILogger so we have the flexibility to write to file/console/etc.).

@samsp-msft Is there a contract between hosting environments (e.g. Functions) and libraries (e.g. OTel.NET) so libraries can expose the logs that hosting environments could subscribe to?

@samsp-msft
Copy link
Author

AFAIK the only service that has a specific contract around logging is Functions as they pass you an ILogger instance as part of the entrypoint into your function.

The various flavors of app-service don't have a specific contract with .NET around logging. They primarily rely on providing environment variables for app-insights. They have the ability to view the console output from the process that they are hosting via a Log Stream blade.

Container Apps relies on console output as its built-in logging mechanism, and provides a log stream, and collection of those entries into an azure monitor table configured as part of creation of the app.

There seems to be some precedent for directly outputting to the console from OTEL - that seems to be what using .AddConsoleExporter() is doing.

@cijothomas
Copy link
Contributor

cijothomas commented Feb 23, 2023

There seems to be some precedent for directly outputting to the console from OTEL - that seems to be what using .AddConsoleExporter() is doing.

^ Is only for learning purposes. It emits the actual telemetry itself, not the SDK internal logs. Not for production usage.

@cijothomas
Copy link
Contributor

e.g. it'll be even better if the internal logs are exposed via ILogger so we have the flexibility to write to file/console/etc.

Yes, this is the tracking issue for that : open-telemetry/opentelemetry-dotnet#3881
(surprisingly not many upvotes/comments there!)

@TimothyMothra
Copy link
Contributor

I'm going to close this issue.
We rewrote all diagnostic logging messages to make them more informative or actionable.

@github-actions github-actions bot locked and limited conversation to collaborators Dec 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Exporter Monitor OpenTelemetry Exporter needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

5 participants