Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otelcol: fix windows event log core placement #11051

Merged
merged 7 commits into from
Sep 16, 2024

Conversation

braydonk
Copy link
Contributor

@braydonk braydonk commented Sep 4, 2024

Description

In the CollectorSettings LoggingOptions, when the options are zap.WrapCore, the final core wrapped will end up running first. For this reason, the windowsEventLogCore must be the first option in the list of user supplied options, to ensure user supplied options are run before the core that writes to the Windows Event Log.

Prior art: I fixed this before in #5298.

Link to tracking issue

#5297

Testing

Testing was manual by running the Collector as a service after this change. I tried to add a unit test but could not find an effective way to do that due to all the type indirection done by the zap package. I was hoping to find an easy way to tell that the windowsEventLogCore is the first core wrapped in the options but I could not find a way to pull that off. Open to ideas if anyone can find a way.

In the CollectorSettings LoggingOptions, when the options are
`zap.WrapCore`, the final core wrapped will end up running first. For
this reason, the windowsEventLogCore must be the first option in the
list of user supplied options, to ensure user supplied options are run
before the core that writes to the Windows Event Log.
@braydonk braydonk requested review from a team and bogdandrutu September 4, 2024 18:17
@@ -141,6 +138,19 @@ func openEventLog(serviceName string) (*eventlog.Log, error) {
return elog, nil
}

func loggingOptionsWithEventLogCore(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extracted this to a function for 2 reasons:

  1. I was trying to write a unit test for it (failed to find a way).
  2. I wanted the comment explaining why the prepend is necessary to really stand out for any future contributors looking at this.

Copy link

codecov bot commented Sep 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.23%. Comparing base (65bebec) to head (e60fb92).
Report is 25 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main   #11051   +/-   ##
=======================================
  Coverage   92.23%   92.23%           
=======================================
  Files         414      414           
  Lines       19810    19810           
=======================================
  Hits        18271    18271           
  Misses       1167     1167           
  Partials      372      372           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@braydonk
Copy link
Contributor Author

braydonk commented Sep 4, 2024

I don't yet understand why this breaks the LogToFile test. It seems like this test is painful to run locally so might take a bit for me to get it set up and figure out what's going on.

@braydonk
Copy link
Contributor Author

braydonk commented Sep 10, 2024

This test is broken for me on main, doesn't seem related to my changes. I think I'll need help to figure out why, I am super lost at this point. @pjanotti any chance you can help me figure out why this test appears to be busted?

EDIT: The test is not broken on main, I just hadn't rebuilt the binary. However I could still use assistance understanding how this change affects the logger in such a way to break file logging. At the spot I changed, there are no other logging options provided in the test anyway, so I don't understand at all what is going on.

EDIT 2: Wow, so it was the double pointer argument all along. Insane that I didn't see that. 😡

braydonk and others added 2 commits September 10, 2024 17:48
I should have passed the service config as a double pointer, not as a
pointer that I further reference within the function.
@pjanotti
Copy link
Contributor

pjanotti commented Sep 10, 2024

Thanks for spotting this @braydonk

the windowsEventLogCore must be the first option in the list of user supplied options, to ensure user supplied options are run before the core that writes to the Windows Event Log.

So there were some options in telemetry, e.g.: log level, being ignored when logging to the event log?

@@ -564,6 +564,8 @@ func newEnvProvider() confmap.ProviderFactory {
return confmap.NewRetrieved(float64(6.4))
case "env:BOOL":
return confmap.NewRetrieved(true)
case "env:ProgramData":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? Was it added for the manual tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left in by accident from manual tests. Removed in d8e89c2

userOptions []zap.Option,
) []zap.Option {
return append(
// The event log core must run last, so it needs to be the first option
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I already changed that order in the past (and can't recall if by accident or intentionally) explicit wording seems advisable:

Suggested change
// The event log core must run last, so it needs to be the first option
// The order below must be preserved - see PR #11051
// The event log core must run last, so it needs to be the first option

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the comment more explicit in cf2b4c5

# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the result of the change visible to user?

@atoulme atoulme added the os:windows Windows specific issues label Sep 10, 2024
@braydonk
Copy link
Contributor Author

braydonk commented Sep 11, 2024

So there were some options in telemetry, e.g.: log level, being ignored when logging to the event log?

This specifically affects users who actually use the otelcol package to create their own Collector service. In my use case, I added LoggingOptions to change certain Collector self-logs to DEBUG level (effectively muting them in our case). When the order of the event log core was changed in the LoggingOptions, it meant that it always ran before these options we provided.

Isn't the result of the change visible to user?

Not for users of an OpenTelemetry-project-provided Collector as far as I know. This change is only noticeable for people in my very specific case who provide LoggingOptions in their custom-built collector that runs on Windows as a service. 😄

Copy link
Contributor

@pjanotti pjanotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mx-psi mx-psi merged commit 6029f31 into open-telemetry:main Sep 16, 2024
49 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 16, 2024
@braydonk braydonk deleted the event_log_fix branch September 16, 2024 13:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
os:windows Windows specific issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants