Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows script to install Splunk distro for .NET instead of SFx.NET #4343

Merged
merged 5 commits into from
Feb 28, 2024

Conversation

pjanotti
Copy link
Contributor

Description:
Change the Windows install script to install the Splunk Distribution of OpenTelemetry .NET
instead of the SignalFx Instrumentation for .NET
when the parameter -with_dotnet_instrumentation is set to $true

Link to Splunk idea:
N/A

Testing:
Local and CI validation on fork.

Documentation:
Pending: update install script documentation.

@pjanotti pjanotti requested review from a team as code owners February 26, 2024 20:27
} else {
echo "SIGNALFX_ENV environment variable not set. Unless otherwise defined, will appear as 'unknown' in the UI."
Update-OpenTelemetryCore
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a risk of doing unwanted silent update?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Most of the time the instrumented applications are running in IIS and the update will stop and recycle them which seems what the typical user wants. However, if there are instrumented .NET applications outside IIS then the update will fail when copying files which probably is not very helpful. Let me think a bit more about this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking maybe to fail with a specific message "... dotnet is already installed, run the install script again with -UpdateDotnet to update the agent'" 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning message is Splunk Distribution of OpenTelemetry .NET is already installed. Updating to the latest version... and will be displayed before any error so I think it gives enough context in case of an error updating.

try {
$dotnet_version = (Get-ItemProperty HKLM:\Software\Microsoft\Windows\CurrentVersion\Uninstall\* | Where { $_.DisplayName -eq "SignalFx .NET Tracing 64-bit" }).DisplayVersion
update_registry -path "$regkey" -name "SIGNALFX_GLOBAL_TAGS" "splunk.zc.method:signalfx-dotnet-tracing-${dotnet_version}"
update_registry -path "$regkey" -name "OTEL_RESOURCE_ATTRIBUTES" -value "$otel_resource_attributes"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do here, but OTEL_RESOURCE_ATTRIBUTES seems to be conflicting with instrumenting agent and could leak to an instrumented process when in a global scope.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to change this on this PR but the main thing here to me is that the splunk.zc.method seems to need a different value since we could have multiple instrumentations on the same box. However, deployment.environment is something that we want for the whole machine. I will have a conversation with Rotel about it before changing it, the current code is in line with what mass deployments are doing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the conflict is getting even bigger then? ... since envs don't stack by default and OTEL_RESOURCE_ATTRIBUTES can be easily overwritten (due being a generic variable) by a client.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will have to investigate this separately, likely also changing the value that we use for OTel resource attributes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After writing the answer to Aunsh I think it will make more sense to just bail out here: it will be consistent with what is done for the collector MSI: the script just bails out if there is already one version installed. We are also bailing out if SFx.NET is already installed.

@theletterf
Copy link
Contributor

+CC @mbechtold-splunk

@aunshc
Copy link
Contributor

aunshc commented Feb 27, 2024

@pjanotti to better understand our recommendation for existing users moving forward, today we ask them to upgrade the Collector by manually pulling the new Windows MSI correct (https://docs.splunk.com/observability/en/gdi/opentelemetry/collector-windows/windows-upgrade.html#id1)?

@pjanotti
Copy link
Contributor Author

pjanotti commented Feb 28, 2024

to better understand our recommendation for existing users moving forward, today we ask them to upgrade the Collector by manually pulling the new Windows MSI correct (https://docs.splunk.com/observability/en/gdi/opentelemetry/collector-windows/windows-upgrade.html#id1)?

Yes, the current recommendation for update is to use the MSI directly, not, the script. That said the script still needs to handle the case in which someone attempts to run it with the current version installed. This works for the collector because it is a single process that we need to stop. For the instrumentations the problem is a bit different because the user needs to stop the instrumented applications or alternatively upgrade and then reboot the machine to switch to the new version - the reboot is needed since files will be locked by the running process. Unfortunately this via reboot requires an MSI or code that we don't have in the current .NET Powershell script. Anyway, the recommendation should still be the same:

  1. To upgrade the collector, download and run the MSI of the new version - caveat: I need to document when upgrading to the version that uses the service scope environment, since this is a special case. I will create a KTLO ticket for it.
  2. To upgrade the .NET instrumentation OTel.NET to OTel.NET: stop all instrumented applications, download and run the upgrade function from powershell script for OTel .NET. Finally, restart all the applications.
  3. To upgrade the .NET instrumentation SFx.NET to OTel.NET: stop all instrumented applications, uninstall SFx.NET, reboot if needed (only if failed to stop all instrumented applications), install OTel.NET either via the powershell script (either the one from OTel.NET or the collector one).

cc @theletterf @aunshc

@pjanotti pjanotti merged commit c1eaf35 into signalfx:main Feb 28, 2024
85 checks passed
@pjanotti pjanotti deleted the migrate-install-ps-to-otel-dotnet branch February 28, 2024 20:36
@github-actions github-actions bot locked and limited conversation to collaborators Feb 28, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants