Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Capture FQDN in APM agents #805

Merged
merged 6 commits into from
Jul 5, 2023
Merged

Conversation

SylvainJuge
Copy link
Member

@SylvainJuge SylvainJuge commented Jun 14, 2023

Summary

Specification change for #793 .

APM agents should capture the FQDN instead of just the host name, APM Server will handle compatibility with current behavior through configuration (as it's a breaking change).

Fixes #794


Checklist

  • validate the FQDN heuristic on Windows
    • powershell heuristic tested on a Windows VM, custom DNS suffix set through System Properties > Computer Name > Change > More > Primary DNS suffix of this computer, then reboot. When set the System Properties > Computer Name should show the FQDN name in Full computer name field.
  • compatibility with older APM server versions: keep sending the current detected_hostname for APM server < 8.9 for the agents that did not already used the FQDN ?
    • NO: host.name is not used very much in APM, thus it's not considered to be a breaking change: comment
  • for Windows, should we include the same heuristic described in Update metadata spec with host domain name information #795 ?
    • NO: relying on env. variables or currently logged-in user is not reliable

  • May the instrumentation collect sensitive information, such as secrets or PII (ex. in headers)?
    • No:
      • Why? hostname and domain name are not sensitive information
  • Create PR as draft
  • Approval by at least one other agent
  • Mark as Ready for Review (automatically requests reviews from all agents and PM via CODEOWNERS)
    • Remove PM from reviewers if impact on product is negligible
    • Remove agents from reviewers if the change is not relevant for them
  • Approved by at least 2 agents + PM (if relevant)
  • Merge after 7 days passed without objections
    To auto-merge the PR, add /schedule YYYY-MM-DD to the PR description.
  • Create implementation issues through the meta issue template (this will automate issue creation for individual agents) already created in Report FQDN for detected_hostname #793
  • If this spec adds a new dynamic config option, add it to central config. n/a

Copy link
Member

@trentm trentm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly I think using hostname -f (for non-Windows) is fine, with a fallback
to hostname (or whatever technique that uses the gethostname syscall).

On Windows I don't have a sense of whether the envvars are sufficient replacements
for the win32 API call (see what libbeat does below), but I'm less concerned about Windows.

I have some questions and notes below.


Q1: Does the "avoid using DNS" warning/condition apply now that we prefer getting the FQDN?
Back when this "Hostname" spec section was added (#517) it was mentioned, part way through the discussion, that DNS should be avoided.

Many of the various techniques I've found for getting the FQDN will do a
DNS query. That includes what beats are doing:

IIUC, hostname -f will possibly do a DNS lookup. (ref: https://serverfault.com/a/367496)

A surprising result I found when playing with the libbeat implementation is that in a Docker container on my Mac, it (my small fqdn.go) will get the hostname of my mac, whereas hostname -f will not.

root@f224d07b5aee:/repo# hostname -f
f224d07b5aee
root@f224d07b5aee:/repo# ./fqdn
2023/06/16 00:04:26 XXX hostname: "f224d07b5aee"
2023/06/16 00:04:26 XXX cname: "pink.local."
2023/06/16 00:04:26 XXX fqdn: "pink.local"        <-- it gets "pink.local" from a DNS query of some sort

Q2: Should we also be specifying that the FQDN should be lowercased, as the ECS host.name docs recommend? The libbeat impl is lowercasing.

Q3: (Unrelated to FQDN) Do other implementations set both of metadata.hostname and metadata.detected_hostname or do you wait for APM Server version detection to know if the APM server is >=7.4?

Some links to various implementations I've seen or heard, in case it helps others:

  • Python has socket.getfqdn() which uses gethostname and gethostbyaddr. IIUC, when called with no argument to use the local host, that'll always just return whatever gethostname() returns.
  • opentelemetry-collector-config has a system metadata provider that uses Showmax/go-fqdn that: first tries using /etc/hosts, then falls back to a DNS query. The DNS query code looks the same as the libbeat info above.

specs/agents/metadata.md Outdated Show resolved Hide resolved
specs/agents/metadata.md Outdated Show resolved Hide resolved
specs/agents/metadata.md Outdated Show resolved Hide resolved
@trentm
Copy link
Member

trentm commented Jun 19, 2023

Q2: Should we also be specifying that the FQDN should be lowercased, as the ECS host.name docs recommend? The libbeat impl is lowercasing.

Another argument for lowercasing, at least for Windows is this from MS docs (https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/hostname):

Environment variable %COMPUTERNAME% usually will print the same string as hostname, but in uppercase.

Aside: Yeah Microsoft for (a) the uppercasing and (b) especially for the "usually".

specs/agents/metadata.md Outdated Show resolved Hide resolved
@SylvainJuge
Copy link
Member Author

Q1: Does the "avoid using DNS" warning/condition apply now that we prefer getting the FQDN?

A1: I think capturing the FQDN will very probably involve a DNS query, while that might add some latency and introduce some flakyness I don't think we have better options. In the cases where it's not reliable the fallback using environment variables (the "standard ones" or our own) could still be used.

Q2: Should we also be specifying that the FQDN should be lowercased, as the ECS host.name docs recommend? The libbeat impl is lowercasing.

A2: we should lower-case, spec has been updated to cover this.

Q3: (Unrelated to FQDN) Do other implementations set both of metadata.hostname and metadata.detected_hostname or do you wait for APM Server version detection to know if the APM server is >=7.4?

A3: for Java at least we do check the server version >= before sending metadata.hostname and metadata.detected_hostname.

@SylvainJuge SylvainJuge self-assigned this Jun 20, 2023
specs/agents/metadata.md Outdated Show resolved Hide resolved
Comment on lines 46 to 49
if (hostname == null || hostname.length == 0)
hostname = env.get("HOSTNAME")
if (hostname == null || hostname.length == 0)
hostname = env.get("HOST")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a final nit, I am not implementing this fallback of using the HOSTNAME and HOST envvars in the Node.js APM agent.

I couldn't quickly find any references for the HOST envvar being set by anything authoritative. The HOSTNAME var is set by Bash, which doesn't strike me as an authority we should use. FWIW, this SO post points out that HOSTNAME is not an envvar defined by POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html).

This earlier spec comment was the only discussion around having the HOSTNAME and HOST fallbacks that I could see. I don't know if it originated from what Beats were doing. My understanding is that now beats are using an algorithm that does not use these vars (https://github.com/elastic/beats/blob/c7c18b37eb34f0665f6d9a3b2c7ea4543018ba6a/libbeat/cmd/instance/beat.go#L789-L803 https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/shared/fqdn.go#L29-L43)

Regarding current usage, it looks to me like only the Java and .NET agents are currently looking at the HOSTNAME and HOST envvars. (FWIW. the .NET agent looks at all of HOSTNAME, HOST, and COMPUTERNAME on all platforms). None of the other agents that I saw are using those envvars.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and feedback @trentm !

I agree with you that those variables might definitely not standard nor available to the runtime as they are usually set by shell and are unlikely to be set on all OSes and distributions. I am assuming (slight speculation on my end) the intent to have them was to provide a "fallback just in case", and that they might not be available most of the time.

However, I don't see any major downside to having them in the spec and using them as fallback, even if it's rarely used in practice. So for now I would suggest we keep them as-is and revisit this decision later if that proves to be an issue or a source of extra complexity.

@SylvainJuge SylvainJuge marked this pull request as ready for review June 22, 2023 07:58
@SylvainJuge SylvainJuge requested review from a team as code owners June 22, 2023 07:58
@SylvainJuge SylvainJuge requested a review from a team as a code owner June 22, 2023 07:58
@SylvainJuge SylvainJuge removed request for a team June 22, 2023 07:58
hostname = exec "uname -n" // or any equivalent *
if (hostname == null || hostname.length == 0)
hostname = exec "hostname" // or any equivalent *
hostname = exec "hostname -f" // or any equivalent *
Copy link
Contributor

@stevejgordon stevejgordon Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SylvainJuge In my testing while implementing for .NET, it seems that this command may return with a period suffix (e.g. DESKTOP-NCGCCT0.) or even the suffix .localdomain. Do we send as-is, or should we consider trimming in either case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevejgordon Where did you see an example like DESKTOP-NCGCCT0.? Is that an OS other than Windows?

or even the suffix .localdomain

I'm not sure if it is equivalent, but on macOS the domain is commonly ".local", e.g. on my laptop:

% hostname -f
pink.local

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm testing using WSL2 on Windows with Ubuntu. On my Windows laptop, where no domain is set, it reports the period suffix when I run hostname -f from Ubuntu (using WSL2). I also set up a Windows 11 VM to test adding the DNS suffix and again added WSL Ubuntu to test the code and picked up the FQDN, which it did. Once I removed the DNS suffix, .localdomain started appearing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For .NET, as the primary mechanism, I'm planning to use the Dns.GetHostEntry(string.Empty).HostName API, which is cross-platform aware and seems to potentially include the suffix, at least when testing on Ubuntu via WSL2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here that's clearly a case where using the API seems to be more consistent than the equivalent with hostname.
Does it also returns a consistent value when invoked outside of WSL ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@stevejgordon stevejgordon Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trentm The .NET API eventually seems to call into the native Windows Sockets 2, Winsock API gethostname function and on Linux the gethostname system call.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if I understand this correctly, you are suggesting to just remove the extra . that is appended when calling this API through WSL ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SylvainJuge Originally, yes, I thought perhaps we should. However, as it's consistent with hostname -f and therefore, presumably what other agents on the same host will send, perhaps best to leave it unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevejgordon do you think we need to add a clarification about this in the spec or could we leave it as-is ?

@SylvainJuge SylvainJuge merged commit 4fe90c7 into elastic:main Jul 5, 2023
1 check passed
@SylvainJuge SylvainJuge deleted the hostname-fqdn branch July 5, 2023 09:33
@trentm trentm mentioned this pull request Nov 21, 2023
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[META 793] Spec: Report FQDN for detected_hostname
6 participants