Capture FQDN in APM agents #805

Merged: 6 commits merged on Jul 5, 2023
40 changes: 18 additions & 22 deletions specs/agents/metadata.md
@@ -1,6 +1,6 @@
## Metadata

As mentioned above, the first "event" in each ND-JSON stream contains metadata to fold into subsequent events. The metadata that agents should collect includes are described in the following sub-sections.
As mentioned above, the first "event" in each ND-JSON stream contains metadata to fold into subsequent events. The metadata that agents should collect is described in the following sub-sections.

- service metadata
- global labels (requires APM Server 7.2 or greater)
@@ -24,42 +24,38 @@ System metadata relates to the host/container in which the service being monitor

#### Hostname

This hostname reported by the agent is mapped by the APM Server to the
[`host.hostname` ECS field](https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-hostname), which should
typically contain what the `hostname` command returns on the host machine. However, since we rely on this field for
our integration with beats data, we should attempt to follow a similar logic to the `os.Hostname()` Go API, which beats
relies on. While `os.Hostname()` contains some complex OS-specific logic to cover all sorts of edge cases, our
algorithm should be simpler. It relies on the execution of external commands with a fallback to standard environment
variables. Agents SHOULD implement this hostname discovery algorithm wherever possible:
The hostname value(s) reported by the agent are mapped by APM Server to the ECS
[`host.hostname`](https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-hostname) and
[`host.name`](https://www.elastic.co/guide/en/ecs/current/ecs-host.html#field-host-name) fields.

Agents SHOULD return the lower-cased FQDN whenever possible, which might require a DNS query.

Agents SHOULD implement this hostname discovery algorithm wherever possible:
```
var hostname;
if os == windows
hostname = exec "cmd /c hostname" // or any equivalent *
// https://stackoverflow.com/questions/12268885/powershell-get-fqdn-hostname
// https://learn.microsoft.com/en-us/dotnet/api/system.net.dns.gethostentry
hostname = exec "powershell.exe [System.Net.Dns]::GetHostEntry($env:computerName).HostName" // or any equivalent *
trentm marked this conversation as resolved.
if (hostname == null || hostname.length == 0)
hostname = exec "cmd.exe /c hostname" // or any equivalent *
if (hostname == null || hostname.length == 0)
hostname = env.get("COMPUTERNAME")
else
hostname = exec "uname -n" // or any equivalent *
if (hostname == null || hostname.length == 0)
hostname = exec "hostname" // or any equivalent *
hostname = exec "hostname -f" // or any equivalent *

stevejgordon (Contributor), Jun 28, 2023:

@SylvainJuge In my testing while implementing for .NET, it seems that this command may return with a period suffix (e.g. DESKTOP-NCGCCT0.) or even the suffix .localdomain. Do we send as-is, or should we consider trimming in either case?

Member:

@stevejgordon Where did you see an example like DESKTOP-NCGCCT0.? Is that an OS other than Windows?

> or even the suffix .localdomain

I'm not sure if it is equivalent, but on macOS the domain is commonly ".local", e.g. on my laptop:

% hostname -f
pink.local

Contributor:

So I'm testing using WSL2 on Windows with Ubuntu. On my Windows laptop, where no domain is set, `hostname -f` run from Ubuntu (under WSL2) reports the period suffix. I also set up a Windows 11 VM with a DNS suffix, again added WSL Ubuntu to test the code, and it picked up the FQDN as expected. Once I removed the DNS suffix, `.localdomain` started appearing.

Contributor:

For .NET, as the primary mechanism, I'm planning to use the Dns.GetHostEntry(string.Empty).HostName API, which is cross-platform aware and seems to potentially include the suffix, at least when testing on Ubuntu via WSL2.

Member Author:

I think this is clearly a case where using the API is more consistent than the equivalent `hostname` command.
Does it also return a consistent value when invoked outside of WSL?

stevejgordon (Contributor), Jun 30, 2023:

@trentm The .NET API eventually seems to call into the native Windows Sockets 2 (Winsock) `gethostname` function on Windows, and the `gethostname` system call on Linux.

Member Author:

So, if I understand this correctly, you are suggesting we just remove the extra `.` that is appended when calling this API through WSL?

Contributor:

@SylvainJuge Originally, yes, I thought perhaps we should. However, as it's consistent with `hostname -f`, and therefore presumably with what other agents on the same host will send, it's perhaps best to leave it unchanged.

Member Author:

@stevejgordon do you think we need to add a clarification about this in the spec, or can we leave it as-is?

if (hostname == null || hostname.length == 0)
hostname = env.get("HOSTNAME")
if (hostname == null || hostname.length == 0)
hostname = env.get("HOST")
Comment on lines 46 to 49
Member:

As a final nit, I am not implementing this fallback of using the HOSTNAME and HOST envvars in the Node.js APM agent.

I couldn't quickly find any references for the HOST envvar being set by anything authoritative. The HOSTNAME var is set by Bash, which doesn't strike me as an authority we should use. FWIW, this SO post points out that HOSTNAME is not an envvar defined by POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html).

This earlier spec comment was the only discussion around having the HOSTNAME and HOST fallbacks that I could see. I don't know if it originated from what Beats were doing. My understanding is that Beats now use an algorithm that does not use these vars (https://github.com/elastic/beats/blob/c7c18b37eb34f0665f6d9a3b2c7ea4543018ba6a/libbeat/cmd/instance/beat.go#L789-L803, https://github.com/elastic/go-sysinfo/blob/9ea2eba5301c77b511c0475d7137ea3437a55a9a/providers/shared/fqdn.go#L29-L43).

Regarding current usage, it looks to me like only the Java and .NET agents are currently looking at the HOSTNAME and HOST envvars (FWIW, the .NET agent looks at all of HOSTNAME, HOST, and COMPUTERNAME on all platforms). None of the other agents that I saw are using those envvars.

Member Author:

Thanks for the review and feedback @trentm!

I agree with you that those variables are neither standard nor necessarily available to the runtime, as they are usually set by the shell and are unlikely to be set on all OSes and distributions. I am assuming (slight speculation on my end) that the intent was to provide a "fallback just in case", and that they might not be available most of the time.

However, I don't see any major downside to having them in the spec and using them as fallback, even if it's rarely used in practice. So for now I would suggest we keep them as-is and revisit this decision later if that proves to be an issue or a source of extra complexity.


if hostname != null
hostname = hostname.trim() // see details below **
hostname = hostname.toLowerCase().trim() // see details below **
```
`*` this algorithm uses external commands in order to be OS-specific and language-independent; however, these
may be replaced with language-specific APIs that provide the equivalent result. The main consideration when choosing
what to use is to avoid hostname discovery that relies on DNS lookup.
may be replaced with language-specific APIs that provide the equivalent result.

`**` in this case, `trim()` refers to the removal of all leading and trailing characters whose codepoint is less than
or equal to `U+0020` (space).

Agents MAY use alternative approaches, but those need to generally conform to the basic concept. Failing to discover the
proper hostname may cause failure in correlation between APM traces and data reported by other clients (e.g.
Metricbeat). For example, if the agent uses an API that produces the FQDN, this value is likely to mismatch hostname
reported by other clients.
or equal to `U+0020` (space), and `toLowerCase()` refers to the replacement of characters in `A-Z` with their `a-z` equivalents.
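The two normalization steps defined in the footnotes can be sketched in Go as follows; the function names are hypothetical. Go's `strings.TrimSpace` is not a drop-in replacement for this `trim()`: it also strips Unicode spaces such as U+00A0 and U+0085 but keeps control characters like U+0001, so the sketch uses `strings.TrimFunc` with an explicit `<= U+0020` predicate. Likewise, lower-casing is restricted to `A-Z` instead of using the Unicode-aware `strings.ToLower`.

```go
package main

import (
	"fmt"
	"strings"
)

// trimLowCodepoints removes leading and trailing characters whose code
// point is <= U+0020, matching the trim() footnote.
func trimLowCodepoints(s string) string {
	return strings.TrimFunc(s, func(r rune) bool { return r <= '\u0020' })
}

// asciiLower replaces A-Z with a-z and leaves every other character
// untouched, matching the toLowerCase() footnote.
func asciiLower(s string) string {
	return strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return r + ('a' - 'A')
		}
		return r
	}, s)
}

// normalizeHostname applies both steps in the order of the pseudocode:
// hostname.toLowerCase().trim().
func normalizeHostname(s string) string {
	return trimLowCodepoints(asciiLower(s))
}

func main() {
	fmt.Printf("%q\n", normalizeHostname("\t My-Host.Example.COM\r\n"))
}
```

Running it prints `"my-host.example.com"`: the tab, space, carriage return and newline are all at or below U+0020 and are trimmed.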

In addition to auto-discovery of the hostname, agents SHOULD also expose the `ELASTIC_APM_HOSTNAME` config option that
can be used as a manual fallback.
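The updated discovery chain, combined with the `ELASTIC_APM_HOSTNAME` option, can be sketched in Go. Several details are assumptions rather than spec text: the hypothetical `hostnameSources` type (introduced only so that commands and environment variables can be faked in tests); applying the `HOSTNAME`/`HOST` fallbacks to the non-Windows branch only, which is one reading of the pseudocode above; treating command output as missing when it is empty after trimming; passing `-NoProfile -Command` to `powershell.exe`; reading the option only from the environment; and letting a non-empty `ELASTIC_APM_HOSTNAME` take precedence over discovery, which is one interpretation of "manual fallback" (an agent could instead consult it only when discovery returns nothing). The configured value is returned as-is.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"strings"
)

// hostnameSources is a hypothetical abstraction over the OS, the command
// runner and the environment, so the chain can be exercised with fakes.
type hostnameSources struct {
	goos   string
	run    func(name string, args ...string) string // "" on failure
	getenv func(key string) string
}

// normalize applies the footnote rules: A-Z lower-casing, then trimming
// of leading/trailing code points <= U+0020.
func normalize(s string) string {
	s = strings.Map(func(r rune) rune {
		if r >= 'A' && r <= 'Z' {
			return r + ('a' - 'A')
		}
		return r
	}, s)
	return strings.TrimFunc(s, func(r rune) bool { return r <= ' ' })
}

// discoverHostname tries each candidate in order; the first one that is
// non-empty after normalization wins.
func discoverHostname(src hostnameSources) string {
	var candidates []func() string
	if src.goos == "windows" {
		candidates = []func() string{
			func() string {
				return src.run("powershell.exe", "-NoProfile", "-Command",
					"[System.Net.Dns]::GetHostEntry($env:computerName).HostName")
			},
			func() string { return src.run("cmd.exe", "/c", "hostname") },
			func() string { return src.getenv("COMPUTERNAME") },
		}
	} else {
		candidates = []func() string{
			func() string { return src.run("hostname", "-f") },
			func() string { return src.getenv("HOSTNAME") },
			func() string { return src.getenv("HOST") },
		}
	}
	for _, candidate := range candidates {
		if h := normalize(candidate()); h != "" {
			return h
		}
	}
	return ""
}

// resolveHostname lets a non-empty ELASTIC_APM_HOSTNAME override discovery.
func resolveHostname(src hostnameSources) string {
	if configured := strings.TrimSpace(src.getenv("ELASTIC_APM_HOSTNAME")); configured != "" {
		return configured
	}
	return discoverHostname(src)
}

func main() {
	src := hostnameSources{
		goos: runtime.GOOS,
		run: func(name string, args ...string) string {
			out, err := exec.Command(name, args...).Output()
			if err != nil {
				return ""
			}
			return string(out)
		},
		getenv: os.Getenv,
	}
	fmt.Println(resolveHostname(src))
}
```

Injecting the runner and the environment keeps the OS-specific branches testable on any platform, for example by faking a Windows host whose commands fail so that `COMPUTERNAME` is used.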
SylvainJuge marked this conversation as resolved.
@@ -97,7 +93,7 @@ On Linux, the container ID and some of the Kubernetes metadata can be extracted
If there is a match to either expression, the capturing group contains the pod ID. We then unescape underscores
(`_`) to hyphens (`-`) in the pod UID.
If we match a pod UID then we record the hostname as the pod name since, by default, Kubernetes will set the
hostname to the pod name. Finally, we record the basename as the container ID without any further checks.
_short_ hostname (not FQDN) to the pod name. Finally, we record the basename as the container ID without any further checks.

4. If we did not match a Kubernetes pod UID above, then we check if the basename matches one of the following regular
expressions: