Autodiscover provider for Nomad #14954

jorgelbg · 2019-12-05T14:07:13Z

At trivago we run an internal cloud using Nomad from HashiCorp. Our logging solution is based on ELK: we use Filebeat to ship the logs from our client nodes into Kafka, where they are later ingested into Elasticsearch using Logstash. Previously we used the and input looking for new jobs in a defined path, but the logs lacked a lot of context/metadata from the job definition/allocation.

This PR adds a new autodiscover provider (its architecture is based on the Kubernetes provider). With this new provider, it is possible to start new harvesters by looking at the jobs allocated on each node. We currently run Filebeat as a system job on each node, and each Filebeat instance is responsible for enriching and shipping the local logs.

Example of the configuration for the new provider:

filebeat.autodiscover:
  providers:
    - type: nomad
      host: {{ env "node.unique.name" }}
      hints.enabled: true
      hints.default_config:
        type: log
        paths:
          - /appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stderr.[0-9]*
          - /appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stdout.[0-9]*

With the autodiscover provider, custom processors can be defined through the meta stanza of the Nomad job (similar to how they are defined with labels on Kubernetes). For instance:

task "nginx-web" {
    driver = "docker"

    meta {
        task-key = "custom-meta"
        "co.elastic.logs/processors.dissect.tokenizer" = "%{ip} - %{user} [%{local_time}] \"%{request}\" %{status} %{bytes_sent} \"%{referer}\" \"%{user_agent}\""
    }
}
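The hint keys in the meta stanza follow the same `co.elastic.logs/` naming as the Kubernetes labels. A minimal sketch of how such flat, dotted keys could be expanded into a nested hints map, assuming the prefix is stripped and the remainder is split on dots the way Kubernetes label hints are read; `hintsFromMeta` is a hypothetical helper, not the provider's actual code:

```go
package main

import (
	"fmt"
	"strings"
)

// hintPrefix is the meta key prefix used in the job example above.
const hintPrefix = "co.elastic.logs/"

// hintsFromMeta (hypothetical) keeps only meta entries that start with
// hintPrefix and expands the dotted remainder into nested maps, e.g.
// "processors.dissect.tokenizer" -> hints["processors"]["dissect"]["tokenizer"].
func hintsFromMeta(meta map[string]string) map[string]interface{} {
	hints := map[string]interface{}{}
	for key, value := range meta {
		if !strings.HasPrefix(key, hintPrefix) {
			continue // ordinary meta such as task-key is not a hint
		}
		parts := strings.Split(strings.TrimPrefix(key, hintPrefix), ".")
		node := hints
		// Walk (or create) one nested map per path segment except the last.
		for _, p := range parts[:len(parts)-1] {
			child, ok := node[p].(map[string]interface{})
			if !ok {
				child = map[string]interface{}{}
				node[p] = child
			}
			node = child
		}
		node[parts[len(parts)-1]] = value
	}
	return hints
}

func main() {
	meta := map[string]string{
		"task-key": "custom-meta",
		"co.elastic.logs/processors.dissect.tokenizer": "%{ip} - %{user}",
	}
	fmt.Println(hintsFromMeta(meta))
}
```

In this sketch non-hint entries such as `task-key` are skipped by the hints step; per the description they still reach the event as task metadata.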

The `co.elastic.logs/processors.dissect.tokenizer` entry in the `nginx-web` task defines a custom dissect tokenizer for the logs of that task only, which adds a `dissect` field with content similar to:

"dissect": {
    "bytes_sent": "7231",
    "referer": "http://nginx-web.prod.trivago.com/",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36",
    "ip": "10.2.10.138",
    "user": "-",
    "local_time": "15/Nov/2019:09:04:04 +0000",
    "request": "GET / HTTP/1.1",
    "status": "200"
}
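To see how the tokenizer produces the fields above, here is a deliberately simplified dissect sketch. It only handles plain `%{key}` fields and cuts the line at the first occurrence of each literal delimiter; Filebeat's real dissect processor supports more (key modifiers, padding), so `dissect` here is illustrative only. The `\"` escapes in the job file are HCL string escapes, so the tokenizer Filebeat receives contains plain double quotes. The sample line in `main` is hypothetical (shortened user agent).

```go
package main

import (
	"fmt"
	"strings"
)

// dissect is a simplified, hypothetical stand-in for Filebeat's dissect
// processor: only plain %{key} fields (no %{+key} or %{?key} modifiers);
// each field ends at the first occurrence of the literal text that follows it.
func dissect(tokenizer, line string) (map[string]string, error) {
	start := strings.Index(tokenizer, "%{")
	if start < 0 {
		return nil, fmt.Errorf("tokenizer has no fields")
	}
	// Any literal text before the first field must prefix the line.
	if !strings.HasPrefix(line, tokenizer[:start]) {
		return nil, fmt.Errorf("line does not start with %q", tokenizer[:start])
	}
	pattern, rest := tokenizer[start:], line[start:]
	out := map[string]string{}
	for pattern != "" {
		end := strings.Index(pattern, "}")
		if !strings.HasPrefix(pattern, "%{") || end < 0 {
			return nil, fmt.Errorf("malformed tokenizer near %q", pattern)
		}
		key := pattern[2:end]
		pattern = pattern[end+1:]
		// The delimiter is the literal text up to the next field (or the end).
		next := strings.Index(pattern, "%{")
		if next < 0 {
			next = len(pattern)
		}
		delim := pattern[:next]
		pattern = pattern[next:]
		if delim == "" {
			if pattern != "" {
				return nil, fmt.Errorf("adjacent fields are not supported")
			}
			out[key] = rest // the last field takes the remainder of the line
			break
		}
		idx := strings.Index(rest, delim)
		if idx < 0 {
			return nil, fmt.Errorf("delimiter %q for %q not found", delim, key)
		}
		out[key] = rest[:idx]
		rest = rest[idx+len(delim):]
	}
	return out, nil
}

func main() {
	tokenizer := `%{ip} - %{user} [%{local_time}] "%{request}" %{status} %{bytes_sent} "%{referer}" "%{user_agent}"`
	line := `10.2.10.138 - - [15/Nov/2019:09:04:04 +0000] "GET / HTTP/1.1" 200 7231 "http://nginx-web.prod.trivago.com/" "curl/7.64.1"`
	fields, err := dissect(tokenizer, line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fields["request"], fields["status"], fields["bytes_sent"])
}
```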

By default, the following fields are added from the Nomad job/allocation:

- job
- namespace
- status
- type (job type: system/service/batch)
- task.* (information about the task and custom metadata defined in the job/group/task using the meta stanza)
- datacenters
- region
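A sketch of how the default fields listed above could be assembled for one allocation. The `allocation` struct is a hypothetical flattened view (not the Nomad API type), and the merge order assumes Nomad's rule that task meta overrides group meta, which overrides job meta. Later commits in this PR switched `datacenters` to the singular and reorganized the allocation and job fields, so the layout in the merged version differs from this list.

```go
package main

import "fmt"

// allocation is a hypothetical, trimmed-down view of what the provider reads
// from a Nomad allocation and its job; the field names are illustrative only.
type allocation struct {
	JobName     string
	Namespace   string
	Status      string
	JobType     string // "system", "service" or "batch"
	Datacenters []string
	Region      string
	TaskName    string
	JobMeta     map[string]string
	GroupMeta   map[string]string
	TaskMeta    map[string]string
}

// mergeMeta merges meta stanzas so later (more specific) levels win:
// job < group < task.
func mergeMeta(levels ...map[string]string) map[string]string {
	merged := map[string]string{}
	for _, level := range levels {
		for k, v := range level {
			merged[k] = v
		}
	}
	return merged
}

// metadataFor builds the default fields listed above as a nested map.
func metadataFor(a allocation) map[string]interface{} {
	task := map[string]interface{}{}
	for k, v := range mergeMeta(a.JobMeta, a.GroupMeta, a.TaskMeta) {
		task[k] = v
	}
	task["name"] = a.TaskName // set last so custom meta cannot override it
	return map[string]interface{}{
		"job":         a.JobName,
		"namespace":   a.Namespace,
		"status":      a.Status,
		"type":        a.JobType,
		"task":        task,
		"datacenters": a.Datacenters,
		"region":      a.Region,
	}
}

func main() {
	md := metadataFor(allocation{
		JobName: "nginx", Namespace: "default", Status: "running", JobType: "service",
		Datacenters: []string{"dc1"}, Region: "global", TaskName: "nginx-web",
		TaskMeta: map[string]string{"task-key": "custom-meta"},
	})
	fmt.Println(md["job"], md["type"], md["task"])
}
```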

The PR also includes an add_nomad_metadata processor that matches events to specific allocations and adds the metadata.
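The processor has to tie each log event to an allocation, and one of the commits in this PR adds an allocation ID extractor based on the log path. A minimal sketch of that idea, assuming the ID is the UUID directory in front of `/alloc/logs/` (as in the paths used in this PR) and that metadata for the local allocations is already cached in a map. `allocIDFromPath`, `addNomadMetadata` and the `nomad` event key are hypothetical names chosen for this sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// allocIDPattern captures a UUID-shaped directory name directly followed by
// "/alloc/logs/", the layout used by the log paths in the configs above.
var allocIDPattern = regexp.MustCompile(
	`/([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})/alloc/logs/`)

// allocIDFromPath returns the allocation ID embedded in a log file path.
func allocIDFromPath(path string) (string, bool) {
	m := allocIDPattern.FindStringSubmatch(path)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// addNomadMetadata attaches the cached metadata of the matching allocation,
// if any, under a "nomad" key (the key name is an assumption of this sketch).
func addNomadMetadata(event map[string]interface{}, path string,
	byAlloc map[string]map[string]interface{}) bool {
	id, ok := allocIDFromPath(path)
	if !ok {
		return false
	}
	meta, found := byAlloc[id]
	if !found {
		return false
	}
	event["nomad"] = meta
	return true
}

func main() {
	id := "5b3b1c2e-8f3a-4c1d-9e2f-0a1b2c3d4e5f"
	cache := map[string]map[string]interface{}{id: {"job": "nginx"}}
	event := map[string]interface{}{"message": "GET / HTTP/1.1"}
	path := "/appdata/nomad/alloc/" + id + "/alloc/logs/nginx-web.stdout.0"
	fmt.Println(addNomadMetadata(event, path, cache), event["nomad"])
}
```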

We've been running this in our production clusters for a few weeks now.

TODO:

- Metricbeat support for extracting stats from the Nomad allocation
- Documentation (it is a WIP and I will add a new commit with the documentation)
- CHANGELOG
- Fields reference

How to test locally

Start a local development agent (nomad agent -dev).

Start filebeat with a configuration like this one:

filebeat.autodiscover:
  providers:
  - type: nomad
    templates:
      - config:
          - type: log
            paths:
              - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stderr.[0-9]*
              - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stdout.[0-9]*
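For a quick local check of which files a template will pick up, this sketch substitutes a discovered allocation ID into the two path templates above and matches candidate files with Go's `path.Match`. It handles only the single `${data.nomad.allocation.id}` variable, whereas Beats' template engine resolves any `${data.*}` field, and `path.Match` only approximates the glob handling of the log input. `expandPaths` and `matches` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// expandPaths substitutes the allocation ID into each path template.
func expandPaths(templates []string, allocID string) []string {
	out := make([]string, 0, len(templates))
	for _, t := range templates {
		out = append(out, strings.ReplaceAll(t, "${data.nomad.allocation.id}", allocID))
	}
	return out
}

// matches reports whether file is matched by any of the expanded globs.
func matches(globs []string, file string) bool {
	for _, g := range globs {
		if ok, err := path.Match(g, file); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	globs := expandPaths([]string{
		"/tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stderr.[0-9]*",
		"/tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stdout.[0-9]*",
	}, "1234")
	fmt.Println(globs[0])
	fmt.Println(matches(globs, "/tmp/NomadClient987/1234/alloc/logs/consul-dev.stderr.0"))
}
```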

Run some service in nomad, for example:

job "consul" {
  datacenters = ["dc1"]

  group "server" {
    task "consul-dev" {
      driver = "raw_exec"

      config {
        command = "consul"
        args = [
          "agent", "-dev",
        ]
      }
      artifact {
        source = "https://releases.hashicorp.com/consul/1.9.0/consul_1.9.0_linux_amd64.zip"
      }
    }
  }
}

Check that logs are collected for this service and that they include the Nomad metadata.

Repeat with a configuration like this one for hints-based autodiscover:

filebeat.autodiscover:
  providers:
  - type: nomad
    hints.enabled: true
    hints.default_config:
      type: log
      paths:
        - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stderr.[0-9]*
        - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stdout.[0-9]*

jsoriano · 2021-01-05T13:53:43Z

jenkins run the tests please

jsoriano

Thanks @jorgelbg!

jorgelbg · 2021-01-05T13:58:23Z

@jsoriano I cherry-picked the changes from your branch, but I'm ok if we merge from any side. Thanks for taking care of fixing those issues (especially the Windows test). Changes look great! I was planning on jumping into this after the holidays but glad that you were faster!

jsoriano · 2021-01-06T13:25:01Z

@jorgelbg could you please update the branch with master? Failing tests were fixed yesterday.

jsoriano · 2021-01-06T14:17:55Z

jenkins run the tests please

jsoriano · 2021-01-07T11:26:52Z

Merged, thanks a lot @jorgelbg!

Initial features to support logs collection from applications deployed in Nomad.

Add a new `nomad` autodiscover provider (based on the Kubernetes provider). With this new provider, it is possible to start new harvesters by looking at the jobs allocated on each node. With this, filebeat can be run as a system job on each node and each filebeat instance is responsible for enriching and shipping the local logs. This autodiscover provider supports hints-based autodiscover.

Add a new `add_nomad_metadata` processor that matches events to specific allocations and adds the metadata.

Co-authored-by: Jaime Soriano Pastor
(cherry picked from commit 24397d8)

sorantis · 2021-01-07T12:06:50Z

Thank you all for working on this!

Initial features to support logs collection from applications deployed in Nomad.

Add a new `nomad` autodiscover provider (based on the Kubernetes provider). With this new provider, it is possible to start new harvesters by looking at the jobs allocated on each node. With this, filebeat can be run as a system job on each node and each filebeat instance is responsible for enriching and shipping the local logs. This autodiscover provider supports hints-based autodiscover.

Add a new `add_nomad_metadata` processor that matches events to specific allocations and adds the metadata.

(cherry picked from commit 24397d8)
Co-authored-by: Jaime Soriano Pastor
Co-authored-by: Jorge Luis Betancourt

jorgelbg and others added 30 commits August 1, 2019 17:09

Metadata extractor from Nomad

86996ff

Add hint generation from the nomad event's meta

2d4d434

Add a test for the basic hint features

Autodetect the hostname to filter Nomad events

164bf0e

Use GCP region as datacenter

a03f588

Remove duplicated task and job keys from the event. Use the proper fl…

023765c

…ag (event type)

Add host from the allocation to the event.

fa7df7e

Add tests for the emition of events

Add host from the allocation to the event.

83884c7

Add tests for the emition of events

Merge branch 'autodiscovery-nomad' of github.com:jorgelbg/beats into …

6a5e46c

…autodiscovery-nomad

Fire the proper events depending on the status & update time of the a…

281a384

…llocations Add local constants to track the possible status of the allocations

Rename Flag on the test to a closer name to match the nomad nomenclature

60c65ef

Add support for older nomad versions

145a92f

On older nomad versions (0.8.4) the `NodeName` attribute of the allocation is empty. This means that sometimes we cannot assign a proper `host` to the event. As a workaround we use the `NodeID` to get the name from the actual client node.

Use the internal watcher logger

37b509e

WIP pass the client to the metadata generator

372db05

This is a workaround for Nomad v0.8 that doesn't provide the NodeName directly in the allocation object. We use the NodeID to fetch it from the API.

Add cache for the NodeName data

afccc97

Patch the alloc.NodeName before emitting allocation events

a30057f

Remove debug statement

Add filebeat allocation ID extractor from the log path

97ea28f

Fix usage of the of the WaitIndex on the allocation watcher

05f7e3f

Add add_nomad_metadata processor to libbeat

af3dfdc

Add the nomad API package to the vendors

bf59c3a

Glue the add_nomad_metadata processor to the rest of filebeat

9ec5f00

- WIP emit only one task metadata event. - Rename the matchers/indexers of the add_nomad_metadata processor to match the Nomad lingo.

Log only the object IDs in the INFO level

0b6e9d4

- Rename `meta.meta` to `meta.tasks` and fix the tests. - Add the main import to the nomad provider in the cmd tool.

Update NOTICE.txt

f1b6283

Fix issue with detecting updated allocations.

caae329

WIP patch for the unchanged allocations and avoids triggering new harvesters for those allocations that were previously discovered.

Split ResourceMetadata and GroupMetadata into two different functions

c8b7da7

- Rename `uuid` field to `alloc_id`. - Use WatchOptions.RefreshInterval (SyncInterval on the config) for the sync interval of the watcher

Rename some kubernetes ocurrences from the nomad autodiscover beat

a647a0a

Add AllowStale option to the Nomad configuration

f398d51

Remove config.Prefix fields from the final event

148ba9f

Add status field to the allocation metadata

4cca342

Allow stale responses from the Nomad client by default

f0c63e0

Fix the value of the status field

1d7cc78
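The `NodeName` workaround from the commits above (look the name up by `NodeID` when the allocation leaves it empty, and cache the result) can be sketched as follows. The `lookup` function stands in for a call to the Nomad nodes API; `nodeNameResolver` and its methods are hypothetical and not the provider's code.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeNameResolver caches node names by node ID so that allocations from
// older Nomad versions (empty NodeName) do not trigger an API call each time.
type nodeNameResolver struct {
	mu     sync.Mutex
	cache  map[string]string
	lookup func(nodeID string) (string, error) // e.g. a Nomad nodes API call
}

func newNodeNameResolver(lookup func(string) (string, error)) *nodeNameResolver {
	return &nodeNameResolver{cache: map[string]string{}, lookup: lookup}
}

// Resolve returns nodeName when it is set, otherwise the cached or looked-up
// name. The lock is held during the lookup, which serializes cache misses;
// that is acceptable for a sketch but could be refined.
func (r *nodeNameResolver) Resolve(nodeID, nodeName string) (string, error) {
	if nodeName != "" {
		return nodeName, nil
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if name, ok := r.cache[nodeID]; ok {
		return name, nil
	}
	name, err := r.lookup(nodeID)
	if err != nil {
		return "", err // failures are not cached, so the next call retries
	}
	r.cache[nodeID] = name
	return name, nil
}

func main() {
	r := newNodeNameResolver(func(id string) (string, error) {
		return "client-" + id, nil // fake lookup for the example
	})
	name, err := r.Resolve("node-1", "")
	fmt.Println(name, err)
}
```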

jsoriano and others added 8 commits January 5, 2021 13:03

Add changelog entry

4a141e3

Use singular in datacenters field name

2115077

Reorganize allocation and job fields

6077aed

Add fields definition

bf2c03d

Fix test in Windows

a1dc3f1

Merge remote-tracking branch 'origin/master' into autodiscover-nomad

e182038

Merge remote-tracking branch 'upstream/master' into autodiscovery-nomad

221622b

Remove duplicated entry in changelog

a18e6bf
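Several commits above deal with firing the right events depending on allocation status and update time, and with not starting new harvesters for allocations that were already discovered. A hypothetical sketch of that bookkeeping, assuming each poll returns the node's allocations with their client status and `ModifyIndex`; `diff`, `alloc` and `event` are illustrative names, and this is not the provider's actual watcher logic:

```go
package main

import (
	"fmt"
	"sort"
)

type alloc struct {
	ID          string
	Status      string // Nomad client status, e.g. "pending", "running", "complete"
	ModifyIndex uint64
}

type event struct {
	Kind    string // "start", "update" or "stop"
	AllocID string
}

// terminal reports whether a client status means the allocation has finished.
func terminal(status string) bool {
	return status == "complete" || status == "failed" || status == "lost"
}

// diff compares the allocations tracked after the previous poll (ID ->
// ModifyIndex) with the current list and returns the events to emit plus the
// new tracking state. Unchanged allocations produce no event, so no new
// harvester would be started for them.
func diff(prev map[string]uint64, current []alloc) ([]event, map[string]uint64) {
	var events []event
	next := map[string]uint64{}
	present := map[string]bool{}
	for _, a := range current {
		present[a.ID] = true
		old, known := prev[a.ID]
		switch {
		case terminal(a.Status):
			if known {
				events = append(events, event{"stop", a.ID})
			}
			continue // finished allocations are no longer tracked
		case !known:
			events = append(events, event{"start", a.ID})
		case a.ModifyIndex != old:
			events = append(events, event{"update", a.ID})
		}
		next[a.ID] = a.ModifyIndex
	}
	// Allocations that disappeared entirely (e.g. garbage collected) also stop.
	var gone []string
	for id := range prev {
		if !present[id] {
			gone = append(gone, id)
		}
	}
	sort.Strings(gone) // deterministic order despite map iteration
	for _, id := range gone {
		events = append(events, event{"stop", id})
	}
	return events, next
}

func main() {
	events, state := diff(map[string]uint64{"a": 1}, []alloc{{"a", "running", 2}, {"b", "pending", 1}})
	fmt.Println(events, state)
}
```

Tracking only the `ModifyIndex` per allocation keeps the watcher state small while still detecting updates.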

jsoriano added the v7.12.0 and needs_backport labels Jan 5, 2021

jsoriano approved these changes Jan 5, 2021


Merge remote-tracking branch 'upstream/master' into autodiscovery-nomad

ad49e39

jsoriano merged commit 24397d8 into elastic:master Jan 7, 2021

zube bot added [zube]: Done and removed [zube]: In Review labels Jan 7, 2021

jsoriano mentioned this pull request Jan 7, 2021

Cherry-pick #14954 to 7.x: Autodiscover provider for Nomad #23392

Merged


jsoriano added the test-plan label and removed the needs_backport label Jan 7, 2021

andresrc added the test-plan-added label Feb 15, 2021

zube bot removed the [zube]: Done label Apr 7, 2021

This was referenced Jun 28, 2021

Nomad autodiscover event missing task fields #26538

Closed

Logging code cleanup related to Nomad auto-discovery #26498

Merged

jorgelbg commented Dec 5, 2019 • edited by jsoriano