[Agent] Add agent standalone manifests for system module & Pod's log collection #23938

Merged on Feb 22, 2021 (12 commits)

Member

@ChrsMark ChrsMark commented Feb 9, 2021

What does this PR do?

This PR adds a k8s manifest for running Elastic Agent in standalone mode with:

  1. The system integration enabled by default. The manifest deploys Agent as DaemonSet Pods on all k8s nodes and stands as the equivalent of Metricbeat's system module on k8s.

  2. Pod log collection enabled using dynamic inputs in combination with the k8s provider.

[DONE:] It will most probably need to be combined with #23679 so as to deliver a single manifest to end users, but for now I'm keeping the two separate.

How to test this PR locally

  1. Create a local k8s cluster with kind, using the following config (kind-mutly.yaml):

# three node (two workers) cluster config
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker

kind create cluster --config kind-mutly.yaml

  2. Set a proper ES host inside the manifest and deploy Agent: kubectl apply -f elastic-agent-standalone-kubernetes.yml
  3. Verify that all data streams ship data:

  • system.cpu
  • system.process
  • system.core
  • system.memory
  • system.diskio
  • system.filesystem
  • system.fsstat
  • system.load
  • system.network
  • system.process_summary
  • system.socket_summary
  4. Make sure that the Metrics UI properly displays hosts' utilisation.
  5. Verify that Pods' logs are being collected under the generic dataset and are also enriched with k8s metadata.
  6. Verify that the k8s integration is not affected: check that data is being shipped for kubernetes.apiserver, kubernetes.state_pod, kubernetes.pod, kubernetes.proxy, kubernetes.scheduler, kubernetes.controllermanager.
  7. Verify that the Metrics UI works properly by navigating from node-scope metrics to Pods' logs and metrics.
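
The data stream checks in the list above could be partly scripted. What follows is a minimal Python sketch and not part of the PR: it assumes the dataset names are read from the `data_stream.dataset` field (which appears in the sample events) through a terms aggregation. The helper names `dataset_agg_query` and `missing_datasets` and the aggregation name `datasets` are made up for illustration.

```python
# Datasets listed in the test steps above.
EXPECTED_SYSTEM_DATASETS = [
    "system.cpu", "system.process", "system.core", "system.memory",
    "system.diskio", "system.filesystem", "system.fsstat", "system.load",
    "system.network", "system.process_summary", "system.socket_summary",
]


def dataset_agg_query(size=100):
    """Build a search body that counts events per data_stream.dataset."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "aggs": {
            "datasets": {  # hypothetical aggregation name
                "terms": {"field": "data_stream.dataset", "size": size}
            }
        },
    }


def missing_datasets(search_response, expected=EXPECTED_SYSTEM_DATASETS):
    """Return the expected datasets that have no non-empty bucket in the response."""
    buckets = search_response["aggregations"]["datasets"]["buckets"]
    seen = {b["key"] for b in buckets if b["doc_count"] > 0}
    return [name for name in expected if name not in seen]
```

The body returned by `dataset_agg_query` would be sent to the search endpoint of the metrics data streams with whatever client is at hand; an empty list from `missing_datasets` means every listed dataset shipped at least one event.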

Related issues

Logs

Sample event:

{
  "_index": ".ds-metrics-system.process-default-2021.02.09-000001",
  "_type": "_doc",
  "_id": "wFSEh3cBJ6OP2vQnMKQa",
  "_version": 1,
  "_score": null,
  "_source": {
    "process": {
      "args": [
        "elastic-agent",
        "run",
        "-c",
        "/etc/agent.yml",
        "-e",
        "-d",
        "*"
      ],
      "memory": {
        "pct": 0.0024
      },
      "pgid": 8,
      "name": "elastic-agent",
      "cpu": {
        "pct": 0.0006,
        "start_time": "2021-02-09T15:56:33.000Z"
      },
      "pid": 8,
      "working_directory": "/usr/share/elastic-agent",
      "state": "sleeping",
      "executable": "/usr/share/elastic-agent/data/elastic-agent-06c53e/elastic-agent",
      "command_line": "elastic-agent run -c /etc/agent.yml -e -d *",
      "ppid": 1
    },
    "agent": {
      "hostname": "kind-control-plane",
      "name": "kind-control-plane",
      "id": "b8a7e05b-cde3-40be-ba18-6a54b230d6b0",
      "ephemeral_id": "f1effdd1-94f6-472f-a4b9-a8ec7b746794",
      "type": "metricbeat",
      "version": "7.12.0"
    },
    "@timestamp": "2021-02-09T15:58:40.859Z",
    "system": {
      "process": {
        "cmdline": "elastic-agent run -c /etc/agent.yml -e -d *",
        "memory": {
          "rss": {
            "pct": 0.0024,
            "bytes": 43618304
          },
          "size": 1491582976,
          "share": 29769728
        },
        "cpu": {
          "start_time": "2021-02-09T15:56:33.000Z",
          "total": {
            "pct": 0.005,
            "value": 5860,
            "norm": {
              "pct": 0.0006
            }
          }
        },
        "state": "sleeping",
        "fd": {
          "limit": {
            "hard": 1048576,
            "soft": 1048576
          },
          "open": 16
        }
      }
    },
    "ecs": {
      "version": "1.7.0"
    },
    "service": {
      "type": "system"
    },
    "data_stream": {
      "namespace": "default",
      "type": "metrics",
      "dataset": "system.process"
    },
    "host": {
      "hostname": "kind-control-plane",
      "os": {
        "kernel": "4.9.184-linuxkit",
        "codename": "Core",
        "name": "CentOS Linux",
        "family": "redhat",
        "version": "7 (Core)",
        "platform": "centos"
      },
      "containerized": true,
      "ip": [
        "10.244.0.1",
        "10.244.0.1",
        "10.244.0.1",
        "172.18.0.4",
        "fc00:f853:ccd:e793::4",
        "fe80::42:acff:fe12:4"
      ],
      "name": "kind-control-plane",
      "id": "5139dfb41717ff9b7cdaf89657e1c0c7",
      "mac": [
        "12:f1:c7:a9:3f:a8",
        "36:fa:76:d1:f0:50",
        "2a:dd:7d:08:80:a2",
        "02:42:ac:12:00:04"
      ],
      "architecture": "x86_64"
    },
    "elastic_agent": {
      "id": "887f7040-4fc3-44ec-8391-a3d7b1af5f7b",
      "version": "7.12.0",
      "snapshot": true
    },
    "metricset": {
      "period": 10000,
      "name": "process"
    },
    "event": {
      "duration": 2292246,
      "module": "system",
      "dataset": "system.process"
    },
    "user": {
      "name": "root"
    }
  },
  "fields": {
    "process.cpu.start_time": [
      "2021-02-09T15:56:33.000Z"
    ],
    "@timestamp": [
      "2021-02-09T15:58:40.859Z"
    ],
    "system.process.cpu.start_time": [
      "2021-02-09T15:56:33.000Z"
    ]
  },
  "sort": [
    1612886320859
  ]
}
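
One thing the sample event shows is how the `data_stream` fields relate to the backing index name. The Python sketch below is only an illustration under that reading: it assumes the `<type>-<dataset>-<namespace>` naming scheme and the `.ds-<data stream>-<date>-<generation>` backing index pattern visible in the sample, and the function names are hypothetical.

```python
import re


def data_stream_name(event_source):
    """Compose <type>-<dataset>-<namespace> from an event's data_stream fields."""
    ds = event_source["data_stream"]
    return f"{ds['type']}-{ds['dataset']}-{ds['namespace']}"


def index_matches_data_stream(hit):
    """Check that a hit's backing index belongs to the data stream its fields name."""
    expected = data_stream_name(hit["_source"])
    # Backing index as seen in the sample: .ds-<data stream>-<yyyy.MM.dd>-<generation>
    pattern = rf"^\.ds-{re.escape(expected)}-\d{{4}}\.\d{{2}}\.\d{{2}}-\d{{6}}$"
    return re.match(pattern, hit["_index"]) is not None


# Trimmed copy of the sample event above.
hit = {
    "_index": ".ds-metrics-system.process-default-2021.02.09-000001",
    "_source": {
        "data_stream": {
            "namespace": "default",
            "type": "metrics",
            "dataset": "system.process",
        }
    },
}
print(data_stream_name(hit["_source"]))
print(index_matches_data_stream(hit))
```

A check like this can help spot events that were routed to an unexpected data stream while testing the manifest.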

cc: @blakerouse @david-kow @fearful-symmetry

@ChrsMark ChrsMark added Team:Integrations Label for the Integrations team Team:Elastic-Agent Label for the Agent team labels Feb 9, 2021
@ChrsMark ChrsMark self-assigned this Feb 9, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@elasticmachine
Collaborator

elasticmachine commented Feb 9, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #23938 updated

  • Start Time: 2021-02-22T10:26:06.955+0000

  • Duration: 75 min 50 sec

  • Commit: 292c360

❕ Flaky test report

No test was executed to be analysed.

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Feb 9, 2021
@botelastic

botelastic bot commented Feb 9, 2021

This pull request doesn't have a Team:<team> label.

@ChrsMark ChrsMark mentioned this pull request Feb 16, 2021
22 tasks
@ph
Contributor

ph commented Feb 16, 2021

@ruflin @michalpristas @blakerouse Do we need to add ids for inputs or stream?

@ChrsMark
Member Author

Adding the section to collect logs from Pods using the k8s provider works too. We can add it here or in a separate follow-up PR.

Sample event:

{
  "_index": ".ds-logs-generic-default-2021.02.17-000001",
  "_type": "_doc",
  "_id": "BayDsHcB281QBbW7tkDy",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2021-02-17T15:02:37.006Z",
    "elastic_agent": {
      "snapshot": true,
      "version": "7.12.0",
      "id": "0edb7f45-68dc-48c0-af31-72b512c1a075"
    },
    "ecs": {
      "version": "1.6.0"
    },
    "log": {
      "offset": 111721,
      "file": {
        "path": "/var/log/containers/elastic-agent-wnp9g_kube-system_elastic-agent-62f55fefad8c999c37f82a03894cd604cd75aa8df1f7ec0e0926386b467f8b55.log"
      }
    },
    "message": "{\"log\":\"2021-02-17T15:02:36.050Z\\u0009INFO\\u0009operation/operator.go:245\\u0009operation 'operation-start' skipped for filebeat.7.12.0-SNAPSHOT\\n\",\"stream\":\"stderr\",\"time\":\"2021-02-17T15:02:36.050439402Z\"}",
    "input": {
      "type": "log"
    },
    "kubernetes": {
      "namespace": "kube-system",
      "pod": {
        "name": "elastic-agent-wnp9g",
        "uid": "82ba7656-6bfd-47ab-aff4-b607f1919d68",
        "ip": "10.128.0.17",
        "labels": {
          "controller-revision-hash": "665cbc7d9c",
          "pod-template-generation": "1",
          "app": "elastic-agent"
        }
      },
      "container": {
        "image": "docker.elastic.co/beats/elastic-agent:7.12.0-SNAPSHOT",
        "name": "elastic-agent",
        "runtime": "docker",
        "id": "62f55fefad8c999c37f82a03894cd604cd75aa8df1f7ec0e0926386b467f8b55"
      }
    },
    "data_stream": {
      "type": "logs",
      "dataset": "generic",
      "namespace": "default"
    },
    "event": {
      "dataset": "generic"
    },
    "host": {
      "mac": [
        "42:01:0a:80:00:11",
        "02:42:37:90:a6:0b",
        "b6:29:89:3b:b4:a4",
        "36:df:52:c9:18:89",
        "5e:16:1b:92:74:6a"
      ],
      "hostname": "gke-chrismark-test-agent-default-pool-c44aff9a-cxrg",
      "architecture": "x86_64",
      "name": "gke-chrismark-test-agent-default-pool-c44aff9a-cxrg",
      "os": {
        "codename": "Core",
        "platform": "centos",
        "version": "7 (Core)",
        "family": "redhat",
        "name": "CentOS Linux",
        "kernel": "4.19.150+"
      },
      "id": "5fdb5cc51a59a640a350f8f0b0c762f5",
      "containerized": true,
      "ip": [
        "10.128.0.17",
        "fe80::4001:aff:fe80:11",
        "169.254.123.1",
        "10.40.0.1",
        "fe80::b429:89ff:fe3b:b4a4",
        "fe80::34df:52ff:fec9:1889",
        "fe80::5c16:1bff:fe92:746a"
      ]
    },
    "cloud": {
      "account": {
        "id": "elastic-observability"
      },
      "provider": "gcp",
      "instance": {
        "id": "879949892028547785",
        "name": "gke-chrismark-test-agent-default-pool-c44aff9a-cxrg"
      },
      "machine": {
        "type": "e2-medium"
      },
      "availability_zone": "us-central1-c",
      "project": {
        "id": "elastic-observability"
      }
    },
    "agent": {
      "name": "gke-chrismark-test-agent-default-pool-c44aff9a-cxrg",
      "type": "filebeat",
      "version": "7.12.0",
      "hostname": "gke-chrismark-test-agent-default-pool-c44aff9a-cxrg",
      "ephemeral_id": "33e6ffd1-7152-42e5-9dc2-55d52b50e7b3",
      "id": "61b439d7-e5f2-497f-a194-87969434799b"
    }
  },
  "fields": {
    "@timestamp": [
      "2021-02-17T15:02:37.006Z"
    ]
  },
  "sort": [
    1613574157006
  ]
}
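
In the sample event above the `message` field still holds the raw container log line, a JSON object with `log`, `stream` and `time` keys (the container runtime is reported as docker). As a minimal sketch of what that payload contains, and not a statement about how Agent should be configured to parse it, the Python below decodes the line; `decode_container_line` is a hypothetical helper name.

```python
import json

# The "message" value from the sample event above, after the outer JSON layer
# has been decoded once.
RAW_MESSAGE = r'''{"log":"2021-02-17T15:02:36.050Z\u0009INFO\u0009operation/operator.go:245\u0009operation 'operation-start' skipped for filebeat.7.12.0-SNAPSHOT\n","stream":"stderr","time":"2021-02-17T15:02:36.050439402Z"}'''


def decode_container_line(raw):
    """Decode one JSON-formatted container log line into its parts."""
    entry = json.loads(raw)
    return {
        "stream": entry["stream"],
        "time": entry["time"],
        # Drop the trailing newline stored with each line.
        "line": entry["log"].rstrip("\n"),
    }


decoded = decode_container_line(RAW_MESSAGE)
# The Agent's own log line is tab-separated: timestamp, level, caller, text.
timestamp, level, caller, text = decoded["line"].split("\t", 3)
print(decoded["stream"], level, caller)
```

Seeing the decoded parts makes it easier to judge whether the events under the generic dataset carry the expected content.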

Signed-off-by: chrismark <[email protected]>
@ChrsMark
Member Author

ChrsMark commented Feb 17, 2021

At this stage the manifest file supports the following:

  1. Runs the kubernetes integration (from the previous PR)
  2. Runs the system integration (the same metricsets that Metricbeat was running in the past with )
  3. Collects Pods' logs using the k8s provider and dynamic inputs

@blakerouse @jsoriano @david-kow feel free to review when you have the time

Signed-off-by: chrismark <[email protected]>
Member

@jsoriano jsoriano left a comment

Thanks for splitting it in multiple files!

Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark changed the title from "[Agent] Add agent standalone manifests for system module" to "[Agent] Add agent standalone manifests for system module & Pod's log collection" Feb 18, 2021
args: [
"-c", "/etc/agent.yml",
"-e", "-d", "composable.providers.kubernetes",
"-e", "-d", "*",
Contributor

Should we have the debug selector for everything on by default? That seems like it would probably produce more output than it should.

Maybe remove it with a comment on how to add it?

Member Author

👍🏼

@@ -87,6 +111,137 @@ data:
node: ${NODE_NAME}
scope: node
inputs:
- id: 4ae27079-6cd4-4ab7-a459-abbae74ffc44
Contributor

You should be able to remove the id everywhere. You really do not need those, since they are normally generated by Fleet.

Elastic Agent will work without id on the inputs and streams.
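
To illustrate the suggestion, here is a minimal Python sketch that drops `id` keys from the inputs and streams of an already parsed policy. The function name `strip_generated_ids`, the trimmed policy, the `system/metrics` type pairing and the stream id value are all made up for illustration; only the input id is copied from the diff above.

```python
import copy


def strip_generated_ids(policy):
    """Return a copy of a standalone policy without ids on inputs and streams."""
    cleaned = copy.deepcopy(policy)
    for agent_input in cleaned.get("inputs", []):
        agent_input.pop("id", None)
        for stream in agent_input.get("streams", []):
            stream.pop("id", None)
    return cleaned


# Hypothetical, heavily trimmed policy; only the shape matters here.
policy = {
    "inputs": [
        {
            "id": "4ae27079-6cd4-4ab7-a459-abbae74ffc44",
            "type": "system/metrics",  # assumed pairing, for illustration only
            "streams": [
                {"id": "example-stream-id", "data_stream": {"dataset": "system.cpu"}},
            ],
        }
    ]
}
cleaned = strip_generated_ids(policy)
print(cleaned["inputs"][0])
```

The original policy is left untouched, so the two versions can be compared before the manifest is updated.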

Member Author

👍🏼 thanks for clarifying

image: docker.elastic.co/beats/elastic-agent:%VERSION%
args: [
"-c", "/etc/agent.yml",
"-e", "-d", "composable.providers.kubernetes",
Contributor

This one has the debug selector specific to the kubernetes provider. That might be acceptable to always have on. It should be consistent across the files.

Member Author

👍🏼

Member

But do we want any debug logging enabled by default? What do you think about leaving it commented out?

Member

@jsoriano jsoriano left a comment

Looks good, but I wonder if we want any debug logging enabled by default.

Contributor

@david-kow david-kow left a comment

Nice work, we might reuse some of this for ECK examples :) Added some comments/questions.

@ChrsMark
Member Author

Looks good, but I wonder if we want any debug logging enabled by default.

Ok, makes sense, I will remove it for now.

Signed-off-by: chrismark <[email protected]>
@mukeshelastic

Phenomenal progress in getting standalone agent mode working to get logs and metrics for system and the K8s control plane. Thanks @ChrsMark for getting us here so quickly in the last few weeks, kudos!

Like you said in the description, we will need a single standalone manifest that collects metrics and logs for both system and k8s, but looking at this standalone manifest, it doesn't have the system integration yet. Is that something we are planning to get next week?

Once we have that single standalone manifest, I am assuming we can just fit this standalone manifest right in the "add agent in standalone mode" in Fleet UI with K8s integration config in the standalone manifest filled with kubernetes section from agent policy.

So users can just copy this standalone manifest from Fleet UI, add ES creds, run it on K8s and bingo, they have the system and K8s observability. Is that right?

I assume this will also work with the ECK agent CRD, where users can just insert the standalone manifest in the appropriate section of the agent config. @shubhaat fyi

@ChrsMark
Member Author

ChrsMark commented Feb 22, 2021

Like you said in the description, we will need a single standalone manifest that collects metrics and logs for both system and k8s, but looking at this standalone manifest, it doesn't have the system integration yet. Is that something we are planning to get next week?

Hey! This PR adds system integration and logs' collection from Pods. You can see the full manifest at https://github.com/elastic/beats/pull/23938/files.

Once we have that single standalone manifest, I am assuming we can just fit this standalone manifest right in the "add agent in standalone mode" in Fleet UI with K8s integration config in the standalone manifest filled with kubernetes section from agent policy.

So users can just copy this standalone manifest from Fleet UI, add ES creds, run it on K8s and bingo, they have the system and K8s observability. Is that right?

In the past we shared this kind of manifest (https://github.com/elastic/beats/blob/master/deploy/kubernetes/metricbeat-kubernetes.yaml) through GH and guided our users through https://www.elastic.co/guide/en/beats/metricbeat/current/running-on-kubernetes.html. I expect we will have something similar here; not sure if this should be added in Fleet too :).

Contributor

@blakerouse blakerouse left a comment

Awesome to see conditions just working! Nicely done.

@ChrsMark ChrsMark merged commit 7f92834 into elastic:master Feb 22, 2021
v1v added a commit to v1v/beats that referenced this pull request Feb 22, 2021
* upstream/master:
  [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155)
  [PACKAGING] Push docker images with the architecture in the version (elastic#24121)
  [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938)
  indicator type url is in upper case (elastic#24152)
  [Filebeat] Document netflow internal_networks and set default (elastic#24110)
  [Filebeat] Adding fixes to the TI module (elastic#24133)
  [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347)
  [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128)
  Set Elastic licence type for APM server Beats update job (elastic#24122)
  Add logrotation section on Running Filebeat on k8s (elastic#24120)
  [CI] Run if manual UI (elastic#24116)
  [CI] enable x-pack/heartbeat in the CI (elastic#23873)
v1v added a commit to v1v/beats that referenced this pull request Feb 23, 2021
…dows-7

* upstream/master:
  Remove OSS reference for kibana and elasticsearch (elastic#24164)
  Skip flaky TestActions on MacOSx (elastic#23966)
  [Filebeat][AWS] Fix vpcflow pipeline exception: Cannot invoke "Object.getClass()" because "receiver" is null (elastic#24167)
  [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155)
  [PACKAGING] Push docker images with the architecture in the version (elastic#24121)
  [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938)
  indicator type url is in upper case (elastic#24152)
  [Filebeat] Document netflow internal_networks and set default (elastic#24110)
  [Filebeat] Adding fixes to the TI module (elastic#24133)
  [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347)
  [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128)
  Set Elastic licence type for APM server Beats update job (elastic#24122)
  Add logrotation section on Running Filebeat on k8s (elastic#24120)
  [CI] Run if manual UI (elastic#24116)
  [CI] enable x-pack/heartbeat in the CI (elastic#23873)
  chore: comment out the E2E (elastic#24109)
  chore: add-backport-next (elastic#24098)
  Adjust the position of the architecture name in Dockerlogbeat tarball (elastic#24095)
  Update dependencies for M1 support in System (elastic#24019)
v1v added a commit to v1v/beats that referenced this pull request Feb 23, 2021
…-arm

* upstream/master: (24 commits)
  Add example input autodsicover config (elastic#24157)
  Empty configuration options generate `<no value>` string for azure-eventhub input (elastic#24156)
  Remove OSS reference for kibana and elasticsearch (elastic#24164)
  Skip flaky TestActions on MacOSx (elastic#23966)
  [Filebeat][AWS] Fix vpcflow pipeline exception: Cannot invoke "Object.getClass()" because "receiver" is null (elastic#24167)
  [Elastic Agent] Fix docker entrypoint for elastic-agent. (elastic#24155)
  [PACKAGING] Push docker images with the architecture in the version (elastic#24121)
  [Agent] Add agent standalone manifests for system module & Pod's log collection (elastic#23938)
  indicator type url is in upper case (elastic#24152)
  [Filebeat] Document netflow internal_networks and set default (elastic#24110)
  [Filebeat] Adding fixes to the TI module (elastic#24133)
  [Enhancement] Add RotateOnStartup feature flag for file output (elastic#19347)
  [Ingest Manager] Fix: Successfully installed and enrolled agent running standalone (elastic#24128)
  Set Elastic licence type for APM server Beats update job (elastic#24122)
  Add logrotation section on Running Filebeat on k8s (elastic#24120)
  [CI] Run if manual UI (elastic#24116)
  [CI] enable x-pack/heartbeat in the CI (elastic#23873)
  chore: comment out the E2E (elastic#24109)
  chore: add-backport-next (elastic#24098)
  Adjust the position of the architecture name in Dockerlogbeat tarball (elastic#24095)
  ...
@mukeshelastic

@ChrsMark I may be missing something here, but when I look at the standalone config in master I see only one logfile input, which collects the logs from /var/log/containers. What about the auth and syslog datasets? Here is what I see in the /var/log directory on a single-node K8s cluster. I think we should be providing auth.log and syslog too, similar to what our system integration's logfile input collects.

/var/log$ ls
alternatives.log apt auth.log btmp containers daemon.log debug dpkg.log faillog kern.log lastlog messages ntpstats pods syslog syslog.1 syslog.2.gz

@ChrsMark
Member Author

@mukeshelastic system logs will be collected too after #24185.
