
[Fleet] Agents are intermittently showing as off-line in Kibana Fleet #21025

Closed
EricDavisX opened this issue Sep 8, 2020 · 9 comments · Fixed by #21037
Labels
Ingest Management:beta2 (Group issues for ingest management beta2), regression

Comments

@EricDavisX
Contributor

[Fleet] Agents are being reported as going offline in Kibana Fleet. I'm also seeing many entries in the Agent Activity log with the same timestamp. Is that somehow expected? It seems strange. Screenshots are below.

  • Found on the latest deployed 8.0 snapshot. Maybe it's fixed already; I know snapshots had been broken for a few days. I'm filing a quick ticket with repro steps so we don't spend time on it if it's already OK.

Tested on:
https://kibana.endpoint.elastic.dev/app/ingestManager#/fleet/agents/946f0178-31e6-4e2d-be4b-556320bb55e0

  • Deployed nightly with a new/fresh install of the latest master.

As of now, it's running code from Sept 3 (today is Sept 8).
edavis-mbp:kibana_elastic edavis$ git show -s 60986d4f8202016c98409c2926ccf29d9d2ee7e0
commit 60986d4f8202016c98409c2926ccf29d9d2ee7e0
Author: Yuliia Naumenko [email protected]
Date: Thu Sep 3 13:07:23 2020 -0700

Maybe related to the bug cited by the e2e tests (logged against standalone mode, but maybe it's bigger than we knew): #20992

The 'type ahead' to get more/better info from the logs through Ingest is not working currently (logged separately). I can dig in and pull more logs from the agent hosts later if help is needed, but there's no need for this to sit idle waiting on me, so I'm filing it now.

This seems to impact both Agents that have Endpoint and those that don't. But all Endpoint integrations appear up and alive in the Security app, so the Agent must be basically OK!?

Screenshots (timestamps are repeated):
  • linux
  • win

Impacting both Endpoint and non-Endpoint enabled hosts:
  • non-endpoint-policy-agents
  • Endpoint-policy-agents

@elasticmachine
Collaborator

Pinging @elastic/ingest-management (Team:Ingest Management)

@ph
Contributor

ph commented Sep 8, 2020

@nchaulet Could it be linked to the performance changes we did?

@nchaulet
Member

nchaulet commented Sep 8, 2020

It could be linked, yes. It looks like the same events are sent again and again; maybe the change I made to add a 5-minute timeout is not working here.

I am doing some tests to check what is happening here.

@nchaulet
Member

nchaulet commented Sep 8, 2020

Just did a test against https://kibana.endpoint.elastic.dev/ and my agent is correctly reported as online.

(screenshot: Screen Shot 2020-09-08 at 4.03.07 PM)

@EricDavisX Do you know how those agents are run? Somewhere on a server, running as a service? And do we have logs from these agents?

@EricDavisX
Contributor Author

EricDavisX commented Sep 9, 2020

I do know! Our wiki page at /display/DEV/Endpoint+and+Ingest+Nightly+Dev+Demo+Server has details.
For now you can look at the siem-team repo:
https://github.com/elastic/siem-team/blob/master/cm/ansible/roles/deploy-agent/tasks/linux-main.yml

It will show something like this in Ansible:

- name: Create install directory
  file:
    path: "{{ install_dir_linux }}"
    mode: "0755"
    state: directory

- name: Set download url
  set_fact:
    agent_url: "{{ snapshots.json | json_query('packages.\"' + agent_handle_linux + '.tar.gz\".url') }}"

- name: Download and Extract Agent zip
  unarchive:
    remote_src: yes
    src: "{{ agent_url }}"
    dest: "{{ install_dir_linux }}"

- name: Enroll the agent
  become: yes
  shell:  "{{ install_dir_linux }}/{{ agent_handle_linux }}/elastic-agent enroll -f https://{{ kibana_username }}:{{ kibana_password }}@kibana.{{ domain_name }}:443 {{ enroll_token }}"

- name: Create the service file
  template:
    dest: /etc/systemd/system/fleet-agent.service
    src: fleet-agent.service.j2
    mode: '0644'
  register: service_file

- name: reload systemd configs to pickup changes
  systemd:
    daemon_reload: yes
  when: service_file.changed

- name: restart fleet-agent service
  systemd:
    name: fleet-agent.service
    state: restarted
    enabled: yes

It has worked before, and still seems to work, to start the Agent (I'm just not sure how long the agents will stay up?).
There are other Ansible files that get the token and do more supporting things. The fleet-agent.service.j2 template referenced above isn't included here; a rough sketch is below.
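
For reference, a minimal sketch of what a fleet-agent.service.j2 template like the one referenced above could contain. This is an assumption for illustration only, not the actual template from the siem-team repo; the elastic-agent run invocation and the paths are guesses based on the install tasks above.

# Hypothetical fleet-agent.service.j2 -- illustrative sketch, not the real template
[Unit]
Description=Elastic Agent enrolled in Fleet
After=network-online.target
Wants=network-online.target

[Service]
# Assumes the Agent was extracted to {{ install_dir_linux }}/{{ agent_handle_linux }} as in the tasks above
ExecStart={{ install_dir_linux }}/{{ agent_handle_linux }}/elastic-agent run
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target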

@EricDavisX
Contributor Author

> Just did a test against https://kibana.endpoint.elastic.dev/ and my agent is correctly reported as online.
>
> (screenshot: Screen Shot 2020-09-08 at 4.03.07 PM)
>
> @EricDavisX Do you know how those agents are run? Somewhere on a server, running as a service?

Also @nchaulet, if you used a 7.8 Agent, that's totally cheating. :) Can we confirm again and keep researching with a full 8.0 env? It's helpful to know, though, that the older Agent works; it suggests the problem may indeed be on the Agent side.

@nchaulet
Member

nchaulet commented Sep 9, 2020

Got a repro locally with a timeout. My bad, I did not check with @michalpristas or @blakerouse what the timeout is for the checkin request. @michalpristas, how complicated is it to modify the timeout for the checkin request? (It's set to 5 minutes on the Kibana side.)

20-09-08T21:41:12.003-0400	ERROR	application/fleet_gateway.go:176	Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "http://localhost:5601/api/ingest_manager/fleet/agents/7b3785e6-7b3d-4d24-836d-bcf3de6ff8aa/checkin?": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Until we have a proper fix, this can be worked around by adding this to the kibana.yml config:

xpack.ingestManager.fleet.pollingRequestTimeout: 60000
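
To illustrate the mismatch, here is a minimal Go sketch (not the actual agent code) of why a client-side timeout shorter than Kibana's long-poll window produces the error in the log above: the client aborts the checkin request before Kibana ever responds. The 60-second timeout, the URL, and the placeholder agent ID are assumptions taken from the log line and the config setting above.

package main

import (
	"fmt"
	"net/http"
	"strings"
	"time"
)

func main() {
	// Hypothetical agent-side client: gives up after 60 seconds,
	// while Kibana may hold the checkin request open for up to 5 minutes.
	client := &http.Client{Timeout: 60 * time.Second}

	// Hypothetical checkin URL; the agent ID is a placeholder.
	url := "http://localhost:5601/api/ingest_manager/fleet/agents/<agent-id>/checkin"

	resp, err := client.Post(url, "application/json", strings.NewReader("{}"))
	if err != nil {
		// Against a long-polling server this fails with:
		// net/http: request canceled (Client.Timeout exceeded while awaiting headers)
		fmt.Println("checkin failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("checkin status:", resp.Status)
}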

@michalpristas
Contributor

michalpristas commented Sep 9, 2020

@nchaulet Not complicated at all; I will prepare a PR.
Can you link me to the change that caused this? (So I can link the PRs.)

@ph
Contributor

ph commented Sep 9, 2020

@nchaulet If I understand correctly, this issue should be assigned to @michalpristas?
