Windows Agents goes to unhealthy state on installation with system Integration. #1361

Closed
harshitgupta-qasource opened this issue Sep 29, 2022 · 11 comments · Fixed by #1371
Labels: bug, impact:high, QA:Validated, Team:Elastic-Agent-Control-Plane

Comments

@harshitgupta-qasource

Kibana version: 8.4.3 BC2 Kibana cloud environment

Host OS: WINDOWS

Build details:
VERSION: 8.4.3 BC2
BUILD: 55572
COMMIT: 1ceb607762eaafa726c61d6eee5b95359142d4c4
Artifact link: https://staging.elastic.co/8.4.3-c9575cad/downloads/beats/elastic-agent/elastic-agent-8.4.3-windows-x86_64.zip

Preconditions:

  1. 8.4.3 BC2 Kibana cloud environment should be available.
  2. Windows agent v8.4.3 should be installed.

Steps to reproduce:

  1. Navigate to Fleet > Create new Agent Policy.
  2. Enroll a Windows agent with a policy that has the System integration.
  3. Observe that after 4-5 minutes the agent goes unhealthy.
  4. Observe that no agent logs are available under the Agent > Logs tab.

Screenshot:
image

Logs:
elastic-agent-diagnostics-2022-09-29T05-04-57Z-00.zip

Expected Result:
Agent should come up with Healthy Status.

Additional Notes:

  • This issue is not reproducible with macOS and Linux .tar agents.

@harshitgupta-qasource added the bug, Team:Elastic-Agent-Control-Plane, and impact:high labels on Sep 29, 2022
@harshitgupta-qasource
Author

@manishgupta-qasource Please review

@manishgupta-qasource

Secondary review for this ticket is done.

@amolnater-qasource

FYI @jlind23

@jlind23
Contributor

jlind23 commented Sep 29, 2022

@AndersonQ @fearful-symmetry @cmacknz can one of you take a look at this one? It seems to be a metricbeat_monitoring failure
{"log.level":"error","@timestamp":"2022-09-29T05:04:55.073Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":204},"message":"failed to dispatch actions, error: operator: failed to execute step sc-run, error: 2 errors occurred:\n\t* \"metricbeat_monitoring\" failed to prepare monitor for \"Metricbeat\": failed to create a directory \"\": mkdir : The system cannot find the path specified.\n\t* \"metricbeat_monitoring\" failed to prepare monitor for \"Metricbeat\": failed to create a directory \"\": mkdir : The system cannot find the path specified.\n\n: 2 errors occurred:\n\t* \"metricbeat_monitoring\" failed to prepare monitor for \"Metricbeat\": failed to create a directory \"\": mkdir : The system cannot find the path specified.\n\t* \"metricbeat_monitoring\" failed to prepare monitor for \"Metricbeat\": failed to create a directory \"\": mkdir : The system cannot find the path specified.\n\n","ecs.version":"1.6.0"}

@AndersonQ
Member

I'm looking at it

@AndersonQ
Member

hey, could you look at this issue #1361

Below are my findings, but I'm not sure about the expected behaviour, perhaps @michalpristas knows better.

The problem is here:

  • monitorDrop returns "", then
  • SidecarMonitor.Prepare calls os.MkdirAll("", 0775), which breaks.
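
To make the failure mode concrete, here is a minimal standalone sketch of that call, outside the agent. The helper name `mkdirEmpty` is made up for illustration and is not part of the agent code; it only shows that `os.MkdirAll` rejects an empty path. The exact error text is OS-specific; the log above shows the Windows wording.

```go
package main

import (
	"fmt"
	"os"
)

// mkdirEmpty is a hypothetical helper that mirrors the failing call:
// it asks os.MkdirAll to create a directory whose path is the empty string.
func mkdirEmpty() error {
	return os.MkdirAll("", 0775)
}

func main() {
	if err := mkdirEmpty(); err != nil {
		// The message text depends on the operating system.
		fmt.Println("MkdirAll failed:", err)
		return
	}
	fmt.Println("MkdirAll succeeded")
}
```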

I don't know what the expected behaviour is, or what the expected drop is here :/

@michalpristas
Contributor

Looking at it, the behavior is incorrect: the code should check for drop != "" and only handle IO when it returns a normal path.
"" is a valid scenario for cases when metrics are not exposed using pipes/sockets.

@michalpristas
Contributor

Fix here: #1371
The fix is a simple guard on the drop path.
The drop path can be empty if the metrics endpoint is published over:

  • network (possible if configured in a spec file)
  • NamedPipe (default on Windows)

@michalpristas
Contributor

Fix backported to 8.4

@ghost

ghost commented Nov 22, 2022

Bug Conversion

Thanks!

@amolnater-qasource

Hi Team,

Currently, agents become temporarily Unhealthy on installation and then become Healthy after some time on the latest 8.7.0 BC3.
This issue is being tracked under:

Build details:
8.7.0 BC3
BUILD: 60804
COMMIT: abbdcf43f4e89f6fc085360252514aed7c032b4b

Hence, we are marking this issue as QA:Validated.
Thanks

@amolnater-qasource added the QA:Validated label and removed the QA:Ready For Testing label on Feb 24, 2023
6 participants