Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop counterCache only when already started #19103

Merged
merged 11 commits into from
Jun 17, 2020

Conversation

ChrsMark
Copy link
Member

@ChrsMark ChrsMark commented Jun 10, 2020

What does this PR do?

Adds a better handling for counterCache stop in order to avoid trying to stop a Janitor that is not yet started which will lead to panic errors. This will happen in cases where the Metricset is stopped before the first Fetch is called and hence the generator is not yet started:

m.once.Do(m.promEventsGen.Start)

In order to handle this properly we wait 2 periods (the fetching period of the metricset) until we try to stop the counterCache. If after 2 periods the generator is not marked as started then we just return in Stop without stoping the counterCache since this indicates that the metricset was stoped before the first Fetch.

Why is it important?

To avoid having panic crashes on Autodiscover scenarios.

Related issues

@ChrsMark ChrsMark added bug review needs_backport PR is waiting to be backported to other branches. [zube]: In Review autodiscovery v7.7.0 Team:Platforms Label for the Integrations - Platforms team v7.8 v7.9.0 labels Jun 10, 2020
@ChrsMark ChrsMark self-assigned this Jun 10, 2020
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 10, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-platforms (Team:Platforms)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 10, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 10, 2020

💚 Build Succeeded

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Pull request #19103 updated]

  • Start Time: 2020-06-17T07:16:03.511+0000

  • Duration: 79 min 58 sec

Test stats 🧪

Test Results
Failed 0
Passed 4019
Skipped 935
Total 4954

Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
Signed-off-by: chrismark <[email protected]>
@ChrsMark
Copy link
Member Author

@exekias I think this is ready for a review ;), let me know what you think.

Comment on lines 197 to 206
attempts := 0
for !m.eventGenStarted {
time.Sleep(m.period)
// waiting 2 periods so as the first Fetch to start the generator, otherwise return since
// Fetch is not called at all
if attempts > 2 {
return nil
}
attempts++
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need a backoff here. MB will make sure Fetch is not called after Close. This could just be:

if !m.eventGenStarted {
m.promEventsGen.Stop()
}

Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark requested a review from jsoriano June 17, 2020 07:14
@ChrsMark ChrsMark requested a review from exekias June 17, 2020 07:14
Signed-off-by: chrismark <[email protected]>
@ChrsMark ChrsMark merged commit 4a45bb4 into elastic:master Jun 17, 2020
ChrsMark added a commit to ChrsMark/beats that referenced this pull request Jun 17, 2020
@ChrsMark ChrsMark removed the needs_backport PR is waiting to be backported to other branches. label Jun 17, 2020
ChrsMark added a commit that referenced this pull request Jun 24, 2020
melchiormoulin pushed a commit to melchiormoulin/beats that referenced this pull request Oct 14, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metricbeat k8s pod crash on 502 from Prometheus + Linkerd
5 participants