Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[metricbeat] Add state_job metricset #26479

Merged
merged 7 commits into from
Jul 1, 2021
Merged

Conversation

jarpy
Copy link
Contributor

@jarpy jarpy commented Jun 24, 2021

Add a Metricbeat metricset for Kubernetes Job resources.

What does this PR do?

Adds a state_job metricset to the Metricbeat Kubernetes module. Follows the established pattern of consuming Kube State Metrics output.

Why is it important?

Enables monitoring and alerting for Kubernetes Jobs.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Run unit tests
$ MODULE=kubernetes mage -v test
  1. Deploy Metricbeat and enable this metricset. Verify that metrics about Jobs are being collected as expected along with k8s metadata.

Add a Metricbeat metricset for Kubernetes Job resources.
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Jun 24, 2021
// Add owner information provided by the "kube_job_owner" InfoMetric.
"owner_kind": p.Label("owner.kind"),
"owner_name": p.Label("owner.name"),
"owner_is_controller": p.Label("owner.is_controller"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could, in theory, be a proper boolean, but I wasn't sure how to cast it down here in the Labels section, where BooleanMetric can't be used (as far as I can tell).

Any tips? OK to leave it as a "boolean string", maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't just have true/false but it can also be <none> I think we are OK with a "boolean string".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Would you normally store the exact "<none>" string from KSM, or should I translate it to "none" or similar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could "cook" it a little bit and convert it to "none" but I think keeping as is work too. I'm more into leaving it as what we get from the original source.

So need to change :).

@jarpy jarpy requested a review from ChrsMark June 24, 2021 14:33
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Jun 24, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Jun 24, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Started by user Toby McLaughlin

  • Start Time: 2021-06-30T23:20:40.199+0000

  • Duration: 144 min 32 sec

  • Commit: 81d1a33

Test stats 🧪

Test Results
Failed 0
Passed 47307
Skipped 5257
Total 52564

Trends 🧪

Image of Build Times

Image of Tests

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 47307
Skipped 5257
Total 52564

@jarpy
Copy link
Contributor Author

jarpy commented Jun 25, 2021

Fixed a couple of errors detected by CI.

I haven't been able to get the integtests running locally, sorry about that,

@ChrsMark ChrsMark added the Team:Integrations Label for the Integrations team label Jun 28, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations (Team:Integrations)

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @jarpy, looks great already! I left some comments, let me know what you think!

deploy/kubernetes/elastic-agent-standalone-kubernetes.yaml Outdated Show resolved Hide resolved
- name: is_controller
type: keyword
description: >
Owner is controller ("true" or "false")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be also <none>, or this value is just omitted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're quite right. <none> is a possibility. I hadn't seen it because I had been testing with CronJobs.

Fixed in 618705f.

// Add owner information provided by the "kube_job_owner" InfoMetric.
"owner_kind": p.Label("owner.kind"),
"owner_name": p.Label("owner.name"),
"owner_is_controller": p.Label("owner.is_controller"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we don't just have true/false but it can also be <none> I think we are OK with a "boolean string".

metricbeat/docs/modules/kubernetes.asciidoc Show resolved Hide resolved
@jarpy
Copy link
Contributor Author

jarpy commented Jun 30, 2021

Thanks for checking this out @ChrsMark, and for the keen observations.

Copy link
Member

@ChrsMark ChrsMark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thanks for adding this!

Let's wfg now.

Copy link
Contributor

@MichaelKatsoulis MichaelKatsoulis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. Thanks @jarpy !

@jarpy
Copy link
Contributor Author

jarpy commented Jun 30, 2021

Thank you both! We have big things planned for our fleet monitoring after this is in. :D

Let's wfg now.

I don't know what that is, but I'm going to assume it's a good thing. ;)

As a guest here, I'm not sure if I should hit "merge", or you would prefer to. I also don't know your backport workflow. So I must ask for a just a little more of your time and help, cheers.

Jenkins, test this please, becuase it looks like the last build stalled on Auditbeat at:

Allocating a worker with the labels 'immutable && ubuntu-18'

@jarpy
Copy link
Contributor Author

jarpy commented Jun 30, 2021

Jenkins, retest this please (?)

@jarpy
Copy link
Contributor Author

jarpy commented Jun 30, 2021

Triggered rerun from the Jenkins UI.

@ChrsMark ChrsMark added the backport-v7.15.0 Automated backport with mergify label Jul 1, 2021
@ChrsMark
Copy link
Member

ChrsMark commented Jul 1, 2021

Hey @jarpy sorry for stalling cause of the CI, wfg (== waitForGreen) took longer...

This will hit 7.15 and hence the backport-v7.15.0 label should be enough to open the backport PR to the proper branch (7.x here).

I will merge it now and take take care of merging the backport PR too.

Thanks again and feel free to reach out to us while you are working on your monitoring solution cause dogfooding is always valuable!

@ChrsMark ChrsMark merged commit 24c07b2 into elastic:master Jul 1, 2021
@jarpy jarpy deleted the state_job branch July 1, 2021 14:12
ChrsMark pushed a commit to ChrsMark/beats that referenced this pull request Jul 1, 2021
mergify bot pushed a commit that referenced this pull request Jul 1, 2021
(cherry picked from commit 24c07b2)

# Conflicts:
#	metricbeat/docs/fields.asciidoc
#	metricbeat/module/kubernetes/fields.go
v1v added a commit to v1v/beats that referenced this pull request Jul 5, 2021
…stage-failed-within-same-build

* upstream/master: (36 commits)
  Revert "[CI] fight the flakiness with some retry option in the CI only for the Pull Requests (elastic#26617)" (elastic#26704)
  Packaging: linux/armv7 is not supported (elastic#26706)
  Cyberarkpas: Link to official docs on how to setup TLS (elastic#26614)
  Make network_direction, registered_domain and convert processors compatible with ES older than 7.13.0 (elastic#26676)
  Disable armv7 packaging (elastic#26679)
  [Heartbeat] use --params flag for synthetics (elastic#26674)
  Update dependent package to avoid downloading a suspicious file (elastic#26406)
  [mergify] set title and allow bp in any direction (elastic#26648)
  Fix memory leak in SQL helper when database is not available (elastic#26607)
  [CI] fight the flakiness with some retry option in the CI only for the Pull Requests (elastic#26617)
  [mergify] automate PRs that change the backport rules (elastic#26641)
  [Metricbeat] Add Airflow module in xpack (elastic#26220)
  chore: add-backport-next (elastic#26620)
  [metricbeat] Add state_job metricset (elastic#26479)
  CI: jenkins labels are less time consuming now (elastic#26613)
  [MetricBeat] [AWS] Fix aws metric tags with resourcegroupstaggingapi paginator (elastic#26385) (elastic#26443)
  Move openmetrics module to oss (elastic#26561)
  Skip flaky test TestFilestreamMetadataUpdatedOnRename (elastic#26609)
  [filebeat][fortinet] Use default add_locale for fortinet.firewall (elastic#26524)
  Enroll proxy settings (elastic#26514)
  ...
@andresrc andresrc added the test-plan-added This PR has been added to the test plan label Aug 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.15.0 Automated backport with mergify Team:Integrations Label for the Integrations team Team:Observability test-plan Add this PR to be manual test plan test-plan-added This PR has been added to the test plan v7.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants