
[Metricbeat][Kubernetes] Add metricset for state_namespace #36406

Merged: 21 commits merged into elastic:main on Oct 4, 2023.

@constanca-m (Contributor)

What does this PR do?

Add a new metricset, state_namespace, to the Kubernetes metricbeat module.

You can find all the metrics available for it here. Our metricset only fetches kube_namespace_created and kube_namespace_status_phase (both are stable metrics).
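To make the mapping from these two metrics to the document fields concrete, here is a minimal sketch in Go. It is not the metricset's code: the Sample type and the groupNamespaces helper are hypothetical, and it assumes kube-state-metrics reports kube_namespace_status_phase as one series per phase label, with value 1 for the namespace's current phase.

```go
package main

import "fmt"

// Sample is a hypothetical, simplified Prometheus sample: metric name, labels, value.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// NamespaceState mirrors the fields reported under kubernetes.state_namespace
// in the example document (name, created.sec, status.active, status.terminating).
type NamespaceState struct {
	Name        string
	CreatedSec  int64
	Active      bool
	Terminating bool
}

// groupNamespaces folds both metrics into one record per namespace,
// keyed by the "namespace" label.
func groupNamespaces(samples []Sample) map[string]*NamespaceState {
	out := map[string]*NamespaceState{}
	for _, s := range samples {
		ns := s.Labels["namespace"]
		st, ok := out[ns]
		if !ok {
			st = &NamespaceState{Name: ns}
			out[ns] = st
		}
		switch s.Name {
		case "kube_namespace_created":
			st.CreatedSec = int64(s.Value)
		case "kube_namespace_status_phase":
			// Assumed: one series per phase; value 1 marks the current phase.
			switch s.Labels["phase"] {
			case "Active":
				st.Active = s.Value == 1
			case "Terminating":
				st.Terminating = s.Value == 1
			}
		}
	}
	return out
}

func main() {
	samples := []Sample{
		{"kube_namespace_created", map[string]string{"namespace": "default"}, 1692858980},
		{"kube_namespace_status_phase", map[string]string{"namespace": "default", "phase": "Active"}, 1},
		{"kube_namespace_status_phase", map[string]string{"namespace": "default", "phase": "Terminating"}, 0},
	}
	st := groupNamespaces(samples)["default"]
	fmt.Printf("%+v\n", *st) // {Name:default CreatedSec:1692858980 Active:true Terminating:false}
}
```

The boolean split mirrors the status.active / status.terminating fields in the result document below, rather than storing the raw phase string.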

Considerations

Unlike the other state metricsets, this one needs to have the state_ prefix in the resource object. Example:

  • kubernetes.state_deployment adds a kubernetes.deployment object to the document;
  • However, kubernetes.state_namespace adds a kubernetes.state_namespace object. We do this because kubernetes.namespace already exists as a (string) field, and we cannot add a new object to the document under an already registered field of a different type. Otherwise, we keep running into this error:
    {\"type\":\"document_parsing_exception\",\"reason\":\"[1:2039] object mapping for [kubernetes.namespace] tried to parse field [namespace] as object, but found a concrete value\"}, dropping event!","service.name":"metricbeat","ecs.version":"1.6.0"}
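The conflict can be modeled with a small sketch. This is only an illustration, not how Elasticsearch implements mappings: fieldKind and checkMapping are hypothetical, and the model only distinguishes "object" from "concrete value". It shows why nesting the metricset fields under kubernetes.namespace clashes with the existing field, while kubernetes.state_namespace does not.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldKind is a simplified stand-in for an Elasticsearch mapping type.
type fieldKind string

const (
	kindConcrete fieldKind = "concrete" // e.g. keyword, long, boolean
	kindObject   fieldKind = "object"
)

// checkMapping walks a nested document and reports every path whose kind
// (object vs concrete value) disagrees with the kind already in the mapping.
func checkMapping(mapping map[string]fieldKind, prefix string, doc map[string]interface{}) []string {
	var conflicts []string
	keys := make([]string, 0, len(doc))
	for k := range doc {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic output order
	for _, k := range keys {
		path := k
		if prefix != "" {
			path = prefix + "." + k
		}
		child, isObject := doc[k].(map[string]interface{})
		got := kindConcrete
		if isObject {
			got = kindObject
		}
		if want, ok := mapping[path]; ok && want != got {
			conflicts = append(conflicts, fmt.Sprintf("[%s] is mapped as %s, but the document has %s", path, want, got))
			continue
		}
		if isObject {
			conflicts = append(conflicts, checkMapping(mapping, path, child)...)
		}
	}
	return conflicts
}

func main() {
	mapping := map[string]fieldKind{
		"kubernetes":           kindObject,
		"kubernetes.namespace": kindConcrete, // already a string in other metricsets
	}
	clash := map[string]interface{}{"kubernetes": map[string]interface{}{
		"namespace": map[string]interface{}{"created": map[string]interface{}{"sec": 1692858980}},
	}}
	fine := map[string]interface{}{"kubernetes": map[string]interface{}{
		"namespace":       "default",
		"state_namespace": map[string]interface{}{"name": "default"},
	}}
	fmt.Println(checkMapping(mapping, "", clash)) // [[kubernetes.namespace] is mapped as concrete, but the document has object]
	fmt.Println(checkMapping(mapping, "", fine))  // []
}
```

The error above is the mirror case (the mapping expects an object and a concrete value arrives); either direction is rejected for the same reason.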
    

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

You can follow the instructions in this file.

Related issues

Relates to elastic/elastic-agent#3100.

Results

The resulting document (_source) in Elasticsearch looks like this:

{
    "@timestamp": "2023-08-24T09:05:25.374Z",
    "ecs": {
      "version": "8.0.0"
    },
    "host": {
      "name": "kind-control-plane"
    },
    "agent": {
      "version": "8.11.0",
      "ephemeral_id": "255070f6-66b9-4a35-bb3e-6f9374a0b693",
      "id": "9da9b5b9-7f5a-42a0-85a8-e98ad350e8ae",
      "name": "kind-control-plane",
      "type": "metricbeat"
    },
    "event": {
      "module": "kubernetes",
      "duration": 2314627,
      "dataset": "kubernetes.state_namespace"
    },
    "metricset": {
      "period": 10000,
      "name": "state_namespace"
    },
    "kubernetes": {
      "state_namespace": {
        "name": "default",
        "created": {
          "sec": 1692858980
        },
        "status": {
          "active": true,
          "terminating": false
        }
      },
      "labels": {
        "kubernetes_io/metadata_name": "default"
      }
    },
    "service": {
      "address": "http://kube-state-metrics:8080/metrics",
      "type": "kubernetes"
    },
    "orchestrator": {
      "cluster": {
        "name": "kind",
        "url": "kind-control-plane:6443"
      }
    }
}

In Discover: [screenshot]

@constanca-m constanca-m added Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team backport-v8.10.0 Automated backport with mergify labels Aug 24, 2023
@constanca-m constanca-m self-assigned this Aug 24, 2023
@constanca-m constanca-m requested review from a team as code owners August 24, 2023 09:29
@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Aug 24, 2023
@elasticmachine (Collaborator) commented Aug 24, 2023

💚 Build Succeeded



Build stats

  • Duration: 71 min 10 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@@ -238,7 +241,7 @@ func NewResourceMetadataEnricher(
 case *kubernetes.StatefulSet:
 	m[id] = metaGen.Generate(StatefulSetResource, r)
 case *kubernetes.Namespace:
-	m[id] = metaGen.Generate("namespace", r)
+	m[id] = metaGen.Generate(NamespaceResource, r)
Contributor

My only concern is whether we break something here that we don't notice.

My local tests pass, though, and, for example, a pod still has the namespace in it.

Contributor Author

I don't think it will break anything. We were following the same logic for all the other cases and haven't had a problem (at least not yet).

But I noticed that adding add_metadata makes us lose the UUID. I checked two metricsets: state_deployment and this new one.

We get an object for every metricset that is like this:

"kubernetes":{
      "deployment":{
         "name":"same-name",
         "uid":"9db79c06-db65-4081-923f-b78deb19eda7"
      },

Then the kubernetes.deployment object is overwritten again and we lose this data. I am using state_deployment as the reference here since it has already been released, unlike state_namespace.
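The data loss described here is the difference between replacing a nested object and merging into it. The sketch below is only an illustration with plain Go maps; shallowPut, deepMerge and newEvent are hypothetical helpers, and it makes no claim about how the Beats metadata enricher actually merges fields.

```go
package main

import "fmt"

// shallowPut replaces the value at key wholesale: any nested keys the
// existing object had (such as "uid") are gone.
func shallowPut(dst map[string]interface{}, key string, val interface{}) {
	dst[key] = val
}

// deepMerge copies src into dst recursively, so nested keys that only
// exist in dst survive.
func deepMerge(dst, src map[string]interface{}) {
	for k, v := range src {
		srcMap, srcIsMap := v.(map[string]interface{})
		dstMap, dstIsMap := dst[k].(map[string]interface{})
		if srcIsMap && dstIsMap {
			deepMerge(dstMap, srcMap)
			continue
		}
		dst[k] = v
	}
}

// newEvent builds the kubernetes object produced by the metricset.
func newEvent() map[string]interface{} {
	return map[string]interface{}{
		"deployment": map[string]interface{}{
			"name": "same-name",
			"uid":  "9db79c06-db65-4081-923f-b78deb19eda7",
		},
	}
}

func main() {
	// Metadata that only knows the deployment name.
	meta := map[string]interface{}{"deployment": map[string]interface{}{"name": "same-name"}}

	a := newEvent()
	shallowPut(a, "deployment", meta["deployment"])
	_, kept := a["deployment"].(map[string]interface{})["uid"]
	fmt.Println("uid kept after overwrite:", kept) // false

	b := newEvent()
	deepMerge(b, meta)
	_, kept = b["deployment"].(map[string]interface{})["uid"]
	fmt.Println("uid kept after merge:", kept) // true
}
```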

)),
},

Labels: map[string]p.LabelMap{
Contributor

The only label that I see in my tests:
kubernetes.labels.kubernetes_io/metadata_name : kube-node-lease

So I'm not sure whether this line works here or is needed.

Contributor Author

We need to get the name of the namespace; that is why I added the line. But I did notice that adding add_metadata leads to loss of data (the UUID). I will explain this in my reply to your comment above.

@gizas (Contributor) commented Aug 24, 2023

@constanca-m some tests are also failing. Please fix those.

Apart from that, the basic metrics are there:

Field | Value
-- | --
kubernetes.state_namespace.created.sec | 1692871720
kubernetes.state_namespace.name | kube-node-lease
kubernetes.state_namespace.status.active | true
kubernetes.state_namespace.status.terminating | false

@constanca-m (Contributor Author) commented Aug 24, 2023

> @constanca-m some tests are also failing. Please fix those.

I noticed, but running go test -data keeps changing the expected files, so I'm not sure what is going wrong. I will keep trying. @gizas
Edit: I had grouped them incorrectly by name.

@ChrsMark (Member) left a comment

Nice addition!
I have left some comments regarding some implementation details.

"created": {
"sec": 1691566337
},
"name": "kube-node-lease",
Member

I think reporting the name here is redundant, since the metadata enricher adds it anyway. However, you could add it under kubernetes.namespace directly; it would then either be replaced by the enricher or simply remain there if the enricher is disabled.

Contributor Author (@constanca-m, Aug 25, 2023)

Unfortunately, this is not the case, because kubernetes.namespace.name gets overwritten. I explain why in this comment: #36406 (comment)

With the approach you suggested below, it works. I merged the change.

@@ -0,0 +1,23 @@
- name: state_namespace
Member

I would prefer a name like namespace_state, which reflects that this section gives information about the namespace's state. state_* is fairly internal, Elastic-specific terminology and might not be accurate or specific enough.

Note that I'm only referring to the fields' prefix and not the name of the metricset. The name of the metricset is just fine as is. But maybe we can improve the field's naming with what I mention.

@mlunadia do you have any preference here? state_namespace.* VS namespace_state.*?

Contributor Author

@gizas since you were the one suggesting this name, what do you think?

Contributor

I am also +1 for using namespace_state as the object field name.

@MichaelKatsoulis (Contributor)

@constanca-m this looks good to me. The only change that would be nice is a more meaningful field name that won't be confused with the data stream name.

@constanca-m constanca-m requested a review from a team as a code owner August 29, 2023 13:44
@@ -88,6 +88,7 @@ The list below covers the major changes between 7.0.0-rc2 and main only.

==== Added

- Add new metricset in Kubernetes module, `namespace_state`. {pull}36406[36406]
Contributor

I believe this comment #36406 (comment) was only about the field name under which the metrics are stored (kubernetes.state_namespace.* -> kubernetes.namespace_state.*), not about the name of the metricset itself.

I think the state_* in the metricset name refers to the metrics source, i.e. that they are scraped from kube-state-metrics. If that is correct, the metricset name should stay state_namespace for consistency. @MichaelKatsoulis @ChrsMark is my assumption about the naming correct?

Contributor Author

Thanks @tetianakravchenko, you are right, I misunderstood. So now the document looks like this:

{
    "@timestamp": "2019-03-01T08:05:34.853Z",
    "event": {
        "dataset": "kubernetes.namespace_state",
        ...
    },
    "kubernetes": {
        "namespace": "kube-node-lease",
        "namespace_state": {
            ...
        }
    },
    "metricset": {
        "name": "state_namespace",
        ...
    },
    ...
}

I could not change event.dataset. I am not sure what this field means, and the Elastic docs are not very clear either; my guess is that it identifies the "family" of fields that produced the metrics in the document.

Member

I think event.dataset should be kubernetes.namespace. I explain this at #36406 (comment). I'm not sure whether that line also has an impact elsewhere, but event.dataset should be kubernetes.namespace if we want to be more accurate.

- state_persistentvolume
- state_persistentvolumeclaim
- state_storageclass
- state_namespace
Contributor

It seems you pushed local changes to this file; could you please revert them?

Contributor Author

Yes. Is it ok now with just state_namespace added?

if resourceName != util.NamespaceResource {
	resourceName = strings.ReplaceAll(resourceName, prefix, "")
} else {
	resourceName = "namespace_state"
}
Member

I think the resource name should be namespace so that it is used properly in event.dataset. We could potentially extend the metricset to add other fields, like namespace_foo, so I think using namespace_state for the dataset is not quite correct.
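The branching in the snippet above can be sketched as a standalone function. This is a hypothetical helper, not the PR's code: it assumes the prefix is "state_" and uses strings.TrimPrefix, which only removes a leading prefix, whereas the snippet's strings.ReplaceAll removes every occurrence (for these metricset names the result is the same). It shows that a single derived name currently feeds the object key, which is why event.dataset follows it.

```go
package main

import (
	"fmt"
	"strings"
)

// statePrefix is assumed to be the prefix shared by the kube-state-metrics metricsets.
const statePrefix = "state_"

// objectName returns the key under "kubernetes." where a state metricset's
// fields would be stored. Stripping the prefix works for every resource
// except namespace, whose plain name collides with the existing
// kubernetes.namespace string field, so it is special-cased.
func objectName(metricset string) string {
	resource := strings.TrimPrefix(metricset, statePrefix)
	if resource == "namespace" {
		return "namespace_state"
	}
	return resource
}

func main() {
	for _, ms := range []string{"state_deployment", "state_namespace"} {
		fmt.Println(ms, "->", "kubernetes."+objectName(ms))
	}
	// state_deployment -> kubernetes.deployment
	// state_namespace -> kubernetes.namespace_state
}
```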

Contributor Author

I am afraid that will break again because kubernetes.state already exists 🤔

Member

I mean that the value of event.dataset should be set to namespace. I think this would work since it's the value that we set so I don't see how it would be in conflict with anything else, right?

Contributor Author (@constanca-m, Sep 1, 2023)

It would create a conflict, because that field is setting the object kubernetes.namespace_state. And if we change it to kubernetes.namespace, this is the document that is generated (I just ran this):

{
    "event": {
        "dataset": "kubernetes.namespace",
        "duration": 115000,
        "module": "kubernetes"
    },
    "kubernetes": {
        "namespace": {
            "created": {
                "sec": 1691566342
            },
            "status": {
                "active": true,
                "terminating": false
            }
        }
    },
    "metricset": {
        "name": "state_namespace",
        "period": 10000
    },
    "service": {
        "address": "127.0.0.1:55555",
        "type": "kubernetes"
    }
}

Which will cause the conflict with the string field kubernetes.namespace @ChrsMark

Member

We don't need to change the kubernetes.namespace_state part. This can be as is.
I only suggest to change the value of event.dataset. I guess it's doable to just set this specifically within the code.

Contributor Author

We could try that, but how do I set that field? I had a look at the code and:

e, err := util.CreateEvent(event, "kubernetes."+resourceName)

if err != nil {
	m.Logger().Error(err)
}

Between these lines, there is still no event.dataset field.

The event is {"_module":{"namespace":"kube-node-lease"},"created":{"sec":1693554126},"status":{"active":true,"terminating":false}} (example) and e is {{} {"namespace":"kube-node-lease"} {"created":{"sec":1693554126},"status":{"active":true,"terminating":false}} kubernetes.namespace_state 0001-01-01 00:00:00 +0000 UTC <nil> 0s 0s false} (example).
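The printed value of e suggests that one string, kubernetes.namespace_state, is passed to util.CreateEvent and ends up driving both the dataset and the object the metricset fields are nested under. The hypothetical sketch below (the Event type and both constructors are illustrative, not the Beats API) shows the difference between that coupled form and setting event.dataset "specifically within the code", which needs two independent values.

```go
package main

import "fmt"

// Event is a hypothetical, simplified event: Dataset would become
// event.dataset and FieldRoot is the object holding the metricset fields.
type Event struct {
	Dataset   string
	FieldRoot string
	Fields    map[string]interface{}
}

// newCoupled derives both values from one string, similar to a single
// CreateEvent(event, "kubernetes."+resourceName) call.
func newCoupled(name string, fields map[string]interface{}) Event {
	return Event{Dataset: name, FieldRoot: name, Fields: fields}
}

// newDecoupled sets the dataset and the field root independently.
func newDecoupled(dataset, root string, fields map[string]interface{}) Event {
	return Event{Dataset: dataset, FieldRoot: root, Fields: fields}
}

func main() {
	fields := map[string]interface{}{"status": map[string]interface{}{"active": true}}
	fmt.Printf("%+v\n", newCoupled("kubernetes.namespace_state", fields))
	fmt.Printf("%+v\n", newDecoupled("kubernetes.state_namespace", "kubernetes.namespace_state", fields))
}
```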

Contributor

I know we have changed our minds here many times, but changing event.dataset to kubernetes.namespace_state also breaks the user expectation that metrics coming from kube-state-metrics follow the pattern kubernetes.state_*.

Just retested those and wanted to bring this to discussion once more. @elastic/obs-cloudnative-monitoring team any thoughts?

Contributor Author

Maybe it's best to have this discussion over a meeting? Otherwise it seems it will take some time until we all align. @gizas @ChrsMark

@ChrsMark (Member)

Hey everyone, I don't want to block this PR, but I want to raise a concern: to avoid a breaking change, we are blocked from using kubernetes.namespace.* as an object.

I think that sooner or later we will hit this as part of the ECS-OTel merger as well -> https://opentelemetry.io/docs/specs/otel/resource/semantic_conventions/k8s/#namespace; there, it is already an object.

So I wonder what is the long term plan with this and how we should approach this. @mlunadia @tommyers-elastic @gizas what do you think? What are our chances of introducing this breaking change and when it would be the right time to do it?

@gizas (Contributor) commented Sep 8, 2023

@ChrsMark, in my opinion the transformation of kubernetes.namespace into an object is better handled in a separate story.

So for now, I would merge this PR and open a new story for changing kubernetes.namespace to an object for all Kubernetes datasets. That change will need documentation updates for our users. The new story should also note that, when work on it starts, the new kubernetes.state_namespace and the work done here should be aligned with it, so it should reference this PR.

@tommyers-elastic what do you think?

@ChrsMark (Member) left a comment

I don't fully agree with adding something that is not 100% future-proof and just planning to revisit it later. If we add this now knowing it will change, users will have to deal with breaking changes even for this newly added metricset. I would see more value in dealing with the breaking changes directly now rather than only adding a metricset.

However approving since it's decided to proceed with the current naming formats.
Please file an issue about revisiting those and make sure to prioritize this.

@constanca-m (Contributor Author)

Thanks @ChrsMark. By this,

> I don't fully agree with adding something that is not 100% future-proof and just planning to revisit it later.

Do you mean we should now take care of freeing the kubernetes.namespace string field so that we can use it for this metricset? As it stands, the changes introduced here don't cause any breaking changes.

@ChrsMark (Member) commented Oct 3, 2023

I mean that, with the current implementation of this metricset, we will face an extra breaking change if we decide to change the fields' schema: kubernetes.namespace_state would become kubernetes.namespace, which will be a breaking change for the newly added metricset. But as I said, since it's decided and we are willing to go this way, let's go for it and iterate on it :)!

@constanca-m constanca-m merged commit 165d970 into elastic:main Oct 4, 2023
25 of 27 checks passed
@constanca-m constanca-m deleted the kubernetes-state_namespace branch October 4, 2023 06:37
mergify bot pushed a commit that referenced this pull request Oct 4, 2023
* Add metricset.

---------

Co-authored-by: Chris Mark <[email protected]>
(cherry picked from commit 165d970)
@constanca-m (Contributor Author)
The issue to change kubernetes.state_namespace to kubernetes.namespace can be found in #36737.

Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
Labels
backport-v8.10.0 Automated backport with mergify Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
7 participants