Enrich events with cloud metadata when running in a cloud native environment (GKE, EKS, AKS, etc) #1704

Open
abroglesc opened this issue Aug 10, 2021 · 37 comments
Milestone

Comments

@abroglesc

Motivation
If you have Falco deployed to many clusters across different AWS accounts or Google Cloud projects, it can be challenging to tell which account/project, region, and cluster a specific alert was triggered on. This data is readily available from the instance metadata services in both EKS and GKE, so it likely wouldn't be too difficult to dynamically enrich Falco events with it.

Feature
A new Falco configuration flag that lets you specify the type of cluster (e.g. EKS, GKE, or AKS). On startup, the Falco daemon would call the instance metadata service to retrieve the following info:

  • Account ID / Project ID
  • Cluster Name
  • Region
  • Availability Zone
  • Node/Instance Name

Then allow events to be enriched with these new pieces of metadata, and allow the new fields to be used in rules and outputs (https://falco.org/docs/rules/supported-fields/)

Alternatives
This could partly be done within falcosidekick, but you lose the ability to enrich events with node/instance ID information, since falcosidekick doesn't need to run on every node the way the Falco daemonset does. If this were handled in falcosidekick, then for node-level events (%container.id='host') we wouldn't know which exact node they came from, and therefore which node we should potentially perform forensics on.

Additional context
GKE Endpoints:
Requests must include the following request header:
Metadata-Flavor: Google

Metadata        URL
project_id      http://metadata.google.internal/computeMetadata/v1/project/project-id
zone            http://metadata.google.internal/computeMetadata/v1/instance/zone
cluster_name    http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name
instance_name   http://metadata.google.internal/computeMetadata/v1/instance/name
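
To make the GKE lookups above concrete, here is a minimal Python sketch (not Falco code) that queries the listed endpoints with the Metadata-Flavor header. The helper names are hypothetical. Deriving the region by trimming the zone's last "-<letter>" suffix is an assumption about zone naming, and the network call only works from inside a GCE/GKE node.

```python
import urllib.request

METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1"

# Paths copied from the endpoint list in this issue.
GKE_PATHS = {
    "project_id": "/project/project-id",
    "zone": "/instance/zone",
    "cluster_name": "/instance/attributes/cluster-name",
    "instance_name": "/instance/name",
}


def build_request(path: str) -> urllib.request.Request:
    """Build a metadata request carrying the required Metadata-Flavor header."""
    return urllib.request.Request(
        METADATA_BASE + path, headers={"Metadata-Flavor": "Google"}
    )


def short_zone(raw_zone: str) -> str:
    """The zone endpoint returns 'projects/<number>/zones/<zone>'; keep the last segment."""
    return raw_zone.rsplit("/", 1)[-1]


def region_from_zone(zone: str) -> str:
    """Assumption: a zone name is its region plus a '-<letter>' suffix."""
    return zone.rsplit("-", 1)[0]


def fetch_gke_metadata(timeout: float = 2.0) -> dict:
    """Query each endpoint once and return a flat dict of metadata values."""
    result = {}
    for key, path in GKE_PATHS.items():
        with urllib.request.urlopen(build_request(path), timeout=timeout) as resp:
            result[key] = resp.read().decode("utf-8").strip()
    result["zone"] = short_zone(result["zone"])
    result["region"] = region_from_zone(result["zone"])
    return result
```

Since these values do not change while a node is running, fetching them once at startup and caching the result should be enough.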

EKS Endpoints:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html
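
For the EKS side, a rough Python sketch of reading the EC2 instance identity document via IMDSv2 (session token first, then the document) might look like the following. The function names are hypothetical. As far as I know the identity document does not contain the EKS cluster name, so that field is left out here and would have to come from somewhere else.

```python
import json
import urllib.request

IMDS_BASE = "http://169.254.169.254/latest"


def fetch_imds_token(ttl_seconds: int = 60, timeout: float = 2.0) -> str:
    """Request a short-lived IMDSv2 session token."""
    req = urllib.request.Request(
        IMDS_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_identity_document(token: str, timeout: float = 2.0) -> str:
    """Fetch the raw JSON instance identity document using the session token."""
    req = urllib.request.Request(
        IMDS_BASE + "/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_identity_document(raw: str) -> dict:
    """Pick out the fields requested in this issue; missing keys become None."""
    doc = json.loads(raw)
    return {
        "account_id": doc.get("accountId"),
        "region": doc.get("region"),
        "availability_zone": doc.get("availabilityZone"),
        "instance_id": doc.get("instanceId"),
    }
```

On a node, the calls would be chained as parse_identity_document(fetch_identity_document(fetch_imds_token())).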

AKS Endpoints:
I haven't used AKS or Azure, but it appears their documentation for the metadata service is here:
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/instance-metadata-service?tabs=linux

@abroglesc abroglesc changed the title Enrich events with Cloud Metadata when running in a cloud native environment (GKE, EKS, AKS, etc) Enrich events with cloud metadata when running in a cloud native environment (GKE, EKS, AKS, etc) Aug 10, 2021
@Kaizhe
Contributor

Kaizhe commented Aug 10, 2021

Just a note: we may have to break it up into multiple tickets.

@poiana
Contributor

poiana commented Nov 9, 2021

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Dec 9, 2021

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@poiana
Contributor

poiana commented Jan 8, 2022

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

@poiana poiana closed this as completed Jan 8, 2022
@poiana
Contributor

poiana commented Jan 8, 2022

@poiana: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jasondellaluce
Contributor

/reopen

@poiana
Contributor

poiana commented Jan 10, 2022

@jasondellaluce: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana poiana reopened this Jan 10, 2022
@mmoyerfigma
Contributor

I'd be thrilled if this also included ECS task metadata (task ID, task definition name/version, etc.) from the ECS Introspection API, which maps from a container ID to this data. These fields are analogous to the k8s.pod.id/k8s.deployment.name fields included when running with -pk.
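
As a rough illustration of the container-ID-to-task mapping described above, the Python sketch below queries the ECS agent introspection endpoint and extracts a task ID and task definition. The local port, the response keys (Arn, Family, Version), and the output key names are my assumptions based on the public ECS agent documentation, not something confirmed in this thread. Whether a truncated container ID is accepted may depend on the agent version, so that is worth verifying.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the ECS agent serves its introspection API on this local port.
INTROSPECTION_BASE = "http://localhost:51678/v1/tasks"


def task_url(container_id: str) -> str:
    """Build the introspection URL that looks up a task by Docker container ID."""
    return INTROSPECTION_BASE + "?" + urllib.parse.urlencode({"dockerid": container_id})


def parse_task(raw: str) -> dict:
    """Extract a task ID (last ARN segment) and 'family:version' from the response."""
    task = json.loads(raw)
    arn = task.get("Arn", "")
    has_definition = "Family" in task and "Version" in task
    return {
        "ecs.task_id": arn.rsplit("/", 1)[-1] if arn else None,
        "ecs.task_definition": (
            f"{task['Family']}:{task['Version']}" if has_definition else None
        ),
    }


def lookup_task(container_id: str, timeout: float = 2.0) -> dict:
    """Query the local ECS agent; only works on an ECS container instance."""
    with urllib.request.urlopen(task_url(container_id), timeout=timeout) as resp:
        return parse_task(resp.read().decode("utf-8"))
```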

@jasondellaluce
Contributor

jasondellaluce commented Feb 1, 2022

Hey @abroglesc, @mmoyerfigma!

Yesterday we released Falco 0.31.0, which brings support for the new plugin system. What you both described here is a perfect fit for an extractor plugin and could even be written in Go in a few lines of code.
Example 👉🏼 https://github.com/falcosecurity/plugin-sdk-go/blob/main/examples/extractor/extractor.go

What do you think about working together to implement this? I can help you get started with the plugin development!

@mmoyerfigma
Contributor

> What do you think about working together to implement this? I can help you get started with the plugin development!

I may take a look at this. I ended up writing a post-processor that runs via program_output and interacts with that ECS API. Should be easy to refactor that code into an extractor plugin, I think.
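
For readers curious what such a program_output post-processor could look like, here is a hedged Python sketch. It assumes Falco is configured with JSON output so that each alert arrives on stdin as one JSON object with an output_fields map. The lookup callable and the ecs.* key names are hypothetical stand-ins for whatever queries the ECS API. This is not the code mentioned in the comment above.

```python
import json
import sys
from typing import Callable, Dict, TextIO


def enrich_alert(line: str, lookup: Callable[[str], Dict[str, str]]) -> str:
    """Add lookup(container_id) to one JSON alert's output_fields and re-serialize it."""
    alert = json.loads(line)
    fields = alert.get("output_fields") or {}
    alert["output_fields"] = fields
    container_id = fields.get("container.id")
    # Host-level events have no container to look up.
    if container_id and container_id != "host":
        fields.update(lookup(container_id))
    return json.dumps(alert)


def run(
    lookup: Callable[[str], Dict[str, str]],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> None:
    """Read alerts line by line, enrich each one, and write it back out."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        stdout.write(enrich_alert(line, lookup) + "\n")
        stdout.flush()
```

A wrapper script would call run() with a real lookup function and forward the enriched lines to wherever alerts are collected.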

@mmoyerfigma
Contributor

I started looking into this plugin interface, but I'm worried it's not suitable for my use case unless I'm misunderstanding something. I can write an extractor plugin that makes fields like ecs.task_id or ecs.task_definition available in rules, but since none of the default rules use those keys, my ECS metadata won't show up in most alerts.

I could fork all the rules and add %ecs.task_id to each of them, but what I really want is an extension point that replaces the special %container.info handling, so it gets appended by default to every rule.

@jasondellaluce
Contributor

jasondellaluce commented Feb 2, 2022

If I understand correctly, I think what you're looking for is the -p Falco option:

 -p <output_format>, --print <output_format>
                               Add additional information to each falco notification's output.
                               With -pc or -pcontainer will use a container-friendly format.
                               With -pk or -pkubernetes will use a kubernetes-friendly format.
                               With -pm or -pmesos will use a mesos-friendly format.
                               Additionally, specifying -pc/-pk/-pm will change the interpretation
                               of %container.info in rule output fields.

It isn't limited to -pk. With -p, you can append an arbitrary format to every rule's output and include fields like ecs.task_id. Even if you can't customize the container.info replacement, that's still handy. More here 👉🏼 https://falco.org/docs/alerts/formatting/

Besides, I think working on a plugin like this would be a valuable addition to the project and the ecosystem.

@mmoyerfigma
Contributor

Yeah, -pc is what I'm using today, and the functionality I'd want to build in a plugin is like a new -pe (ECS), but I don't see how that's possible to do in the current plugin API, since it doesn't fit the pattern of a source plugin or an extractor plugin.

@jasondellaluce
Contributor

jasondellaluce commented Feb 2, 2022

We can investigate better customization capabilities in the future, but for now, instead of a falco -pe, you would run falco -p"taskid=%ecs.task_id, taskdef=%ecs.task_definition, ...", which appends that formatted output to every rule alert and uses your plugin to extract the value of each field.

So basically, an extractor plugin that implements the new fields, combined with running falco -p"...", should cover this use case.
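
As a small illustration of the -p approach, the Python sketch below assembles such a format string from a label-to-field mapping. The ecs.* fields only exist if a plugin provides them, and the helper names are hypothetical.

```python
from typing import Dict, List


def build_print_format(fields: Dict[str, str]) -> str:
    """Join 'label=%field' pairs into a format string for Falco's -p option."""
    return ", ".join(f"{label}=%{field}" for label, field in fields.items())


def falco_command(fields: Dict[str, str]) -> List[str]:
    """Return an argv list that appends the format to every rule output."""
    return ["falco", "-p", build_print_format(fields)]
```

For example, build_print_format({"taskid": "ecs.task_id", "taskdef": "ecs.task_definition"}) yields "taskid=%ecs.task_id, taskdef=%ecs.task_definition".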

@poiana
Contributor

poiana commented Mar 4, 2022

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

@poiana
Contributor

poiana commented Mar 4, 2022

@poiana: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue with /reopen.

Mark the issue as fresh with /remove-lifecycle rotten.

Provide feedback via https://github.com/falcosecurity/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana poiana closed this as completed Mar 4, 2022
@jasondellaluce
Contributor

/reopen
/remove-lifecycle rotten

@poiana
Contributor

poiana commented Mar 5, 2022

@jasondellaluce: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana poiana reopened this Mar 5, 2022
@poiana
Contributor

poiana commented Jun 3, 2022

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@jasondellaluce
Contributor

/remove-lifecycle rotten
/reopen

@poiana
Contributor

poiana commented Mar 14, 2023

@jasondellaluce: Reopened this issue.

In response to this:

/remove-lifecycle rotten
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@poiana
Contributor

poiana commented Jun 12, 2023

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@jasondellaluce
Contributor

/remove-lifecycle stale

This should now be possible due to the newest features of the plugin framework.

@Andreagit97 Andreagit97 added this to the TBD milestone Aug 31, 2023
@poiana
Contributor

poiana commented Nov 29, 2023

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Dec 29, 2023

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@Andreagit97
Member

/remove-lifecycle rotten

@poiana
Contributor

poiana commented Apr 2, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@Andreagit97
Member

/remove-lifecycle stale

@poiana
Contributor

poiana commented Jul 1, 2024

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@poiana
Contributor

poiana commented Jul 31, 2024

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@Andreagit97
Member

/remove-lifecycle rotten

Projects
None yet
Development

No branches or pull requests

6 participants