
Scope of Falco artifacts #1114

Closed
krisnova opened this issue Mar 24, 2020 · 19 comments
@krisnova
Contributor

krisnova commented Mar 24, 2020

I would like to submit a proposal for the Falco project to adopt the following scope for build artifacts.

As it stands the current installation documentation calls out a vast number of ways to install Falco.

As the adoption of Falco grows, we should be very clear and deliberate about which artifacts the project officially supports and maintains, and which are third-party supported installation methods.

I would like to begin separating officially supported artifacts from third-party maintained artifacts so that the Falco maintainers can focus on offering concise stable artifacts that can then be used by third-party integrations and tooling.

I would propose the following.

Officially supported

All officially supported installations should perform the following:

  • Installation of Falco userspace
  • Installation of a Falco driver
  • Documentation

The official artifacts for these installations would be

  • Debian packages
  • RPM packages
  • Container images
  • Tools for building, downloading, and installing various drivers

Given these core artifacts, all of the existing integrations will remain possible. This will allow the Falco community to focus on supporting the more fundamental components necessary.


Unofficial third-party installation index

These are currently found in the documentation but would potentially be moved to a third-party index with pointers to the third-party installation guides.

  • Helm chart
  • Minikube
  • Kind
  • Puppet
  • Ansible
  • Docker (CoreOS)
  • Terraform
@krisnova
Contributor Author

For supported operating systems for the driver, see the survey that was sent out:

https://docs.google.com/spreadsheets/d/1TKJIXAMWPRTb8x--Kkq17ipee8xvV08LFLfBN3VzLNU/edit#gid=385989146

@danmx
Contributor

danmx commented Mar 25, 2020

I'd suggest taking a look at https://github.com/bottlerocket-os/bottlerocket as another potential OS (related issue: #1103)

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

After speaking with @leogr we wanted to propose some thoughts around running Falco with Kubernetes.

The above proposal solves the problem of Falco artifacts running on a Linux system, but does not give an official answer to

How do I run Falco with Kubernetes?

So we would like to add another officially supported toolchain to run Falco with Kubernetes. We would like to use falcoctl as the official installation method. Falcoctl would be responsible for installing Falco at the Linux level and configuring it in such a way that it works well with Kubernetes.

@danmx
Contributor

danmx commented Mar 26, 2020

@kris-nova What is the reason for creating a special installation tool for Kubernetes?
When it comes to service packaging for k8s, the clear leaders in that space are Helm and Kustomize, which are already supported by major CD tooling and companies' deployment procedures.

In my case, I'll have to export what your tool will deploy and make a helm chart out of it.

@mattfarina

I have a few questions and some comments...

First, the comments...

  • You may already know this but if you don't, if you wanted to get some time with end users to talk to them about your ideas here the CNCF may provide an avenue for you. The CNCF end user community, which has numerous companies who are interested in security, has said they want to engage with and provide feedback (and maybe more) to projects. If you wanted to connect with some of these folks then I would suggest talking with Cheryl Hung of the CNCF. She can help make connections.
  • I have learned that some k8s cluster operators do not let users install CRDs into clusters for security reasons. They do not want workload operators (the people not the tooling) to have that level of access. I know CRDs and operators are a hot topic right now. I share this because I didn't realize it until about a year ago.

And then my questions...

  • Has anyone signed up or shown interest in the Helm Falco chart?
  • If someone comes to the Falco community and wants to get started (e.g., to kick the tires) what will the story be to tell them to get started? I've been taught by UX people that it's useful to help someone accomplish a task they care about in just 5 minutes. What would that look like in the new artifact setup?

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

Hey @danmx thanks for your comments

What is the reason for creating a special installation tool for Kubernetes?

Helm and Kustomize are fantastic ways to install Kubernetes applications.

Falco isn't a traditional Kubernetes application, and trying to fit it into a traditional Kubernetes deployment tool violates container isolation boundaries, is a risk, and isn't flexible enough for what we need to do.

Installing Falco should be more like a kubeadm style experience than a helm install falco style experience if we want to take our systems security and stability seriously.


The paradigm that a user can simply helm install falco is broken in my mind. If you look at the helm chart itself and explore what the pod is doing, there are some concerns.

The helm chart requires a privileged security context to load a driver

The helm install falco story of the past is certainly user friendly but is vastly insecure. Having a container mutate kernel resources violates the spirit of container isolation altogether. The fact that we require a privileged pod is a red flag in my eyes. Most production end-users won't even allow this configuration into their clusters in the first place.

# Taken from the helm chart
          securityContext:
            privileged: true

The reason for this is that the chart loads a driver at runtime, which means the pod has to be privileged.

Just because you can, doesn't mean you should

Not everything should be deployed with helm, as helm is bound to running above Kubernetes. There are reasons we don't have paradigms like the following:

helm install docker
helm install kubeadm
helm install selinux
helm install broadcom driver
helm install iptables
helm install nvidia driver
helm install kubelet

Again - in order to create the helm install falco experience we had to run with a privileged security context and have wide-open access to the kernel. As a security tool I wonder if we want to endorse this practice, let alone support it. Having a tool like falcoctl that behaves like kubeadm that can simplify this for a user without introducing a security concern makes sense to me.

Falco doesn't "run on" Kubernetes

It runs next to Kubernetes, and also below Kubernetes. In fact one could make the argument that it makes sense to have Falco running even before you install Kubernetes in the first place.

Falco is NOT a Kubernetes application. It's a kernel security tool that happens to have features that work well with Kubernetes.

It will be important that Falco users have a reliable, secure, and friendly avenue through which they install Falco. While something like helm install falco seems intuitive to most users, Falco isn't really something that runs "on top" of Kubernetes. It runs the layer below Kubernetes (several layers below, to be honest) and has many components.

[ Kubernetes] --- [ Pods ] ------------- [ Falco clients / consumers / producers ] 
[ Kubernetes] --- [ Container Runtime ] ------------------------------------------
[ Kubernetes] --- [ Kubelet ] ----------------------------------------------------
----------------- [ Linux userspace ] -- [Falco program] -------------------------
----------------- [ Kernel ] ----------- [Falco driver] --------------------------

So while helm install falco can solve the top-most layer here and is an easy conclusion to jump to, I feel it will frustrate and confuse users: they will expect that command to be a full install of Falco, and that simply can't happen in a safe way.

Setting the expectation that you can (and should) do that with helm seems wrong.

If you look at the kubelet, you certainly could run it as a daemonset, which means you could potentially install it with helm. But that doesn't make sense, as the kubelet lives and runs at a layer of the system below Kubernetes. The kubelet exposes an API which other parts of Kubernetes can access using Kubernetes primitives.

Falco should follow this pattern in my mind.

Falco clients require mTLS

The Falco APIs are mTLS encrypted and we will need an easy way to rotate TLS over time as well as set up TLS upon installation. A lot of the data we will need will come from the cluster so being able to reason about this at runtime makes a lot more sense than forcing the user to aggregate it and add it as a --flag or in a YAML file.
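As context for what such tooling would automate, here is a hand-rolled sketch of the shape of mTLS provisioning using plain openssl. The filenames and CN values are illustrative only, not actual Falco or falcoctl defaults:

```shell
# Hand-rolled mTLS provisioning sketch (illustrative names only).
cd "$(mktemp -d)"
# 1. Create a CA key and a self-signed CA certificate
openssl genrsa -out ca.key 2048
openssl req -x509 -new -nodes -key ca.key -subj "/CN=falco-ca" -days 365 -out ca.crt
# 2. Create a server key and a CSR for the Falco gRPC endpoint
openssl genrsa -out server.key 2048
openssl req -new -key server.key -subj "/CN=falco-grpc" -out server.csr
# 3. Sign the server certificate with the CA
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 365 -out server.crt
# Clients present their own CA-signed certs; rotation repeats steps 2-3.
```

Rotation then amounts to re-issuing and reloading, which is exactly the kind of ongoing lifecycle task being argued for here rather than a one-shot YAML deploy.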

So the resources we would be potentially deploying as Kubernetes objects are lightweight clients of the Falco systemd process, and a service to expose the Falco API to pods in a given namespace.

Now here is where a Kubernetes resource DOES make sense. Having an operator that can help with managing mTLS and rules as well as installing the Kubernetes resources seems much more powerful than having a static manifest approach. But again - an operator is only relevant after Falco is already set up at the Linux level.

Scope

Basically the problem here is that users think it's a good idea to install Falco with a tool like helm and I think that is a risk for us. It's just not that simple.

Let's look at the steps that need to happen to install Falco on a fresh Linux system.

  1. Set up Falco userspace program
  2. Acquire and install a kernel driver
  3. Reconcile and watch Falco and the driver over time
  4. Set up mTLS and manage over time
  5. Set up Kubernetes audit and configure Falco as an audit endpoint
  6. Set up a Falco client running in Kubernetes to consume events from the Falco APIs
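For step 5 in the list above, configuring Falco as a Kubernetes audit endpoint means handing the API server an audit-webhook config (kubeconfig format) that points at Falco. A sketch of that file, where the service address and path are illustrative (8765 was the default port of Falco's embedded webserver at the time):

```yaml
# Illustrative kubeconfig-format file for kube-apiserver's
# --audit-webhook-config-file flag, pointing audit events at Falco.
apiVersion: v1
kind: Config
clusters:
- name: falco
  cluster:
    server: http://falco.example.svc:8765/k8s-audit   # illustrative address
contexts:
- name: default
  context:
    cluster: falco
    user: ""
current-context: default
users: []
```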

Now I do think users should have a clean, reliable, and friendly way to set this up. I just do not think a tool like helm or kustomize has a broad enough scope, or is powerful (or safe) enough, to do this for us. They fundamentally operate at a layer of the system that doesn't go deep enough to safely manage all of these concerns. If you look at what we need to do, it aligns much more with a user experience like kubeadm's than with installing something like WordPress.

Hence why I suggested falcoctl as a way to install Falco, and begin running Falco next to Kubernetes.

It is very important that we understand we aren't re-inventing helm or kustomize; rather, we need to do things that shouldn't (and in some cases can't) be done with Kubernetes resources. Hence why a command-line tool to encapsulate this logic makes sense to me, and would provide a much richer user experience.

We need to remind ourselves that a directory with Kubernetes YAML - at the end of the day - isn't enough to safely and reliably install Falco. Therefore as a project we shouldn't support it.


Example workflows

Manual (modeled after kubeadm)

  • Set up vanilla Linux environment
  • Acquire and install dependencies
    • kubeadm, kubelet, kubeconfig, falcoctl, ebtables, cni-tools, etc.
  • Install Kubernetes and watch with systemd using kubeadm
  • Use falcoctl to install falco and watch with systemd
  • Use falcoctl to install driver
  • Use falcoctl to set up mTLS
  • Use falcoctl to turn on auditing in Kubernetes
  • Use falcoctl to configure falco userspace rules and begin detecting

Note: kubeadm has a concept of phases and performs many tasks in series; we should probably do the same with falcoctl. I just called out the steps so that we can see them concretely. In fact, kubeadm performs many of these same tasks in the several phases of bootstrapping Kubernetes, which is another sign this model makes sense.

Pre-baked

  • Build an image that has the tools and drivers pre-baked into the system
  • Have a cloud-init script that will configure and start Kubernetes and Falco based on details gathered at runtime

This is how tools like kops work, and we could begin to start using libraries from falcoctl to build plugins and integrations with other installation tools for Kubernetes.

Notice that we already have done some of this with minikube and the solution was to pre-bake the driver into the minikube image.

We have also done this with kind and the "work-around" is to install the driver external to kind.


Conclusion

In every example we keep coming back to this fundamental truth:

Something needs to install a driver and configure various parts of Falco before we can run Falco in a container.

Which is why the project should support Linux artifacts and not Kubernetes artifacts.

If a user chooses to run Falco alongside Kubernetes, I feel very strongly it should be easy and safe for them. Hence my belief that a tool like falcoctl makes much more sense than wiki-scripting things, or putting a square peg in a round hole with a tool like helm.

Falco isn't a Kubernetes application. It is a Linux application. It should be installed and managed as one. Falco happens to offer features that play nicely with Kubernetes such as consuming audit logs and exposing services but setting Falco up should be a Linux style installation just like Kubeadm.

At the end of the day Falco could be installed before or after Kubernetes is installed, or even, as many Falco end users have mentioned, in systems where Kubernetes isn't running at all.


If there is a third-party community that would like to maintain a helm chart or YAML manifests, I think it's great! But I don't think those artifacts should ever have support from the project that goes any deeper than a "use at your own risk; these are unsafe, unofficial, and unsupported ways of installing Falco" experience. I even think the Falco community has a responsibility to index, and even host, some of these unofficial installation methods.

These are probably a good starting place for folks interested in trying Falco out. We just don't want to create the expectation that installing Falco with helm is a good idea to do in production. Frankly, because it isn't. 😃

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

Falco isn't the only tool subject to this set of technical concerns. Look at CNI, for instance.

Cilium has a helm chart with a DaemonSet configured, but if you look at how it works:

https://github.com/cilium/cilium/blob/master/install/kubernetes/cilium/charts/nodeinit/templates/daemonset.yaml#L25-L39

          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: true
{{- if .Values.revertReconfigureKubelet }}
          lifecycle:
            preStop:
              exec:
                command:
                  - "nsenter"
                  - "-t"
                  - "1"
                  - "-m"
                  - "--"
                  - "/bin/sh"
                  - "-c"
                  - |
                    #!/bin/bash

It "works" by privilege escalating in your cluster.

Now many low-level container tools (like Cilium) can get away with this.

But Falco is a security tool. Which means we need to set a good precedent for our end-users and lead by example.

Installing any application like this with helm is a security concern, and Falco will alert if you try to do something like this. In fact if you install the Falco helm chart as it stands right now, Falco actually ends up alerting against itself 😄

A lot of helm charts aren't secure. This isn't something a security tool should endorse. If you look at the stable helm charts there are currently 53 cases of the string "privileged: true" appearing.

Falco by default would raise a security alert if any of those 53 charts were deployed.

As a security tool we can't endorse these types of packages for anyone else, especially ourselves.

The helm charts aren't secure, and it's our job to alert if somebody tries to run them. We shouldn't be contributing to the problem we are trying to address.
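The "53 cases" figure above is the kind of count you get from a recursive grep over a checkout of the stable charts repo. A sketch of the technique, run against a tiny local fixture rather than the real repo (so the paths and the resulting count here are illustrative only):

```shell
# Sketch: how a count like "53 charts with privileged: true" is obtained.
# We build a one-chart fixture instead of cloning the real helm/charts repo.
cd "$(mktemp -d)"
mkdir -p charts/stable/demo/templates
cat > charts/stable/demo/templates/daemonset.yaml <<'EOF'
        securityContext:
          privileged: true
EOF
# -r recurses, -l lists matching files; wc -l counts them
grep -rl "privileged: true" charts/stable | wc -l   # prints 1 for this fixture
```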

@danmx
Contributor

danmx commented Mar 26, 2020

Thank you @kris-nova for the explanation.

I do agree with you in some cases, but I think it all depends on how you look at Falco and its deployment.

I'm not a Linux kernel expert, but from my perspective Falco is just a parser and forwarder of specific types of events: Kubernetes audit logs and system calls.

Helm and Kustomize don't define the security of what you're deploying. It's up to the creator of the configuration to use least-privileged primitives and provide sufficient configuration (a least-privileged Pod Security Policy, seccomp filters, SELinux/AppArmor profiles).

Falco's current Helm chart can be improved in so many ways. Some of them relate to a different type of deployment strategy.

A privileged container is not required and can be avoided by properly narrowing the Linux capabilities and allowed system calls (via seccomp filters) that Falco requires to run in syscall mode. Just by trial and error, I managed to run Falco with an eBPF probe with "just" 2 capabilities (SYS_ADMIN and SYS_RESOURCE).
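As a sketch, the securityContext described here would look something like the following. The field values come from the experiment above; this is not a tested chart fragment:

```yaml
# Illustrative pod securityContext: drop "privileged: true" in favour of the
# two capabilities reported sufficient for the eBPF probe.
securityContext:
  privileged: false
  capabilities:
    drop: ["ALL"]
    add: ["SYS_ADMIN", "SYS_RESOURCE"]
```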

A single Falco daemon covering everything (k8s and syscalls) would work for small deployments, but I'm thinking about dedicated instances of it depending on the functionality. This means an exposed Deployment with no permissions, running as a non-root user, for Kubernetes audit logs, and a DaemonSet with all required capabilities and permissions for system calls. This isolates the syscall-processing Falco, while the k8s-audit-processing one is treated like any other service running in the cluster. Another benefit of this is managing updates of the Falco runtime (not the kernel module or eBPF probe) and its rules.

The tricky part is the kernel module or eBPF probe. As you mentioned, both could be loaded at boot time via any bootstrapping tool, or they can be part of a "pre-baked" system image as part of an immutable infrastructure. As with kernel updates, updating either of those components would require rolling out a new system image, but I presume they wouldn't update as often as the runtime. If that is not possible, an initContainer can always be used with elevated privileges to provision the environment (for example, Istio's init container setting up iptables), followed by a regular container for the runtime (think of it as privilege dropping).
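The privilege-dropping pattern described here could be sketched like this, where the image and container names are hypothetical and modeled on the Istio init-container example:

```yaml
# Sketch: a privileged initContainer provisions the driver once, then the
# long-running Falco container runs unprivileged.
initContainers:
- name: driver-loader                  # hypothetical name
  image: example/falco-driver-loader   # hypothetical image
  securityContext:
    privileged: true                   # elevated only for one-shot setup
containers:
- name: falco
  image: falcosecurity/falco
  securityContext:
    privileged: false                  # runtime drops back to least privilege
```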

Regarding mTLS in Kubernetes, service meshes tend to cover all the problems related to certificate rotation, enforcement, etc. I'm not suggesting dropping it, since not everybody uses service meshes.

@danmx
Contributor

danmx commented Mar 26, 2020

I'm just trying to point out that standardised DevEx is quite important and it drives adoption.

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

Thank you @kris-nova for the explanation.

You are very welcome. Thank you for taking the time to read and understand my thoughts here. It's very important to me.

Thank YOU @danmx for bringing these issues up.

I'm just trying to point out that standardised DevEx is quite important and it drives adoption.

I completely agree. 💯

Which is why I think having a one-stop-shop (falcoctl) that can do the following

  • Install all the falco components in a safe and secure way
  • Offer a single source of truth for our documentation
  • Provide a simple user story
  • Offer a single source of truth for the Falco maintainers
  • Offer a path forward without introducing any more dependencies
  • Manage Falco components (tls, rules, updates) over time in a secure way

seems to be the clear winner here.

Privilege escalation in any form is a deal breaker in my mind, Falco shouldn't alert against itself, and we should support the "Falco before Kubernetes" and "Falco without Kubernetes" models.

Introducing more dependencies, like a service mesh, brings us back to square one.

I want to give our end users a clear, concise, reliable, easy to use story for setting up runtime security in Linux. Then if they happen to be running Kubernetes, we will work well with them.

In my mind having a tool like falcoctl will give developers, operators, and end-users a first-class experience that caters to all of our constraints, known use cases, and does so in a safe and secure manner that matches the precedent we are trying to set as an open-source security tool.

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

I am not saying these will be the exact commands, but look at the user experience here.

ssh [email protected]
apt-get install falco falcoctl kubeadm kubectl kubelet docker.io
systemctl enable falco
systemctl start falco
kubeadm init 
falcoctl install falco # <--- Creates kubernetes resources for the first time
falcoctl audit configure # <--- Configures Kubernetes audit and restarts API server
falcoctl install tls # <--- mTLS baby
falcoctl install rule <rule> # <--- Authenticated rules against man in the middle
falcoctl token generate # <---- Generate a token for Falco pods to use to consume alerts
exit
helm install prometheus 
helm install prometheus-exporter --set token=<token from earlier>
helm install whatever

I would much rather be having conversations around the user experience with what tools are installed when and how in Linux than conversations around how to privilege escalate to perform these same tasks.

@danmx
Contributor

danmx commented Mar 26, 2020

I think it'll be a hard sell for some people. I hope you prove me wrong 😄

how to privilege escalate to perform these same tasks.

So in the end you'll be running with full permissions in the host environment. It is exactly the same as running a privileged container.

falcoctl audit configure # <--- Configures Kubernetes audit and restarts API server

I don't think many K8s operators will go for that. Especially if they keep their infrastructure as code.

@krisnova
Contributor Author

krisnova commented Mar 26, 2020

If they have their infrastructure as code then they shouldn't need to do it 😉

@krisnova
Contributor Author

So in the end you'll be running with full permissions in the host environment. It is exactly the same as running a privileged container.

So I know at a glance it seems like I am being pedantic by separating the container layer (with privilege escalation) from the host layer, but it's actually a really large concern for some end-users.

The thesis here is that we need to be disciplined and respect isolation boundaries for production users.

By having Falco set up on the host system (only), we ensure that even if a production user disables the ability to privilege escalate, they can still run Falco successfully.

Again - just because you can - doesn't mean you should. As a security tool we owe it to our users not to make any assumptions about their systems, and respect even the most rigid of security policy in a system.

Also, as a reminder, this is a thread about what Falco will support for production users, not about what we might document or offer on the side. This is about what the Falco community is committing to maintaining for production users.

If someone wants to run Falco by privilege escalating they are more than welcome to; it's just that the project won't commit to triaging or supporting any issues they run into by doing so.

@leogr
Member

leogr commented Mar 27, 2020

I totally agree with @kris-nova especially about "production users" vs. other use cases.

I believe that the helm chart is very easy and useful for several use cases (in particular for development), indeed I contributed to improving the chart and wrote a blog post endorsing its usage.

Nevertheless, I really think that is not enough for production users and mission-critical installations; the reasons are explained in previous comments. IMHO, it makes perfect sense that the Falco community should mainly focus on the best, strongest, most secure approach.

Furthermore, I believe that by doing so, other "unofficial third-party installation" methods will benefit too. Having a few official and well-curated installation approaches is perfect guidance for folks willing to support other installation strategies.

@danmx
Contributor

danmx commented Mar 27, 2020

@leogr do you plan to make a blog post describing in more detail this mission critical production use case (threat model, limitations, etc.)? I'd be very interested to see how other people run it in production.

@leogr
Member

leogr commented Mar 27, 2020

@danmx not sure when, but we'll probably do a blog post once the official installation methods are well described and the toolchain has been improved accordingly.

@afbjorklund
Contributor

For minikube, I think the BPF driver might eventually replace the Kernel driver falco-probe.ko.
Then there should be no reason to have to bundle anything, since it can run from the images... ?

And then it would probably also work better with the new "docker" driver, without any VM on Linux.
Otherwise we would have to load the kernel module in the host (laptop) kernel, not so popular...

@krisnova
Contributor Author

krisnova commented May 4, 2020

Closing in favor of #1184
