
Egress NetworkPolicy are not enforced fast enough #197

Closed
tnqn opened this issue Dec 6, 2019 · 6 comments · Fixed by #222
Assignees
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@tnqn
Member

tnqn commented Dec 6, 2019

Describe the bug
Create an egress NetworkPolicy that blocks all outgoing traffic, then create a Pod to which the policy applies. The Pod can reach the outside during a short window before the NetworkPolicy is enforced.

To Reproduce
It causes the following K8s NetworkPolicy e2e tests to fail:

[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [It] should allow egress access to server in CIDR block
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [It] should enforce egress policy allowing traffic to a server in a different namespace based on PodSelector and NamespaceSelector [Feature:NetworkPolicy]
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [It] should enforce policy to allow traffic only from a pod in a different namespace based on PodSelector and NamespaceSelector
[sig-network] NetworkPolicy [LinuxOnly] NetworkPolicy between server and client [It] should enforce multiple egress policies with egress allow-all policy taking precedence 

Expected
If the policy is created before the Pod, the Pod should be constrained by the policy for its whole lifecycle.

Actual behavior
If the policy is created before the Pod, the Pod is not constrained by the policy for a short window after it starts.

Versions:
Please provide the following information:

  • Antrea version (Docker image tag).
    0.1.1
  • Kubernetes version (use kubectl version). If your Kubernetes components have different versions, please provide the version for all of them.
  • Container runtime: which runtime are you using (e.g. containerd, cri-o, docker) and which version are you using?
  • Linux kernel version on the Kubernetes Nodes (uname -r).
  • If you chose to compile the Open vSwitch kernel module manually instead of using the kernel module built into the Linux kernel, which version of the OVS kernel module are you using? Include the output of modinfo openvswitch for the Kubernetes Nodes.

Additional context
Add any other context about the problem here, such as Antrea logs, kubelet logs, etc.

(Please consider pasting long output into a GitHub gist or any other pastebin.)

@tnqn
Member Author

tnqn commented Dec 11, 2019

Updating my plan to fix this:
The essential problem is that, in the current implementation, egress enforcement must wait for Pods to be up and running (with an IP assigned) in kube-apiserver, while the Pods may already be running before kubelet updates their status in kube-apiserver.

We can rely on the CNI ADD event to trigger the enforcement earlier, as the required data for policy enforcement (IP or OF port) would be ready after CNI ADD is processed.

In detail, we can use a channel: the CNIServer sends an added Pod to it upon CNI ADD, and the PolicyController receives the added Pod from it and triggers processing of all rules the Pod affects. Of course, we should add the Pod to its AppliedToGroups as soon as it is scheduled to a Node, instead of waiting for it to get an IP.

This approach is asynchronous, but it should be much faster than kubelet receiving the CNI response, creating the workload containers, and then starting them.
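The channel hand-off described above can be sketched as follows. This is a minimal illustration, not Antrea's actual code: `podEvent` and `handleEvent` are hypothetical names standing in for the real CNIServer/PolicyController types.

```go
package main

import (
	"fmt"
	"sync"
)

// podEvent is a hypothetical stand-in for the data the CNIServer would
// hand over after a CNI ADD (the real Antrea types differ).
type podEvent struct {
	Namespace, Name, IP string
}

// handleEvent models the PolicyController side: trigger reconciliation of
// every rule whose AppliedToGroup contains this Pod (the Pod was added to
// the group at scheduling time, before it had an IP).
func handleEvent(ev podEvent) string {
	return fmt.Sprintf("reconciled rules for %s/%s (%s)", ev.Namespace, ev.Name, ev.IP)
}

func main() {
	events := make(chan podEvent, 1)
	var wg sync.WaitGroup
	wg.Add(1)
	// CNIServer side: after CNI ADD completes, the Pod's IP and OF port
	// are known locally, long before kubelet reports the status to
	// kube-apiserver.
	go func() {
		defer wg.Done()
		events <- podEvent{Namespace: "network-policy-1651", Name: "client-a-2w2f6", IP: "10.30.1.98"}
		close(events)
	}()
	// PolicyController side: consume events and reconcile immediately.
	for ev := range events {
		fmt.Println(handleEvent(ev))
	}
	wg.Wait()
}
```

The point of the channel is decoupling: the CNIServer returns the CNI ADD response without waiting for rule reconciliation, while the PolicyController starts working as soon as the event arrives.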

I'm working on the code, expecting to finish and test it by EOW.

@antoninbas
Contributor

Thanks Quan, this sounds like something we can add to the 0.2.0 release.

For the longer term, however, is this good enough, or do we want something more "synchronous"? Should we instead avoid enabling Pod networking until all "current" network policies are enforced (by "current" I mean network policies that predate the creation of the Pod by a non-trivial amount of time)? Ideally, we should not enable forwarding for the Pod until we have had a chance to enforce known network policies. It seems to me that the scope of such a change would not be much larger than what you are proposing.

@antoninbas antoninbas added this to the Antrea v0.2.0 release milestone Dec 11, 2019
@antoninbas
Contributor

To clarify, my proposal is the following: do not install the Pod flows (i.e. hold off on calling InstallPodFlows) until the Network Policy controller tells the Agent it is ok to do so. I am not sure how significant of a change it is to the internal API (Controller <-> Agent) but I think it may be a better approach.
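The synchronous gating proposed above could look roughly like the sketch below. This is an illustration under stated assumptions, not the actual Antrea agent API: `agent`, `policyReady`, and `installPodFlows` are hypothetical names, with `installPodFlows` standing in for the real `InstallPodFlows`.

```go
package main

import (
	"fmt"
	"sync"
)

// agent sketches the synchronous proposal: flow installation for a Pod is
// deferred until the NetworkPolicy controller signals that known policies
// have been enforced.
type agent struct {
	mu    sync.Mutex
	ready map[string]chan struct{}
}

// policyReady returns a channel that the NetworkPolicy controller closes
// once all known policies for the Pod are realized in OVS.
func (a *agent) policyReady(pod string) chan struct{} {
	a.mu.Lock()
	defer a.mu.Unlock()
	ch, ok := a.ready[pod]
	if !ok {
		ch = make(chan struct{})
		a.ready[pod] = ch
	}
	return ch
}

// installPodFlows blocks until enforcement is confirmed, then installs the
// forwarding flows, so no packet is forwarded before policy flows exist.
func (a *agent) installPodFlows(pod string) string {
	<-a.policyReady(pod)
	return fmt.Sprintf("installed flows for %s", pod)
}

func main() {
	a := &agent{ready: map[string]chan struct{}{}}
	done := make(chan string)
	go func() { done <- a.installPodFlows("ns/pod-a") }()
	// NetworkPolicy controller finishes enforcing rules for the Pod.
	close(a.policyReady("ns/pod-a"))
	fmt.Println(<-done)
}
```

The trade-off, as discussed later in the thread, is that blocking in the CNI ADD path adds latency to Pod creation and couples the two components.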

@abhiraut
Contributor

If the cause of this issue is that the Pod might be in the Running state before the controller gets the notification and the agent creates flows for the Pod, shouldn't the same be true for ingress traffic too? Why would only egress policies have this effect?

@tnqn
Member Author

tnqn commented Dec 12, 2019

> Thanks Quan, this sounds like something we can add to the 0.2.0 release.
>
> For the longer term, however, is this good enough, or do we want something more "synchronous"? Should we instead avoid enabling Pod networking until all "current" network policies are enforced (by "current" I mean network policies that predate the creation of the Pod by a non-trivial amount of time)? Ideally, we should not enable forwarding for the Pod until we have had a chance to enforce known network policies. It seems to me that the scope of such a change would not be much larger than what you are proposing.

@antoninbas thanks for the suggestion, I actually thought the same as you and mentioned to @jianjuns.
However, I found it would be more complicated than I imagined:

  1. It couples the NetworkPolicyController and CNIServer: for each Pod, the CNIServer needs a way to ensure the Pod's NetworkPolicies are enforced, which might introduce new interfaces between them;
  2. The NetworkPolicyController reconciles policy rules in parallel; there is no existing way to know whether all policies of a Pod have been enforced, and new related policies may arrive while existing rules are being realized. Prioritizing the rules related to a specific Pod does not fit the current asynchronous model, and concurrent Pod creation would make it even more complicated;
  3. If we do this synchronously, the time spent processing CNI ADD might increase, slowing down Pod creation.

More importantly, I realized the asynchronous approach is simple and safe enough to solve this issue: all it does is change the trigger of rule processing from the Pod status update in kube-apiserver to the internal CNI ADD event; nothing else changes. The time to receive an item from a Go channel and process its rules in parallel is much shorter than the time for kubelet to receive the CNI response over gRPC and then create and start the real workload container.
In my test, the former finished in less than 300 µs, before kubelet had even received the response, and it took several more milliseconds before creation of the workload container started:

Logs of antrea-agent:

I1212 12:04:09.814435       1 server.go:360] Receive CmdAdd request cni_args:<container_id:"41918d18b29573431165a43181b02d00c19ad83df6f94273ee10b09afea61b57"
...
I1212 12:04:09.860067       1 server.go:417] CmdAdd request success
I1212 12:04:09.860173       1 reconciler.go:90] Reconciling rule 3b8d640e1b720086
I1212 12:04:09.860179       1 reconciler.go:90] Reconciling rule a243f0eb7070073d
I1212 12:04:09.860184       1 reconciler.go:204] Updating existing rule 3b8d640e1b720086 (Direction: Out, Pods: 1, ToAddressGroups: 0, ToIPBlocks: 1, ToAddresses: 0, Services: 1)
I1212 12:04:09.860202       1 reconciler.go:318] Got IP 10.30.1.98 for Pod network-policy-1651/client-a-2w2f6
I1212 12:04:09.860198       1 reconciler.go:204] Updating existing rule a243f0eb7070073d (Direction: Out, Pods: 1, ToAddressGroups: 0, ToIPBlocks: 1, ToAddresses: 0, Services: 0)
I1212 12:04:09.860213       1 reconciler.go:238] Updating ofRule 2 (Direction: Out, addedFrom: 1, addedTo: 0, deleteFrom: 0, deletedTo: 0)
I1212 12:04:09.860214       1 reconciler.go:318] Got IP 10.30.1.98 for Pod network-policy-1651/client-a-2w2f6
I1212 12:04:09.860223       1 reconciler.go:238] Updating ofRule 1 (Direction: Out, addedFrom: 1, addedTo: 0, deleteFrom: 0, deletedTo: 0)
I1212 12:04:09.860268       1 networkpolicy_controller.go:132] Finished syncing rule "3b8d640e1b720086". (98.894µs)
I1212 12:04:09.860320       1 networkpolicy_controller.go:132] Finished syncing rule "a243f0eb7070073d". (146.243µs)

Logs of kubelet:

Dec 12 04:04:09 dev4linux kubelet[5598]: I1212 04:04:09.514935    5598 kuberuntime_manager.go:704] Creating sandbox for pod "client-a-2w2f6_network-policy-1651(914feab8-0bb4-4ab9-9005-8f2e063877c2)"
...
Dec 12 04:04:09 dev4linux kubelet[5598]: I1212 04:04:09.807846    5598 plugins.go:405] Calling network plugin cni to set up pod "client-a-2w2f6_network-policy-1651"
Dec 12 04:04:09 dev4linux kubelet[5598]: I1212 04:04:09.861571    5598 kuberuntime_manager.go:718] Created PodSandbox "41918d18b29573431165a43181b02d00c19ad83df6f94273ee10b09afea61b57" for pod "client-a-2w2f6_network-policy-1651(914feab8-0bb4-4ab9-9005-8f2e063877c2)"
Dec 12 04:04:09 dev4linux kubelet[5598]: I1212 04:04:09.864524    5598 kuberuntime_manager.go:737] Determined the ip [10.30.1.98] for pod "client-a-2w2f6_network-policy-1651(914feab8-0bb4-4ab9-9005-8f2e063877c2)" after sandbox changed
Dec 12 04:04:09 dev4linux kubelet[5598]: I1212 04:04:09.864601    5598 kuberuntime_manager.go:774] Creating container &Container{Name:client-a-container

The related NetworkPolicy tests passed after applying the patch (in practice, users won't hit this issue unless they intentionally design a scenario like the K8s tests). I'm polishing the change and can push it for review tomorrow.

@tnqn
Member Author

tnqn commented Dec 12, 2019

> If the cause of this issue is that the Pod might be in the Running state before the controller gets the notification and the agent creates flows for the Pod, shouldn't the same be true for ingress traffic too? Why would only egress policies have this effect?

The ingress tests wait for the server Pod's IP and then create a client Pod to test it; if the testing code knows the Pod's IP, Antrea knows it too and will have already enforced the policy.
The egress tests start before the Pod's IP is updated in kube-apiserver, because kubelet starts the workload container first and then updates the Pod's status to the apiserver (or perhaps in parallel). So if we rely on kube-apiserver's events, we can never enforce policies before the egress tests run.
