Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 'namespaces' in ACNP for enhanced peer namespace selection #1961

Merged
merged 5 commits into from
May 26, 2021

Conversation

Dyanngg
Copy link
Contributor

@Dyanngg Dyanngg commented Mar 16, 2021

The PR partially addresses #1941. It introduces a new namespaces field to ingress/egress peers for Antrea ClusterNetworkPolicy (in security/v1alpha1), while still keeping namespaceSelector as an optional field. For a given peer, only one of namespaces or namespaceSelector can be specified.

type NetworkPolicyPeer struct {
        ....
        // +optional
	PodSelector       *metav1.LabelSelector
        // +optional
	NamespaceSelector *metav1.LabelSelector
	// +optional
	Namespaces        *PeerNamespaces
        ....
}

type PeerNamespaces struct {
	Match NamespaceMatchType
}

// NamespaceMatchType describes Namespace matching strategy.
type NamespaceMatchType string

// This list can/might get expanded in the future (i.e. NotSelf etc.)
const (
	NamespaceMatchSelf NamespaceMatchType = "Self"
)

NamespacesSelector will have the exact same semantics as before. Self in peer.Namespaces, on the other hand, is a new special keyword to indicate that the corresponding podSelector should be evaluated from within the Namespace of the applied-to workload, for which the ingress/egress rule is currently affected. This enables policy writers to create per-Namespace rules. See #1941 for examples. Note: using ClusterGroups as appliedTo for per-Namespace rules is not yet supported.

The implementation is as follows: if there is at least one per-namespace rule in Antrea ClusterNetworkPolicy, then the internal NetworkPolicy object created by the controller will have appliedTo set per-rule, no matter how the ACNP is defined. This is because the appliedTo group for a per-namespace rule will potentially be split into multiple internal rules, depending on the Namespace resolution result.

For cluster-scoped peers (peers whose namespace.match is not set to Self), the appliedTo for its rule will be set as the spec.appliedTo, or rule.appliedTo, if the original ACNP is appliedTo-per-rule. For per-namespace peers, an internal rule will be created by each Namespace which its appliedTo resolves to, with appliedTo and peer for that rule set at Namespace scope.

Taking the sample spec as an example:

apiVersion: "crd.antrea.io/v1alpha2"
kind: ClusterNetworkPolicy
metadata:
    name: ns-isolation
spec:
    tier: baseline
    priority: 5
    appliedTo:
    // Select all Namespaces
    - namespaceSelector: {}
    ingress:       
    - from: 
      // Allow ingress traffic from all Pods from AppliedTo Pod's own Namespace                          
      action: Allow
      - namespaces:
          match: self
    - from: 
      // Deny ingress traffic from all Pods from all other sources
      action: Deny

Suppose there are 2 namespaces, ns1 and ns2 in the cluster. Then for the 2 ingress rules above, 3 internal rules will be created:

1. appliedToForRule=<all Pods in ns1>, from=<all Pods in ns1>, action=Allow, rulePriority=0
2. appliedToForRule=<all Pods in ns2>, from=<all Pods in ns2>, action=Allow, rulePriority=0
3. appliedToForRule=<all Pods in all namespaces>, from=<all Pods in all namespaces>, action=Deny rulePriority=1

Three individual appliedToGroups and three individual addressGroups will be created in this case.

@Dyanngg Dyanngg added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 16, 2021
@Dyanngg Dyanngg added this to the Antrea v0.14.0 release milestone Mar 16, 2021
@Dyanngg Dyanngg self-assigned this Mar 16, 2021
@codecov-io
Copy link

codecov-io commented Mar 16, 2021

Codecov Report

Merging #1961 (8016094) into main (46a2fc5) will decrease coverage by 23.58%.
The diff coverage is 48.63%.

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #1961       +/-   ##
===========================================
- Coverage   65.40%   41.81%   -23.59%     
===========================================
  Files         197      124       -73     
  Lines       17192    15367     -1825     
===========================================
- Hits        11244     6426     -4818     
- Misses       4778     8411     +3633     
+ Partials     1170      530      -640     
Flag Coverage Δ
kind-e2e-tests ?
unit-tests 41.81% <48.63%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ntroller/networkpolicy/networkpolicy_controller.go 55.40% <0.00%> (-29.21%) ⬇️
pkg/controller/networkpolicy/validate.go 0.00% <0.00%> (-71.57%) ⬇️
...kg/controller/networkpolicy/store/networkpolicy.go 71.64% <42.85%> (-13.36%) ⬇️
...g/controller/networkpolicy/clusternetworkpolicy.go 58.96% <58.68%> (-28.18%) ⬇️
pkg/controller/networkpolicy/crd_utils.go 87.20% <75.00%> (-1.40%) ⬇️
pkg/agent/agent_linux.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/ovs/openflow/default.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/config/node_config.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/apis/controlplane/helper.go 0.00% <0.00%> (-100.00%) ⬇️
pkg/agent/cniserver/pod_configuration_linux.go 0.00% <0.00%> (-100.00%) ⬇️
... and 159 more

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, this is the approach, K8s upstream NetworkPolicy will go?

pkg/apis/security/v1alpha1/types.go Outdated Show resolved Hide resolved
@Dyanngg
Copy link
Contributor Author

Dyanngg commented Mar 17, 2021

So, this is the approach, K8s upstream NetworkPolicy will go?

Yes at least this is what we are proposing

@Dyanngg Dyanngg force-pushed the acnp-namespaces branch 4 times, most recently from 10e42b4 to 127f790 Compare March 18, 2021 22:11
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial comments after taking a quick look

@antoninbas
Copy link
Contributor

@Dyanngg I don't recall if this was discussed previously, but does the PR handle stats aggregation when you create individual rules with their own appliedTo for different namespaces? From the user perspective there is a single rule and it should include stats from all the generated rules. Would be good to have an e2e test case for this as well.

@Dyanngg
Copy link
Contributor Author

Dyanngg commented Mar 22, 2021

@Dyanngg I don't recall if this was discussed previously, but does the PR handle stats aggregation when you create individual rules with their own appliedTo for different namespaces? From the user perspective there is a single rule and it should include stats from all the generated rules. Would be good to have an e2e test case for this as well.

This change should be compatible with rule stats because the controller aggregates rule stats based on rule name, and each per-namespace rule created for the original ingress/egress rule will share the same rule name. /cc @ceclinux

@Dyanngg Dyanngg requested a review from ceclinux March 22, 2021 23:14
// is created for ClusterNetworkPolicy ingress/egress rules.
// Cannot be set with NamespaceSelector.
// +optional
Namespaces *PeerNamespaces `json:"namespaces,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this option has been discussed before: do we consider adding a field SelfNamespace directly? which is just another way to cooperate with PodSelector and cannot be set together with NamespaceSelector to avoid another sub-struct and the need to deal with the upgrade/deprecation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular option has not been discussed I believe, and I agree it might make upgrade a bit easier. For upstream ClusterNetworkPolicy proposal it does not make a big difference because those are new API types to be added anyways. If everyone like the SelfNamespace: bool approach, I can make the change for this PR and propose the same thing upstream. I think it's perfectly okay if the upstream spec came out a bit different than what we merge in Antrea as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR based on k8s-sig-network upstream KEP comments. Instead of a bool field, changed the Self keyword to a NamespaceMatchType for it to be more extensible, as well as keeping the old namespaceSelector semantics exactly as-is.

@Dyanngg Dyanngg force-pushed the acnp-namespaces branch 2 times, most recently from 8016094 to abb039b Compare March 24, 2021 00:27
@Dyanngg Dyanngg removed this from the Antrea v1.0 release milestone Apr 7, 2021
@Dyanngg Dyanngg added this to the Antrea v1.1 release milestone May 13, 2021
@Dyanngg Dyanngg force-pushed the acnp-namespaces branch 2 times, most recently from 60365d1 to 042b4a4 Compare May 13, 2021 07:20
pkg/controller/types/networkpolicy.go Show resolved Hide resolved
pkg/controller/types/networkpolicy.go Outdated Show resolved Hide resolved
pkg/controller/types/networkpolicy.go Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented May 13, 2021

Codecov Report

Merging #1961 (291ba89) into main (c21592c) will decrease coverage by 6.21%.
The diff coverage is 64.59%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1961      +/-   ##
==========================================
- Coverage   61.63%   55.42%   -6.22%     
==========================================
  Files         274      274              
  Lines       20673    20817     +144     
==========================================
- Hits        12742    11537    -1205     
- Misses       6593     8059    +1466     
+ Partials     1338     1221     -117     
Flag Coverage Δ
kind-e2e-tests 40.65% <6.69%> (-11.96%) ⬇️
unit-tests 41.17% <59.33%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/networkpolicy/validate.go 0.00% <0.00%> (-69.47%) ⬇️
pkg/controller/types/networkpolicy.go 100.00% <ø> (ø)
...kg/controller/networkpolicy/store/networkpolicy.go 77.61% <42.85%> (-7.39%) ⬇️
...g/controller/networkpolicy/clusternetworkpolicy.go 60.61% <61.14%> (-26.53%) ⬇️
pkg/controller/networkpolicy/crd_utils.go 89.15% <100.00%> (-1.99%) ⬇️
...ntroller/networkpolicy/networkpolicy_controller.go 78.23% <100.00%> (-6.34%) ⬇️
pkg/apis/controlplane/helper.go 25.00% <0.00%> (-75.00%) ⬇️
pkg/apis/controlplane/v1beta2/helper.go 25.00% <0.00%> (-75.00%) ⬇️
pkg/controller/networkpolicy/mutate.go 0.00% <0.00%> (-71.77%) ⬇️
pkg/controller/networkpolicy/store/group.go 5.00% <0.00%> (-70.00%) ⬇️
... and 69 more

@Dyanngg
Copy link
Contributor Author

Dyanngg commented May 14, 2021

/test-all

Copy link
Contributor

@jianjuns jianjuns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIts.

pkg/controller/networkpolicy/crd_utils.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/crd_utils.go Outdated Show resolved Hide resolved
@Dyanngg Dyanngg force-pushed the acnp-namespaces branch 3 times, most recently from 2bc2c50 to 0d4af93 Compare May 19, 2021 22:23
klog.V(2).Infof("Processing Namespace %s ADD event, labels: %v", namespace.Name, namespace.Labels)
affectedACNPs := n.filterPerNamespaceRuleACNPsByNSLabels(namespace.Labels)
for cnpName := range affectedACNPs {
if cnp, err := n.cnpLister.Get(cnpName); err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the second question in #1961 (comment)?

Can it just enqueue the policy to notify it needs to be processed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I think I missed understood that comment. I thought you were suggesting we enqueue the NP which is already done. The problem is that simply enqueuing the internal NP is not enough... syncInternalNetworkPolicy only updates the NP span, but for Namespace update events, we need the CNP to be re-processed because the cardinality of per-NS rules might have changed because of Namespace label updates

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if using updateCNP here would introduce some problems: the function was supposed to be called serially and with continuous state changes, but now it may be called concurrently and with some unexpected intermediate state.
For example, before the change it's only possible to receive:

update(A1, A2) => update(A2, A3)

After the change it's possible to receive:

update(A1, A2) => update(A3, A3) => update(A2, A3)

And update(A3, A3) may be executed at the same as others.
In a corner case, it may add a deleted CNP back after deleteCNP deletes it.

Copy link
Contributor Author

@Dyanngg Dyanngg May 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think update(A3, A3) executed with update(A2, A3) will cause any unexpected behaviors. If you look at updateCNP(), the old CNP is merely used to retrieve the key of the original internalNP, which in turn is only used to preserve span; in addition, the key of the original internalNP is the same as the new CNP, so the oldCNP spec is effectively worthless tbh...

Adding a deleted CNP back does not seem possible to me as well, since before we call update we use the cnpLister to check if the CNP corresponded to the internalNP is still 'alive'.

Nevertheless, for clarity I added a new reprocessCNP func that does not involve update(CNP, CNP) and checks for internal NP existence before re-processing CNP.

Copy link
Member

@tnqn tnqn May 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a deleted CNP back is possible when the DELETE event is received and Delete(A3) is executed between the existence check and update(A3, A3) as there is no mutex to prevent it from happening. There are many yield points in update(A3, A3) too that can increase the possibility.
Adding a mutex and internal NP existence check may avoid it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex and existence check has been added in the latest commit.

pkg/controller/networkpolicy/clusternetworkpolicy.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicy/clusternetworkpolicy.go Outdated Show resolved Hide resolved
@Dyanngg
Copy link
Contributor Author

Dyanngg commented May 24, 2021

/test-all

@Dyanngg Dyanngg force-pushed the acnp-namespaces branch 2 times, most recently from c1672a6 to 291ba89 Compare May 25, 2021 19:27
@Dyanngg Dyanngg requested a review from tnqn May 25, 2021 19:28
@Dyanngg
Copy link
Contributor Author

Dyanngg commented May 25, 2021

/test-all

@Dyanngg
Copy link
Contributor Author

Dyanngg commented May 26, 2021

/test-all

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dyanngg Dyanngg merged commit e7051c0 into antrea-io:main May 26, 2021
@Dyanngg Dyanngg deleted the acnp-namespaces branch May 26, 2021 18:48
@@ -2009,6 +2019,11 @@ spec:
type: string
matchLabels:
x-kubernetes-preserve-unknown-fields: true
namespaces:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason why we add this in the deprecated version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really... I guess it isn't necessary. This PR was opened before the security api group was deprecated, and the API change was meant for this version of ACNP resource.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants