Support NetworkPolicy Status for AntreaPolicy #1442
Conversation
Thanks for your PR. The following commands are available:
Codecov Report
@@            Coverage Diff             @@
##           master    #1442      +/-   ##
==========================================
+ Coverage   62.50%   63.24%   +0.73%
==========================================
  Files         167      170       +3
  Lines       13969    14250     +281
==========================================
+ Hits         8732     9012     +280
+ Misses       4325     4300      -25
- Partials      912      938      +26
Flags with carried forward coverage won't be shown.
Force-pushed from 3309116 to 81a7cbe
Force-pushed from e9220e0 to 4c6a829
/test-all
/test-all
@@ -115,6 +115,11 @@ func run(o *Options) error {
		appliedToGroupStore,
		networkPolicyStore)

	var networkPolicyStatusController *networkpolicy.StatusController
	if features.DefaultFeatureGate.Enabled(features.AntreaPolicy) {
		networkPolicyStatusController = networkpolicy.NewStatusController(crdClient, networkPolicyStore, cnpInformer, anpInformer)
I know we discussed this before and decided to reuse the AntreaPolicy gate. But do you think realization status will introduce much overhead or not?
I don't think it has expensive CPU overhead. On each Node, say with 500 Policies applied to it, it's just 500*k int comparisons. On antrea-controller, say with 10,000 Policies in total and 100 NodeStatuses per Policy, it's 1,000,000 simple comparisons and int increments even after restarting the controller; I feel that can be done in a few seconds, or maybe less. I could write a benchmark for the syncHandler later.
I'm not very sure about the network overhead between agent and controller: though each message is small, it needs timely updates, and I'm not sure how the apiserver performs in this situation. I will run a scale test and paste the result here.
The network overhead between controller and kube-apiserver should be small, as we only update when the status changes.
I have added benchmark tests for the syncHandlers. On the controller side, syncing a policy that spans 1000 Nodes takes only 70us (networking calls not counted), so even 10,000 policies with such a span take only 700ms of calculation. On the agent side, syncing a policy that has 100 rules takes 47us (networking calls not counted), so 500 policies take only 24ms of calculation. Does that eliminate your concern about overhead?
These numbers don't look bad. What do you think about memory?
BTW, I assume you will capture your benchmark numbers somewhere for later reference?
For antrea-controller, it adds the map below. Each policy status is about (20+8) bytes, so 10,000 policies with 100 Nodes per policy may take 28MB; counting the length of the keys and the hash table structures, maybe around 80MB?
// statuses is a nested map that keeps the realization statuses reported by antrea-agents.
// The outer map's keys are the NetworkPolicy keys. The inner map's keys are the Node names. The inner map's values
// are statuses reported by each Node for a NetworkPolicy.
statuses map[string]map[string]*controlplane.NetworkPolicyNodeStatus
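For illustration, the per-policy work over this map is just one integer comparison per Node. Below is a minimal self-contained sketch, assuming NetworkPolicyNodeStatus carries the realized generation; statusAggregator and countRealizedNodes are hypothetical names, not the actual implementation.
package main

import (
	"fmt"
	"sync"
)

// nodeStatus is a local stand-in for controlplane.NetworkPolicyNodeStatus,
// assumed to carry the Node name and the realized generation.
type nodeStatus struct {
	NodeName   string
	Generation int64
}

// statusAggregator is a hypothetical stand-in for the part of StatusController
// that owns the nested statuses map described above.
type statusAggregator struct {
	sync.RWMutex
	statuses map[string]map[string]*nodeStatus // policy key -> Node name -> status
}

// countRealizedNodes counts the Nodes whose reported generation matches the
// policy's desired generation: one int comparison per Node, which is why
// 10,000 policies with 100 Nodes each stays cheap.
func (a *statusAggregator) countRealizedNodes(key string, desiredGeneration int64) (realized, total int) {
	a.RLock()
	defer a.RUnlock()
	for _, s := range a.statuses[key] {
		if s.Generation == desiredGeneration {
			realized++
		}
	}
	return realized, len(a.statuses[key])
}

func main() {
	a := &statusAggregator{statuses: map[string]map[string]*nodeStatus{
		"ns/policy-a": {
			"node-1": {NodeName: "node-1", Generation: 2},
			"node-2": {NodeName: "node-2", Generation: 1},
		},
	}}
	realized, total := a.countRealizedNodes("ns/policy-a", 2)
	fmt.Printf("%d/%d Nodes realized\n", realized, total) // prints "1/2 Nodes realized"
}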
For antrea-agent, it adds a storage of the struct below, each instance of which takes 16+36 bytes. 500 policies with 10 rules per policy may take 260KB; the whole storage should use less than 1MB.
type realizedRule struct {
ruleID string
policyID types.UID
}
The benchmark numbers are in the comments of their tests.
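For reference, the syncHandler benchmarks mentioned above would typically follow the standard Go testing.B pattern, roughly like this sketch; newTestController and addPolicySpanningNodes are hypothetical helpers, not the actual test code.
package networkpolicy

import "testing"

// BenchmarkSyncHandler sketches how a syncHandler benchmark could be
// structured; the hypothetical helpers set up a controller with a policy
// spanning 1000 Nodes, matching the scenario discussed above.
func BenchmarkSyncHandler(b *testing.B) {
	c := newTestController()
	key := addPolicySpanningNodes(c, 1000)
	b.ResetTimer()
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if err := c.syncHandler(key); err != nil {
			b.Fatal(err)
		}
	}
}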
c.ruleGenerationsLock.RLock()
defer c.ruleGenerationsLock.RUnlock()
for _, rule := range rules {
	ruleGeneration, exists := c.ruleGenerations[rule.ID]
Question: could we save the rule generation in ruleCache, so we wouldn't need a separate ruleGenerations map?
And could we maintain a counter of rules that match the expected generation, so we wouldn't need to check all rules here?
For the first question, we already save the rule generation in ruleCache, but it's the desired one; we need a place to store the realized one. Or do you mean adding another field to the rule struct for the realized generation?
For the second question, a rule can be realized multiple times if its associated groups are changed, so we cannot just increase the counter when a rule is realized; we need to know whether it has been realized before. Maybe we could use a set to maintain the rules that match the expectation? Combining with your first question, maybe the struct could be:
type policyStatus struct {
	expectedGeneration int64
	expectedRuleNum    int
	realizedRuleSet    sets.String
}
policyStatusMap map[string]policyStatus
So when calculating the status to report, it just checks expectedRuleNum == len(realizedRuleSet). What do you think?
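A minimal self-contained sketch of that bookkeeping, assuming a hypothetical markRuleRealized helper (sets.String is from k8s.io/apimachinery/pkg/util/sets):
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/sets"
)

type policyStatus struct {
	expectedGeneration int64
	expectedRuleNum    int
	realizedRuleSet    sets.String
}

// markRuleRealized records a realized rule; using a set makes re-realization
// of the same rule (e.g. after a group update) idempotent, so nothing is
// counted twice.
func (s *policyStatus) markRuleRealized(ruleID string) {
	s.realizedRuleSet.Insert(ruleID)
}

// realized reports whether every expected rule has been realized.
func (s *policyStatus) realized() bool {
	return s.expectedRuleNum == len(s.realizedRuleSet)
}

func main() {
	s := &policyStatus{expectedGeneration: 3, expectedRuleNum: 2, realizedRuleSet: sets.NewString()}
	s.markRuleRealized("rule-a")
	s.markRuleRealized("rule-a") // duplicate realization: still one entry
	fmt.Println(s.realized())    // false
	s.markRuleRealized("rule-b")
	fmt.Println(s.realized())    // true
}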
Yes, I meant a separate field for the realized generation in the rule struct.
a rule can be realized multiple times if its associated groups are changed, so we cannot just increase the counter when a rule is realized; we need to know whether it has been realized before.
If we save the realized generation in the rule struct, we can check the current value of the field to decide whether or not to increment the counter?
I realized the rule actually shouldn't be generation-related, because rules are immutable; otherwise changing rule B could cause rule A to be updated unnecessarily (if the policy generation were counted into the hash value). When we make a change to an existing rule, it's treated as a new rule being added and an old rule being deleted. The policy should be considered realized when all of its current rules are realized and all of its removed rules are uninstalled. Changed the implementation in the latest patch.
if np.SourceRef.Type == controlplane.K8sNetworkPolicy {
	continue
}
c.queue.Add(np.Name)
Is it true we only care about span changes for an update?
No, we also care about generation change, in which case the observedGeneration and phase should be updated.
Could you add a comment to indicate what changes we care about here?
Force-pushed from b2e17a1 to c7887d7
Force-pushed from 8d707dd to fe2f724
for _, r := range actualRules {
	actualRuleSet.Insert(r.(*realizedRule).ruleID)
}
if !desiredRuleSet.Equal(actualRuleSet) {
Is it faster than looking up desiredRuleSet for every actualRule in line 182?
Great suggestion. The improvement is as below:
Before:
47754 ns/op 15320 B/op 23 allocs/op
After:
37734 ns/op 10088 B/op 15 allocs/op
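Presumably the optimized version skips building actualRuleSet entirely and checks membership directly, along these lines (a sketch with assumed names, reusing the realizedRule struct shown earlier, not the exact patch):
// allRulesRealized sketches the optimized check: with unique rule IDs, equal
// lengths plus membership of every actual rule in desiredRuleSet implies the
// two sets are equal, without allocating a second set.
func allRulesRealized(actualRules []interface{}, desiredRuleSet sets.String) bool {
	if len(actualRules) != desiredRuleSet.Len() {
		return false // some rules missing, or stale rules not yet uninstalled
	}
	for _, r := range actualRules {
		if !desiredRuleSet.Has(r.(*realizedRule).ruleID) {
			return false
		}
	}
	return true
}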
func (c *StatusController) updateCNP(old, cur interface{}) {
	curCNP := cur.(*secv1alpha1.ClusterNetworkPolicy)
	oldCNP := old.(*secv1alpha1.ClusterNetworkPolicy)
	if oldCNP.Status == curCNP.Status {
Why watch CNP/ANP changes and check their Status here? Is it for the case some other clients change the Status field?
It's not designed for the case of other clients changing the status field, but it can handle it.
The main consideration is that we need a source of truth for the status so that we can avoid making dummy CRD update calls.
There are two options. The first is to get it right before updating, which is clearly not worthwhile and doesn't reduce the number of API calls.
The other option is the CRD Lister we already have for the networkpolicycontroller. However, the Lister is not always in sync with the K8s API. For example, if a policy's CurrentNodesRealized changes from 2 -> 3 -> 2 quickly: in the first round we want to update it to 3, find it's 2 in the Lister, and call the update API; in the second round we want to update it to 2, find it's also 2 in the Lister (quite possible, because there is some delay before the first update is reflected in the Lister), and skip the update API call, wrongly leaving the status in the API at 3.
Since the Lister is our source of truth for the status, if it changes we need to resync the status. This is also explained in the comments at L82-L85.
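In other words, the sync path compares the freshly computed status with the copy in the Lister and only calls the update API on a difference. A minimal self-contained sketch of that flow (cnpStatus and its field set are assumptions inferred from this thread, not the actual types):
package main

import "fmt"

// cnpStatus is a local stand-in for the CRD status; the fields follow the
// ones mentioned in this conversation, the exact set is an assumption.
type cnpStatus struct {
	observedGeneration   int64
	currentNodesRealized int
	desiredNodesRealized int
}

// syncStatus sketches the flow: the Lister copy is the source of truth for
// what kube-apiserver currently holds, so the update call is made only when
// the desired status differs from it.
func syncStatus(listerCopy, desired cnpStatus, update func(cnpStatus)) {
	if listerCopy == desired {
		return // avoid a dummy CRD update call
	}
	update(desired)
}

func main() {
	lister := cnpStatus{observedGeneration: 2, currentNodesRealized: 2, desiredNodesRealized: 3}
	desired := cnpStatus{observedGeneration: 2, currentNodesRealized: 3, desiredNodesRealized: 3}
	syncStatus(lister, desired, func(s cnpStatus) { fmt.Printf("updating status to %+v\n", s) })
}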
Could we compare with the internal state instead of CRD Status to know if an update is needed?
By internal state, do you mean a "desired" value of the CR's status? We don't have separate storage for that and use the Lister as the source of truth for the data in the apiserver. I guess we would need to maintain the internal state at the end of syncHandler and use it in the update eventhandler to prevent unnecessary tasks from being enqueued?
I'm not sure it could solve the above problem: in the second round of the above example, if the controller hasn't yet received the CR update event from 2 to 3, the syncHandler that wants to update the CR status from 3 to 2 won't make the update API call, because the CR status in the Lister is already 2. If the update event arrives before it updates the internal state from 3 to 2, the eventhandler will also skip enqueuing, because it thinks the latest CR status, 3, matches the internal state, 3. Or would updating the internal state before making the API call prevent this problem?
I think the point is not to prevent tasks from being enqueued, but to prevent unnecessary K8s API calls; we should check the desired status in syncHandler before updating the CR status.
But I think you still mean you want to cover the case where the status is updated by other clients, so you want to watch CR status changes here?
I think the problem is that we are not able to attach information to the event (like whether it is a status change, a span change, or a generation change), so we probably do a lot of unnecessary computation.
But anyway, would you add some comments to explain the reasoning for handling status changes (in my mind, if we don't need to consider other clients, we can check the last desired state in syncHandler)?
I think my point about the internal status vs the CR status is that we already have the latter. The former would be somewhat redundant with it and cannot handle the restart case or the case of other clients' updates.
I just do not like the fact that every CR status update will trigger a new event -> status recomputation. In a sense we double the computation?
You are correct that it needs to compute the status another time after a successful update. This is an inefficiency shared by most K8s controllers.
I could add storage to maintain the expected status of the CR and use it in the eventHandler, only enqueuing the policy when the new status doesn't match the expected status. What do you think?
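A sketch of that idea (names assumed; cnpStatus as in the earlier sketch, sync from the standard library): remember the status last written by this controller and only enqueue when the observed status diverges from it.
// expectedStatuses sketches the proposed optimization: remember the status
// this controller last wrote per policy, and have the update eventhandler
// enqueue a policy only when the observed status doesn't match it.
type expectedStatuses struct {
	sync.Mutex
	m map[string]cnpStatus // policy key -> last written status
}

func (e *expectedStatuses) shouldEnqueue(key string, observed cnpStatus) bool {
	e.Lock()
	defer e.Unlock()
	expected, ok := e.m[key]
	// Enqueue on unknown keys (e.g. after a controller restart) or on any
	// status written by another client.
	return !ok || observed != expected
}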
How about we merge the current implementation first, then think about optimization?
I just hope you can document these design considerations for us (probably mainly for me, as you know the code well) to track and understand later.
Sure, added more comments and a TODO at L82-L88 to follow up.
Force-pushed from 896f58e to 2aefdb2
/test-all
Force-pushed from 72d445b to 8d21baa
/test-all
Just two nits.
/test-all
This patch does the following to support NetworkPolicy Status:
Add "status" field to Antrea ClusterNetworkPolicy and Antrea NetworkPolicy CRD.
Add subresource API "status" to controlplane NetworkPolicy API.
Each antrea-agent reports controlplane NetworkPolicies' realization status on its own Node to antrea-controller.
antrea-controller calculates the aggregated status and syncs it with kube-apiserver.
Closes #1257
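Based on the fields discussed in this thread (phase, observedGeneration, currentNodesRealized/desiredNodesRealized), the added status could look roughly like the sketch below; the exact committed type may differ.
// NetworkPolicyStatus sketches the "status" field added to the Antrea
// ClusterNetworkPolicy and NetworkPolicy CRDs; the field set is inferred from
// this conversation and is not necessarily the exact committed type.
type NetworkPolicyStatus struct {
	// Phase is the realization phase of the policy, e.g. Realizing or Realized.
	Phase string `json:"phase"`
	// ObservedGeneration is the generation observed by antrea-controller.
	ObservedGeneration int64 `json:"observedGeneration"`
	// CurrentNodesRealized is the number of Nodes that have realized the policy.
	CurrentNodesRealized int32 `json:"currentNodesRealized"`
	// DesiredNodesRealized is the number of Nodes that should realize the policy.
	DesiredNodesRealized int32 `json:"desiredNodesRealized"`
}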