Release 1.0.1 #2139

Merged: 9 commits, Apr 30, 2021
2 changes: 1 addition & 1 deletion .github/workflows/build_tag.yml
@@ -41,7 +41,7 @@ jobs:
ref: refs/heads/main
workflow: Build Antrea ARM images and push manifest
token: ${{ secrets.ANTREA_BUILD_INFRA_WORKFLOW_DISPATCH_PAT }}
inputs: ${{ format('{{ "antrea-repository":"vmware-tanzu/antrea", "antrea-ref":"{0}", "docker-tag":"{1}" }}', github.ref, steps.get-version.outputs.version) }}
inputs: ${{ format('{{ "antrea-repository":"vmware-tanzu/antrea", "antrea-ref":"{0}", "docker-tag":"{1}" }}', github.ref, needs.get-version.outputs.version) }}

build-windows:
runs-on: [windows-2019]
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -11,6 +11,19 @@ Some experimental features can be enabled / disabled using [Feature Gates](docs/

## Unreleased

## 1.0.1 - 2021-04-29

### Fixed

- It was discovered that the AntreaProxy implementation has an upper-bound for the number of Endpoints it can support for each Service: we increase this upper-bound from ~500 to 800, log a warning for Services with a number of Endpoints greater than 800, and arbitrarily drop some Endpoints so we can still provide load-balancing for the Service. ([#2101](https://github.com/vmware-tanzu/antrea/pull/2101), [@hongliangl])
- Fix Antrea-native policy with multiple AppliedTo selectors: some rules were never realized by the Agents as they thought they had only received partial information from the Controller. ([#2084](https://github.com/vmware-tanzu/antrea/pull/2084), [@tnqn])
- Fix re-installation of the OpenFlow groups when the OVS daemons are restarted to ensure that AntreaProxy keeps functioning. ([#2134](https://github.com/vmware-tanzu/antrea/pull/2134), [@antoninbas])
- Fix IPFIX flow records exported by the Antrea Agent. ([#2089](https://github.com/vmware-tanzu/antrea/pull/2089), [@zyiou])
* If a connection spanned multiple export cycles, it wasn't handled properly and no record was sent for the connection
* If a connection spanned a single export cycle, a single record was sent but "delta counters" were set to 0 which caused flow visualization to omit the flow in dashboards
- Fix incorrect stats reporting for ingress rules of some NetworkPolicies: some types of traffic were bypassing the OVS table keeping track of statistics once the connection was established, causing packet and byte stats to be incorrect. ([#2078](https://github.com/vmware-tanzu/antrea/pull/2078), [@ceclinux])
- Fix the retry logic when enabling the OVS bridge local interface on Windows Nodes. ([#2081](https://github.com/vmware-tanzu/antrea/pull/2081), [@antoninbas]) [Windows]

## 1.0.0 - 2021-04-09

Includes all the changes from [0.13.1].
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
v1.0.0
v1.0.1
2 changes: 1 addition & 1 deletion build/yamls/elk-flow-collector/elk-flow-collector.yml
@@ -165,7 +165,7 @@ spec:
- name: ELASTICSEARCH_URL
value: "http://elasticsearch:9200"
- name: KIBANA_DEFAULTAPPID
value: "dashboard/653cf1e0-2fd2-11e7-99ed-49759aed30f5"
value: "dashboard/3b331b30-b987-11ea-b16e-fb06687c3589"
- name: LOGGING_QUIET
value: "true"
ports:
120 changes: 60 additions & 60 deletions build/yamls/elk-flow-collector/kibana.ndjson

Large diffs are not rendered by default.

10 changes: 6 additions & 4 deletions build/yamls/elk-flow-collector/logstash/filter.rb
@@ -2,7 +2,6 @@
# register accepts the hashmap passed to "script_params"
# it runs once at startup
def register(params)
@interval = params["interval"]
@@time_map = Hash.new
end

@@ -93,9 +92,12 @@ def filter(event)
event.set("[ipfix][reverseThroughput]", event.get("[ipfix][reverseOctetDeltaCountFromSourceNode]").to_i / duration.to_i)
@@time_map[key] = t
else
@@time_map[key] = DateTime.strptime(event.get("[ipfix][flowEndSeconds]").to_s, '%Y-%m-%dT%H:%M:%S').to_time.to_i
event.set("[ipfix][throughput]", event.get("[ipfix][octetDeltaCountFromSourceNode]").to_i / @interval.to_i)
event.set("[ipfix][reverseThroughput]", event.get("[ipfix][reverseOctetDeltaCountFromSourceNode]").to_i / @interval.to_i)
startTime = DateTime.strptime(event.get("[ipfix][flowStartSeconds]").to_s, '%Y-%m-%dT%H:%M:%S').to_time.to_i
endTime = DateTime.strptime(event.get("[ipfix][flowEndSeconds]").to_s, '%Y-%m-%dT%H:%M:%S').to_time.to_i
duration = endTime-startTime
event.set("[ipfix][throughput]", event.get("[ipfix][octetDeltaCountFromSourceNode]").to_i / duration.to_i)
event.set("[ipfix][reverseThroughput]", event.get("[ipfix][reverseOctetDeltaCountFromSourceNode]").to_i / duration.to_i)
@@time_map[key] = endTime
end
return [event]
end
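
The updated filter computes per-flow throughput from each record's own start and end timestamps (`flowStartSeconds` / `flowEndSeconds`) instead of dividing by a fixed export interval, which is why the `interval` script parameter is removed below. The following is a minimal Go sketch of the same calculation, with simplified record fields and an added guard against a zero-length flow (the guard is an assumption of this sketch, not part of the Ruby filter):

```go
package main

import (
	"fmt"
	"time"
)

// flowRecord holds only the fields needed for the throughput calculation;
// the names are simplified stand-ins for the IPFIX elements used in filter.rb.
type flowRecord struct {
	flowStartSeconds time.Time
	flowEndSeconds   time.Time
	octetDeltaCount  uint64
}

// throughput returns the average bytes per second over the flow's own
// duration, mirroring the updated filter logic (delta bytes / (end - start)).
func throughput(r flowRecord) (float64, error) {
	duration := r.flowEndSeconds.Sub(r.flowStartSeconds).Seconds()
	if duration <= 0 {
		return 0, fmt.Errorf("non-positive flow duration: %v", duration)
	}
	return float64(r.octetDeltaCount) / duration, nil
}

func main() {
	start := time.Date(2021, 4, 29, 10, 0, 0, 0, time.UTC)
	r := flowRecord{
		flowStartSeconds: start,
		flowEndSeconds:   start.Add(90 * time.Second),
		octetDeltaCount:  9_000_000,
	}
	bps, err := throughput(r)
	if err != nil {
		fmt.Println("skipping record:", err)
		return
	}
	fmt.Printf("throughput: %.0f bytes/s\n", bps) // 100000 bytes/s
}
```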
1 change: 0 additions & 1 deletion build/yamls/elk-flow-collector/logstash/logstash.conf
@@ -23,7 +23,6 @@ input {
filter {
ruby {
path => "/usr/share/logstash/config/filter.rb"
script_params => { "interval" => 60 }
}
}

6 changes: 5 additions & 1 deletion docs/feature-gates.md
@@ -51,7 +51,11 @@ example, to enable `AntreaProxy` on Linux, edit the Agent configuration in the
`AntreaProxy` implements Service load-balancing for ClusterIP Services as part
of the OVS pipeline, as opposed to relying on kube-proxy. This only applies to
traffic originating from Pods, and destined to ClusterIP Services. In
particular, it does not apply to NodePort Services.
particular, it does not apply to NodePort Services. Please note that due to
some restrictions on the implementation of Services in Antrea, the maximum
number of Endpoints that Antrea can support at the moment is 800. If the
number of Endpoints for a given Service exceeds 800, extra Endpoints will
be dropped.
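
As a rough sketch of the behavior documented above (not the actual AntreaProxy code), capping a Service's Endpoint list to the stated maximum and warning when Endpoints are dropped could look like the following Go snippet; the function and field names are illustrative, and the 800 limit is the only value taken from the documentation:

```go
package main

import "fmt"

// maxEndpoints mirrors the documented per-Service upper bound of 800.
const maxEndpoints = 800

// capEndpoints trims a Service's Endpoint list to the supported maximum and
// warns when Endpoints are dropped. Illustrative only, not the Antrea code.
func capEndpoints(svcName string, endpoints []string) []string {
	if len(endpoints) <= maxEndpoints {
		return endpoints
	}
	fmt.Printf("warning: Service %s has %d Endpoints, only %d will be load-balanced\n",
		svcName, len(endpoints), maxEndpoints)
	// Arbitrarily keep the first maxEndpoints entries, as described above.
	return endpoints[:maxEndpoints]
}

func main() {
	eps := make([]string, 1000)
	for i := range eps {
		eps[i] = fmt.Sprintf("10.0.%d.%d:80", i/250, i%250)
	}
	fmt.Println("Endpoints kept:", len(capEndpoints("demo/web", eps)))
}
```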

Note that this feature must be enabled for Windows. The Antrea Windows YAML
manifest provided as part of releases enables this feature by default. If you
7 changes: 0 additions & 7 deletions docs/network-flow-visibility.md
@@ -389,13 +389,6 @@ or
svn export https://github.com/vmware-tanzu/antrea/trunk/build/yamls/elk-flow-collector/
```

To configure the export interval as `flowExportInterval` in [Configuration](#configuration),
modify the `interval` value (in seconds) in `elk-flow-collector/logstash/logstash.conf`

```conf
script_params => { "interval" => 60 }
```

To create the required K8s resources in the `elk-flow-collector` folder and get
everything up and running, run the following commands:

8 changes: 7 additions & 1 deletion pkg/agent/agent.go
@@ -278,6 +278,12 @@ func persistRoundNum(num uint64, bridgeClient ovsconfig.OVSBridgeClient, interva
// agent restarts (with the agent crashing before step 4 can be completed). With the sequence
// described above, we guarantee that at most two rounds of flows exist in the switch at any given
// time.
// Note that at the moment we assume that all OpenFlow groups are deleted every time there is an
// Antrea Agent restart. This allows us to add the necessary groups without having to worry about
// the operation failing because a (stale) group with the same ID already exists in OVS. This
// assumption is currently guaranteed by the ofnet implementation:
// https://github.com/wenyingd/ofnet/blob/14a78b27ef8762e45a0cfc858c4d07a4572a99d5/ofctrl/fgraphSwitch.go#L57-L62
// All previous groups have been deleted by the time the call to i.ofClient.Initialize returns.
func (i *Initializer) initOpenFlowPipeline() error {
roundInfo := getRoundInfo(i.ovsBridgeClient)

@@ -424,7 +430,7 @@ func (i *Initializer) FlowRestoreComplete() error {
if err != nil {
if err == wait.ErrWaitTimeout {
// This could happen if the method is triggered by an OVS disconnection event, in which case OVS doesn't restart.
klog.Info("flow-restore-wait was not true, skip cleaning up it")
klog.Info("flow-restore-wait was not true, skip cleaning it up")
return nil
}
return err
75 changes: 49 additions & 26 deletions pkg/agent/controller/networkpolicy/cache.go
@@ -223,22 +223,32 @@ func (c *ruleCache) getAppliedNetworkPolicies(pod, namespace string, npFilter *q
return policies
}

func (c *ruleCache) getRule(ruleID string) (*rule, bool) {
obj, exists, _ := c.rules.GetByKey(ruleID)
if !exists {
return nil, false
}
return obj.(*rule), true
}

func (c *ruleCache) getRulesByNetworkPolicy(uid string) []*rule {
func (c *ruleCache) getEffectiveRulesByNetworkPolicy(uid string) []*rule {
objs, _ := c.rules.ByIndex(policyIndex, uid)
if len(objs) == 0 {
return nil
}
rules := make([]*rule, len(objs))
for i, obj := range objs {
rules[i] = obj.(*rule)
rules := make([]*rule, 0, len(objs))

// A rule is considered effective when any of its AppliedToGroups can be populated.
isEffective := func(r *rule) bool {
for _, g := range r.AppliedToGroups {
_, exists := c.appliedToSetByGroup[g]
if exists {
return true
}
}
return false
}

c.appliedToSetLock.RLock()
defer c.appliedToSetLock.RUnlock()

for _, obj := range objs {
rule := obj.(*rule)
if isEffective(rule) {
rules = append(rules, rule)
}
}
return rules
}
@@ -524,13 +534,13 @@ func (c *ruleCache) PatchAppliedToGroup(patch *v1beta.AppliedToGroupPatch) error
}

// DeleteAppliedToGroup deletes a cached *v1beta.AppliedToGroup.
// It should only happen when a group is no longer referenced by any rule, so
// no need to mark dirty rules.
// It may be called when a rule becomes ineffective, so it needs to mark dirty rules.
func (c *ruleCache) DeleteAppliedToGroup(group *v1beta.AppliedToGroup) error {
c.appliedToSetLock.Lock()
defer c.appliedToSetLock.Unlock()

delete(c.appliedToSetByGroup, group.Name)
c.onAppliedToGroupUpdate(group.Name)
return nil
}

@@ -700,16 +710,33 @@ func (c *ruleCache) deleteNetworkPolicyLocked(uid string) error {
}

// GetCompletedRule constructs a *CompletedRule for the provided ruleID.
// If the rule is not found or not completed due to missing group data,
// the return value will indicate it.
func (c *ruleCache) GetCompletedRule(ruleID string) (completedRule *CompletedRule, exists bool, completed bool) {
// If the rule is not effective or not realizable due to missing group data, the return value will indicate it.
// A rule is considered effective when any of its AppliedToGroups can be populated.
// A rule is considered realizable when it's effective and all of its AddressGroups can be populated.
// When a rule is not effective, it should be removed from the datapath.
// When a rule is effective but not realizable, the caller should wait for it to become realizable before doing anything.
// When a rule is effective and realizable, the caller should realize it.
// This is because some AppliedToGroups in a rule might never be sent to this Node if one of the following is true:
// - The original policy has multiple AppliedToGroups and some AppliedToGroups' span does not include this Node.
// - The original policy is appliedTo-per-rule, and some of the rule's AppliedToGroups do not include this Node.
// - The original policy is appliedTo-per-rule, none of the rule's AppliedToGroups includes this Node, but some other rules' (in the same policy) AppliedToGroups include this Node.
// In these cases, it is not guaranteed that all AppliedToGroups in the rule will eventually be present in the cache.
// Only the AppliedToGroups whose span includes this Node will eventually be received.
func (c *ruleCache) GetCompletedRule(ruleID string) (completedRule *CompletedRule, effective bool, realizable bool) {
obj, exists, _ := c.rules.GetByKey(ruleID)
if !exists {
return nil, false, false
}

r := obj.(*rule)

groupMembers, anyExists := c.unionAppliedToGroups(r.AppliedToGroups)
if !anyExists {
return nil, false, false
}

var fromAddresses, toAddresses v1beta.GroupMemberSet
var completed bool
if r.Direction == v1beta.DirectionIn {
fromAddresses, completed = c.unionAddressGroups(r.From.AddressGroups)
} else {
@@ -719,11 +746,6 @@ func (c *ruleCache) GetCompletedRule(ruleID string) (completedRule *CompletedRul
return nil, true, false
}

groupMembers, completed := c.unionAppliedToGroups(r.AppliedToGroups)
if !completed {
return nil, true, false
}

completedRule = &CompletedRule{
rule: r,
FromAddresses: fromAddresses,
@@ -771,20 +793,21 @@ func (c *ruleCache) unionAddressGroups(groupNames []string) (v1beta.GroupMemberS
}

// unionAppliedToGroups gets the union of pods of the provided appliedTo groups.
// If any group is not found, nil and false will be returned to indicate the
// set is not complete yet.
// If any group is found, the union and true will be returned. Otherwise an empty set and false will be returned.
func (c *ruleCache) unionAppliedToGroups(groupNames []string) (v1beta.GroupMemberSet, bool) {
c.appliedToSetLock.RLock()
defer c.appliedToSetLock.RUnlock()

anyExists := false
set := v1beta.NewGroupMemberSet()
for _, groupName := range groupNames {
curSet, exists := c.appliedToSetByGroup[groupName]
if !exists {
klog.V(2).Infof("AppliedToGroup %v was not found", groupName)
return nil, false
continue
}
anyExists = true
set = set.Union(curSet)
}
return set, true
return set, anyExists
}
51 changes: 42 additions & 9 deletions pkg/agent/controller/networkpolicy/cache_test.go
@@ -771,12 +771,24 @@ func TestRuleCacheGetCompletedRule(t *testing.T) {
From: v1beta2.NetworkPolicyPeer{AddressGroups: []string{"addressGroup1", "addressGroup2", "addressGroup3"}},
AppliedToGroups: []string{"appliedToGroup1", "appliedToGroup2"},
}
rule4 := &rule{
ID: "rule4",
Direction: v1beta2.DirectionIn,
From: v1beta2.NetworkPolicyPeer{AddressGroups: []string{"addressGroup1", "addressGroup2"}},
AppliedToGroups: []string{"appliedToGroup1", "appliedToGroup2", "appliedToGroup3"},
}
rule5 := &rule{
ID: "rule5",
Direction: v1beta2.DirectionIn,
From: v1beta2.NetworkPolicyPeer{AddressGroups: []string{"addressGroup1", "addressGroup2"}},
AppliedToGroups: []string{"appliedToGroup3", "appliedToGroup4"},
}
tests := []struct {
name string
args string
wantCompletedRule *CompletedRule
wantExists bool
wantCompleted bool
wantEffective bool
wantRealizable bool
}{
{
"one-group-rule",
@@ -803,15 +815,34 @@
true,
},
{
"incompleted-rule",
"missing-one-addressgroup-rule",
rule3.ID,
nil,
true,
false,
},
{
"missing-one-appliedtogroup-rule",
rule4.ID,
&CompletedRule{
rule: rule4,
FromAddresses: addressGroup1.Union(addressGroup2),
ToAddresses: nil,
TargetMembers: appliedToGroup1.Union(appliedToGroup2),
},
true,
true,
},
{
"missing-all-appliedtogroups-rule",
rule5.ID,
nil,
false,
false,
},
{
"non-existing-rule",
"rule4",
"rule6",
nil,
false,
false,
@@ -827,16 +858,18 @@
c.rules.Add(rule1)
c.rules.Add(rule2)
c.rules.Add(rule3)
c.rules.Add(rule4)
c.rules.Add(rule5)

gotCompletedRule, gotExists, gotCompleted := c.GetCompletedRule(tt.args)
gotCompletedRule, gotEffective, gotRealizable := c.GetCompletedRule(tt.args)
if !reflect.DeepEqual(gotCompletedRule, tt.wantCompletedRule) {
t.Errorf("GetCompletedRule() gotCompletedRule = %v, want %v", gotCompletedRule, tt.wantCompletedRule)
}
if gotExists != tt.wantExists {
t.Errorf("GetCompletedRule() gotExists = %v, want %v", gotExists, tt.wantExists)
if gotEffective != tt.wantEffective {
t.Errorf("GetCompletedRule() gotEffective = %v, want %v", gotEffective, tt.wantEffective)
}
if gotCompleted != tt.wantCompleted {
t.Errorf("GetCompletedRule() gotCompleted = %v, want %v", gotCompleted, tt.wantCompleted)
if gotRealizable != tt.wantRealizable {
t.Errorf("GetCompletedRule() gotRealizable = %v, want %v", gotRealizable, tt.wantRealizable)
}
})
}
22 changes: 13 additions & 9 deletions pkg/agent/controller/networkpolicy/networkpolicy_controller.go
@@ -476,9 +476,9 @@ func (c *Controller) syncRule(key string) error {
klog.V(4).Infof("Finished syncing rule %q. (%v)", key, time.Since(startTime))
}()

rule, exists, completed := c.ruleCache.GetCompletedRule(key)
if !exists {
klog.V(2).Infof("Rule %v had been deleted, removing its flows", key)
rule, effective, realizable := c.ruleCache.GetCompletedRule(key)
if !effective {
klog.V(2).Infof("Rule %v was not effective, removing its flows", key)
if err := c.reconciler.Forget(key); err != nil {
return err
}
@@ -489,10 +489,10 @@
}
return nil
}
// If the rule is not complete, we can simply skip it as it will be marked as dirty
// If the rule is not realizable, we can simply skip it as it will be marked as dirty
// and queued again when we receive the group it missed.
if !completed {
klog.V(2).Infof("Rule %v was not complete, skipping", key)
if !realizable {
klog.V(2).Infof("Rule %v was not realizable, skipping", key)
return nil
}
if err := c.reconciler.Reconcile(rule); err != nil {
@@ -515,9 +515,13 @@

var allRules []*CompletedRule
for _, key := range keys {
rule, exists, completed := c.ruleCache.GetCompletedRule(key)
if !exists || !completed {
klog.Errorf("Rule %s is not complete or does not exist in cache", key)
rule, effective, realizable := c.ruleCache.GetCompletedRule(key)
// It's normal that a rule is not effective on this Node but abnormal that it is not realizable after watchers
// complete full sync.
if !effective {
klog.Infof("Rule %s is not effective on this Node", key)
} else if !realizable {
klog.Errorf("Rule %s is effective but not realizable", key)
} else {
allRules = append(allRules, rule)
}