Add documentation for Antrea's OVS pipeline #206

Merged

Conversation

antoninbas
Contributor

Some detailed documentation for the OVS pipeline, including a
description of each table. This is directed at developers and people
trying to troubleshoot issues.

It includes a SVG high-level diagram of the pipeline. We use SVG
directly so it renders better on all screens and to avoid having to
check-in a "large" PNG image that may need to be updated often.

More documentation specific to the Network Policy implementation will
follow later.

Fixes #27

@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests. This command can only be run by members of the vmware-tanzu organization
  • /skip-e2e: to skip e2e tests. This command can only be run by members of the vmware-tanzu organization


All traffic is finally resubmitted to the [DnatTable].

### DnatTable (40)
Contributor

Probably: "DNATTable"

Contributor

@abhiraut left a comment

nice!

* *table miss-miss flow entry*: a "catch-all" entry in a OpenFlow table, which
is used if no other flow is a match. If the table-miss flow entry does not
exist, by default packets unmatched by flow entries are dropped (discarded).
* *conjuctive match fields*: an efficient way in OVS to implement conjunctive
Contributor

conjuctive -> conjunctive

```

After this table, ARP traffic is resubmitted to [ARPResponderTable], while IP
traffic is resubmitted to [ConnectionTrackingTable]. Traffic which does not
Contributor

ConnectionTrackingTable -> ConntrackTable

action, then the packets in the flow go through the switch in the same way
that they would if OpenFlow was not configured on the switch. Antrea uses this
action to process ARP traffic as a regular learning L2 switch would.
* *table miss-miss flow entry*: a "catch-all" entry in a OpenFlow table, which
Member

Do you mean table-miss flow entry?

Contributor

+1
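
For readers following this thread, the table-miss behavior under discussion can be sketched as a toy lookup. This is only an illustrative Python model, not Antrea or OVS code: `Flow`, `lookup`, and `table_40` are hypothetical names, the two flows are loosely based on the DnatTable dump quoted elsewhere in this review, and the priority-0 wildcard entry assumes the change to `priority=0` mentioned by a reviewer.

```python
import ipaddress
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Flow:
    priority: int
    match: Callable[[Dict[str, str]], bool]  # predicate over packet fields
    action: str


def lookup(table: List[Flow], packet: Dict[str, str]) -> str:
    """Return the action of the highest-priority matching flow.

    If nothing matches and the table has no table-miss entry,
    the packet is dropped, which is the default described in the doc.
    """
    for flow in sorted(table, key=lambda f: f.priority, reverse=True):
        if flow.match(packet):
            return flow.action
    return "drop"


# Assumed Service CIDR, taken from the quoted flow dump.
service_cidr = ipaddress.ip_network("10.96.0.0/12")

table_40 = [
    Flow(200, lambda p: ipaddress.ip_address(p["nw_dst"]) in service_cidr, "output:gw0"),
    # Table-miss entry: lowest priority, matches every packet.
    Flow(0, lambda p: True, "resubmit(,50)"),
]
```

With the table-miss entry present, a packet to `10.10.1.2` falls through to `resubmit(,50)`; if that entry is removed, the same packet is dropped.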

This table complements [EgressRuleTable] for Network Policy egress rule
implementation. In K8s, when a Network Policy is applied to a set of Pods, the
default behavior for these Pods become "deny" (it becomes an [isolated
Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods). This
Member

Suggested change
Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods). This
Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods)). This

Otherwise it appears as ... "deny" (it becomes an isolated Pod. This ...

can go through.

The rest of the flows read as follows: if the source IP address is in set
{10.10.1.2, 10.10.1.3}, and the destination port is in the set {3, 4} (which
Member

maybe "the destination OF port" to avoid confusion.

Contributor

+1
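
As an aside for readers of this thread, the "source IP in one set AND destination OF port in another set" reading of the flows can be modeled in a few lines. This is a hedged sketch rather than how OVS implements conjunctive matches: the `clauses` layout and `matched_conjunctions` are hypothetical, and it assumes the destination OF port is carried in `reg1`, as the register description quoted in this review suggests.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (field, value, conjunction id, dimension k, number of dimensions n)
# mirrors the idea of an action such as conjunction(id, k/n).
Clause = Tuple[str, object, int, int, int]

clauses: List[Clause] = [
    ("nw_src", "10.10.1.2", 1, 1, 2),
    ("nw_src", "10.10.1.3", 1, 1, 2),
    ("reg1", 3, 1, 2, 2),  # assumed: reg1 holds the destination OF port
    ("reg1", 4, 1, 2, 2),
]


def matched_conjunctions(packet: Dict[str, object], clauses: List[Clause]) -> Set[int]:
    """Return the ids of conjunctions for which every dimension is satisfied."""
    satisfied = defaultdict(set)  # conjunction id -> set of satisfied dimensions
    total = {}                    # conjunction id -> number of dimensions
    for field, value, conj_id, k, n in clauses:
        total[conj_id] = n
        if packet.get(field) == value:
            satisfied[conj_id].add(k)
    return {cid for cid, dims in satisfied.items() if len(dims) == total[cid]}
```

The point of the construct is that two sets of sizes N and M need N + M clause flows (four here) instead of N × M exact-match flows.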


In the future this table may support an additional mode of operations, in which
it will implement kube-proxy functionality and take care of performing
laod-balancing / DNAT on traffic destined to services.
Contributor

laod -> load


If the `conjunction` action is matched, packets are "allowed" and resubmitted
directly to [L3ForwardingTable]. Other packets go to [EgressDefaultTable]. If a
connection is established - as a remainder all connections are committed in
Contributor

remainder?

```
1. table=60, priority=200,ip,nw_src=10.10.1.2 actions=drop
2. table=60, priority=200,ip,nw_src=10.10.1.3 actions=drop
3. table=60, priority=80,ip actions=resubmit(,70)
Contributor

For the sake of completeness, as you have done throughout this document, also include one line about this resubmit rule.

can go through.

The rest of the flows read as follows: if the source IP address is in set
{10.10.1.2, 10.10.1.3}, and the destination port is in the set {3, 4} (which
Contributor

+1

### IngressDefaultTable (100)

This table is similar in its purpose to [EgressDefaultTable], and it complements
[EgressRuleTable] for Network Policy egress rule implementation. In K8s, when a
Contributor

this should be [IngressRuleTable]

### IngressDefaultTable (100)

This table is similar in its purpose to [EgressDefaultTable], and it complements
[EgressRuleTable] for Network Policy egress rule implementation. In K8s, when a
Contributor

egress rule impl.. -> ingress rule impl..

[EgressRuleTable] for Network Policy egress rule implementation. In K8s, when a
Network Policy is applied to a set of Pods, the default behavior for these Pods
become "deny" (it becomes an [isolated
Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods). This
Contributor

non-isolated-pods). -> non-isolated-pods)).

Network Policy is applied to a set of Pods, the default behavior for these Pods
become "deny" (it becomes an [isolated
Pod](https://kubernetes.io/docs/concepts/services-networking/network-policies/#isolated-and-non-isolated-pods). This
table is in charge of dropping traffic originating from Pods to which a Network
Contributor

This should be "traffic destined to ...", since it is an ingress rule.


## Tables

![OVS pipeline](/docs/assets/ovs-pipeline.svg)
Contributor

In the diagram I noticed an arrow from table 70 -> table 60. I believe it should be the other way around.

@@ -0,0 +1,561 @@
# Antrea OVS Pipeline

## Terminology
Contributor

maybe somewhere in this doc add the ovs-ofctl command used to dump the flows?
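
As a possible starting point for this suggestion, the snippet below builds an `ovs-ofctl dump-flows` invocation restricted to one table. It is a sketch under assumptions: `dump_flows_argv` is a hypothetical helper, `br-int` is assumed as the bridge name, and the command would have to be run wherever `ovs-ofctl` can reach the bridge (for example, inside the container running OVS on the Node).

```python
import shlex
from typing import List, Optional


def dump_flows_argv(bridge: str = "br-int", table: Optional[int] = None) -> List[str]:
    """Build the argv for `ovs-ofctl dump-flows`, optionally limited to one table."""
    argv = ["ovs-ofctl", "dump-flows", bridge]  # bridge name is an assumption
    if table is not None:
        argv.append(f"table={table}")
    return argv


# Render the command line for table 40 so it can be copied into a shell.
command_line = shlex.join(dump_flows_argv(table=40))
print(command_line)
```

The list returned by `dump_flows_argv` can also be handed to `subprocess.run` directly, which avoids shell quoting issues.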

This table handles all "tracked" packets (all packets are moved to the tracked
state by the previous table, [ConntrackTable]). It serves the following
purposes:
* keeps track of connections going through the gateway port; for all packets
Contributor Author

I think I got this wrong: this mechanism applies to reverse traffic from a tunnel as well, not just from local backend Pods. @wenyingd could you confirm?

Contributor

ConntrackStateTable also commits packets from the tunnel port into the ct_zone. It does so with the flow "table=31, priority=190,ct_state=+new+trk,ip actions=ct(commit,table=40,zone=65520)".

document](http://docs.openvswitch.org/en/latest/tutorials/ovs-conntrack/) for
more information on connection tracking in OVS.

### ConntrackStateTable (31)
Contributor Author

Needs to be updated because of #213

action, then the packets in the flow go through the switch in the same way
that they would if OpenFlow was not configured on the switch. Antrea uses this
action to process ARP traffic as a regular learning L2 switch would.
* *table miss-miss flow entry*: a "catch-all" entry in a OpenFlow table, which
Contributor

+1

that they would if OpenFlow was not configured on the switch. Antrea uses this
action to process ARP traffic as a regular learning L2 switch would.
* *table miss-miss flow entry*: a "catch-all" entry in a OpenFlow table, which
is used if no other flow is a match. If the table-miss flow entry does not
Contributor

"is used if no other flow is matched"?

is "known", i.e. corresponds to an entry in [L2ForwardingCalcTable], which is
essentially a "dmac" table.
* reg1 (NXM_NX_REG1): it is used to store the egress OF port for the packet and
is set by [L2ForwardingCalcTable].
Contributor

reg1 is also set in L3ForwardingTable for packets that target Pods on a peer Node (PR: #209), and in DNATTable for packets that target a Service IP (PR: #220).

2. table=0, priority=200,in_port=tun0 actions=load:0->NXM_NX_REG0[0..15],resubmit(,30)
3. table=0, priority=190,in_port="coredns5-8ec607" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
4. table=0, priority=190,in_port="coredns5-9d9530" actions=load:0x2->NXM_NX_REG0[0..15],resubmit(,10)
5. table=0, priority=80,ip actions=resubmit(,10)
Contributor

This flow (no. 5) might be removed after PR #199 is checked in, and the action of the table-miss flow entry should then be drop.

If you dump the flows for this table, you should see something like this:
```
1. table=40, priority=200,ip,nw_dst=10.96.0.0/12 actions=output:gw0
2. table=40, priority=80,ip actions=resubmit(,50)
Contributor

priority=80 will be replaced by priority=0 in all tables once PR: #195 is merged.

If the `conjunction` action is matched, packets are "allowed" and resubmitted
directly to [L3ForwardingTable]. Other packets go to [EgressDefaultTable]. If a
connection is established - as a remainder all connections are committed in
[ConntrackStateTable] - its packets go straight to [L3ForwardingTable], with no
Contributor

Do you mean "Established" connections?

@antoninbas
Contributor Author

It seems that all the patches to the OVS pipeline have been merged, so I'll update my PR to reflect the latest changes.

@antoninbas antoninbas force-pushed the add-documentation-for-OVS-pipeline branch from 5a1174b to 3a34ae6 Compare December 18, 2019 22:07
@antrea-bot
Collaborator

Thanks for your PR.
Unit tests and code linters are run automatically every time the PR is updated.
E2e, conformance and network policy tests can only be triggered by a member of the vmware-tanzu organization. Regular contributors to the project should join the org.

The following commands are available:

  • /test-e2e: to trigger e2e tests.
  • /skip-e2e: to skip e2e tests.
  • /test-conformance: to trigger conformance tests.
  • /skip-conformance: to skip conformance tests.
  • /test-networkpolicy: to trigger networkpolicy tests.
  • /skip-networkpolicy: to skip networkpolicy tests.
  • /test-all: to trigger all tests.
  • /skip-all: to skip all tests.

These commands can only be run by members of the vmware-tanzu organization.

@antoninbas antoninbas force-pushed the add-documentation-for-OVS-pipeline branch from 3a34ae6 to 46db62e Compare December 19, 2019 00:37
@antoninbas
Contributor Author

I addressed review comments and updated the doc to reflect the latest OVS pipeline. PTAL.

@antoninbas
Contributor Author

Any chance we can review / merge this?
Any change to the OVS pipeline recently?

@jianjuns
Contributor

jianjuns commented Jan 7, 2020

> Any chance we can review / merge this?
> Any change to the OVS pipeline recently?

Sorry, I have not reviewed this yet. IPSec introduced some changes: the IPSec tunnel itself, which we could document separately, and the change to load the tunnel ofport in the L3 table and skip the L2 and ingress policy tables. Check the commit description: 0d2e4d9

@wenyingd could comment on any other changes.

@antoninbas
Contributor Author

@jianjuns For IPSec support, I am also leaning towards a separate document or a future PR. I will update this PR to indicate that some flows are different when IPSec is enabled and to take into account the new table bypass for tunnelled traffic.

@jianjuns
Contributor

jianjuns commented Jan 7, 2020

@antoninbas: what I mean is the IPSec PR also changes the L3 flows for the normal tunnels. You might want to include that part into your doc.

@antoninbas antoninbas force-pushed the add-documentation-for-OVS-pipeline branch from 46db62e to bba519a Compare January 7, 2020 22:35
@antoninbas
Contributor Author

@jianjuns yes that's what I meant. I have updated the document to account for the changes for normal tunnels.

@antoninbas antoninbas force-pushed the add-documentation-for-OVS-pipeline branch from bba519a to 1eb0978 Compare January 8, 2020 19:40
@antoninbas
Contributor Author

Thanks for the review @wenyingd

Contributor

@wenyingd left a comment

Overall LGTM, except for one comment.


As for [EgressRuleTable], flow 1 (highest priority) ensures that for established
connections - as a remainder all connections are committed in
[ConntrackStateTable] - packets go straight to [L2ForwardingOutTable], with no
Contributor

all connections are committed in [ConntrackCommitTable]?

Contributor Author

Done. Thanks for catching this.

@antoninbas antoninbas force-pushed the add-documentation-for-OVS-pipeline branch from 1eb0978 to b0484c4 Compare January 9, 2020 18:33
Contributor

@wenyingd left a comment

LGTM.

@antoninbas
Contributor Author

/skip-all

@antoninbas antoninbas merged commit bff28db into antrea-io:master Jan 10, 2020
@antoninbas antoninbas deleted the add-documentation-for-OVS-pipeline branch January 10, 2020 02:06
zyiou added a commit to zyiou/antrea that referenced this pull request Jul 2, 2021
Successfully merging this pull request may close these issues.

Document OVS Datapath Flows
7 participants