(WiP) A new dev-focused documentation to provide a better overview of how it works. #526

Open: wants to merge 5 commits into base: master
10 changes: 10 additions & 0 deletions .gitignore
@@ -38,3 +38,13 @@ html
!*.h
example-*
dpdk_symbols_autogen.h
tests/**/*.gcda
tests/**/*.gcno
src/**/*.gcda
src/**/*.gcno
src/npf
libpacketgraph-dev.so.17.5.0
libpacketgraph.so.17.5.0
packetgraph_coverage/
coverage.xml

8 changes: 4 additions & 4 deletions Makefile
@@ -92,14 +92,14 @@ $(PG_OBJECTS) : src/%.o : src/%.c
$(PG_dev_OBJECTS): src/%-dev.o : src/%.c
$(CC) -c $(PG_dev_CFLAGS) $(PG_HEADERS) $< -o $@

-doxygen.conf: $(srcdir)/doc/doxygen.conf.template
+doxygen.conf: $(srcdir)/doxygen_build/doxygen.conf.template
$(shell sed "s|PG_SRC_PATH|$(srcdir)|g" $< > $@)

doc: doxygen.conf
-$(srcdir)/doc/sed_readme.sh
+$(srcdir)/doxygen_build/sed_readme.sh
doxygen $^
-$(srcdir)/doc/check_error.sh
-$(srcdir)/doc/deploy_documentation.sh
+$(srcdir)/doxygen_build/check_error.sh
+$(srcdir)/doxygen_build/deploy_documentation.sh

style:
$(srcdir)/tests/style/test.sh $(srcdir)
115 changes: 115 additions & 0 deletions doc/BRICK_CONCEPT.md
@@ -0,0 +1,115 @@
# Packetgraph's brick concept.

## Overview.

Each brick has 2 sides: East and West (except monopole bricks, which have only one).<br>
Each side can have 0 or more edges (except for dipole bricks, which have one edge per side).<br>
Edges are stored in a brick's sides and point to another brick.<br>
So to create a link we need 2 edges: one from the first brick pointing to the second one, and vice versa.<br>
Note: the notion of side refers to the packet's source, because a packet goes directly to a brick.<br>
<br>
A basic dipole brick schema:<br>
```
| |
edge 0 <---| +---------+ |--->edge 0
edge ... <---|-| BRICK |-|--->edge ...
edge n <---| +---------+ |--->edge n
| |
| |
| |
| |
WEST SIDE EAST SIDE
```
And now 2 basic bricks linked together:<br>
```
+-----B-West to A----+ +---A-East to B-------+
| | | |
| V | | | | V |
edge 0 <---| +---------+ |--->edge 0-------+ edge 0 <---| +---------+ |--->edge 0
edge ... <---|-| BRICK A |-|--->edge ... | edge ...<---|-| BRICK B |-|--->edge ...
edge n <---| +---------+ |--->edge n +------edge n <---| +---------+ |--->edge n
| | | |
| | | |
| | | |
| | | |
WEST SIDE EAST SIDE WEST SIDE EAST SIDE
```
<br>
Why have sides?<br>
Because it makes it easier to perform operations between two sides, such as acting as a diode or a filter.<br>

### Warning!

While creating links, make sure that there are not 2 bricks modifying packets (VXLAN, VTEP) on the same side!<br>
Here is why:<br>
To improve performance, we do not copy packets, so if a brick modifies them, they will be modified for all other bricks on this side.<br>
<br>
Example:<br>
We want to link some VMs to a VTEP. So we need one VHOST brick per VM, plus a switch.<br>
The VTEP must NOT be on the same side as the VHOST bricks.<br>
Sides are decided by the order of the arguments of the method `pg_brick_link(BRICK_A, BRICK_B)`.<br>
So basically we will do this:

* `pg_brick_link(SWITCH, VTEP);`
* `pg_brick_link(VHOST_0, SWITCH);`
* `pg_brick_link(VHOST_1, SWITCH);`
* `pg_brick_link(VHOST_2, SWITCH);`

This way we are sure that VTEP and VHOST_n are not on the same side.<br>
If we cannot isolate the VTEP as we want, a NOT recommended alternative would be to disable the `NOCOPY` flag.
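
The effect of the argument order can be illustrated with a small self-contained C model. This is only a sketch: `toy_brick`, `toy_link` and `toy_is_on_side` are hypothetical names, not packetgraph's API, and the model assumes that the first argument of the link call ends up west of the second one, as in the two-brick schema above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are illustrative, not packetgraph's API. */
enum toy_side { TOY_WEST = 0, TOY_EAST = 1, TOY_MAX_SIDE = 2 };
#define TOY_MAX_EDGES 8

struct toy_brick {
	const char *name;
	/* For each side: the bricks that our edges point to. */
	struct toy_brick *edges[TOY_MAX_SIDE][TOY_MAX_EDGES];
	size_t nb_edges[TOY_MAX_SIDE];
};

/* Link two bricks (assumption: first argument is west of the second).
 * `west` gets an edge on its EAST side pointing to `east`, and `east`
 * gets an edge on its WEST side pointing back to `west`.
 */
static void toy_link(struct toy_brick *west, struct toy_brick *east)
{
	assert(west->nb_edges[TOY_EAST] < TOY_MAX_EDGES);
	assert(east->nb_edges[TOY_WEST] < TOY_MAX_EDGES);
	west->edges[TOY_EAST][west->nb_edges[TOY_EAST]++] = east;
	east->edges[TOY_WEST][east->nb_edges[TOY_WEST]++] = west;
}

/* Return 1 if one of the edges on `side` of `brick` points to `target`. */
static int toy_is_on_side(const struct toy_brick *brick, enum toy_side side,
			  const struct toy_brick *target)
{
	for (size_t i = 0; i < brick->nb_edges[side]; i++)
		if (brick->edges[side][i] == target)
			return 1;
	return 0;
}
```

Linking `(SWITCH, VTEP)` and then `(VHOST_n, SWITCH)` in this model puts the VTEP on the switch's East side and every VHOST on its West side, which is the isolation the warning asks for.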

## How monopole/single-edge bricks work:
### Single edge:
As the following code shows, `edge` and `edges` are in a `union`, so one side can have `edge` OR `edges`.
```
struct pg_brick_side {
[...]
/* Edges is use by multipoles bricks,
* and edge by dipole and monopole bricks
*/
union {
struct pg_brick_edge *edges; /* edges */
struct pg_brick_edge edge;
};
};
```
### Single side:
As the following code shows, `side` and `sides` are in a `union`, so one brick can have `side` OR `sides`.
```
struct pg_brick {
[...]
union {
struct pg_brick_side sides[PG_MAX_SIDE];
struct pg_brick_side side;
};
};
```
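
What the `union` implies for memory layout can be checked with a reduced model. The sketch below uses hypothetical `toy_` types with a single field instead of the real structures. It only shows that `side` and `sides[0]` occupy the same storage, so a monopole brick pays no extra memory for the unused array form.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SIDE 2

/* Reduced, hypothetical stand-in for a brick side. */
struct toy_brick_side {
	int nb_edges;
};

/* Reduced, hypothetical stand-in for a brick. */
struct toy_brick {
	union {
		struct toy_brick_side sides[TOY_MAX_SIDE]; /* dipole, multipole */
		struct toy_brick_side side;                /* monopole */
	};
};

/* Return 1 if `side` starts at the same offset as `sides[0]`. */
static int toy_side_aliases_sides0(void)
{
	return offsetof(struct toy_brick, side) ==
	       offsetof(struct toy_brick, sides);
}
```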

## Brick's common packet operations.

Packets go through bricks via bursts. Bursts are started only from the inputs/outputs of the graph (IO bricks: VHOST, NIC, RXTX, VTEP) during a graph poll, so polling bricks that are not IO bricks makes no sense.<br>

```
struct pg_brick {
[...]
/* Accept a packet burst */
int (*burst)(struct pg_brick *brick, enum pg_side from,
uint16_t edge_index, struct rte_mbuf **pkts,
uint64_t pkts_mask, struct pg_error **errp);
/* polling */
int (*poll)(struct pg_brick *brick,
uint16_t *count, struct pg_error **errp);
[...]
};
```

According to the `pg_brick` structure, we have two methods dealing with packets:

* `int burst([...]);`<br>
Called to burst packets through an edge of one side to another brick, which receives them through its opposite side.<br>
Example: if I burst from brick A through the East side and edge 1 (pointing to brick B),<br> brick B will receive the burst through its West side.<br>
Bursts do not move packets! A burst passes the packets' addresses in the hugepage to another brick's burst (each burst calls the next brick's burst until the packets leave the graph via an IO brick).
* `int poll([...]);`<br>
Called during a graph poll for an input brick. Most of the time it will burst the packets received/created through the graph.
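
The "bursts do not move packets" point can be sketched with a toy chain of bricks. Everything here is hypothetical (`toy_pkt`, `toy_brick`, `toy_burst`), and the meaning given to `pkts_mask` (bit `i` set means `pkts[i]` is valid) is an assumption made for the example, not something stated in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: illustrative names only, not packetgraph's real types. */
struct toy_pkt {
	int payload;
};

struct toy_brick {
	struct toy_brick *next;     /* brick reached through our output edge */
	uint64_t seen;              /* number of packets that went through */
	const struct toy_pkt *last; /* address of the last packet seen */
};

/* Forward a burst down the chain. Only the array of packet pointers is
 * handed to the next brick; the packets themselves are never copied.
 * Assumption: bit i of pkts_mask tells whether pkts[i] is valid.
 */
static void toy_burst(struct toy_brick *brick, struct toy_pkt **pkts,
		      uint64_t pkts_mask)
{
	assert(brick != NULL);
	for (unsigned int i = 0; i < 64; i++) {
		if (pkts_mask & (UINT64_C(1) << i)) {
			brick->seen++;
			brick->last = pkts[i];
		}
	}
	if (brick->next)
		toy_burst(brick->next, pkts, pkts_mask);
}
```

Because every brick in the chain sees the same addresses, a brick that rewrites a packet changes it for every brick that handles it afterwards, which is the reason behind the warning on sides.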


30 changes: 30 additions & 0 deletions doc/PG_GENERAL_CONCEPT.md
@@ -0,0 +1,30 @@
# Packetgraph's general concept
## Introduction
Outscale's packetgraph is a solution to link (in a network) virtual machines with each other and/or with the real world.<br>
It aims to do so fast, and for this purpose we use [DPDK](https://www.dpdk.org/), aka the Data Plane Development Kit.<br>
The core idea is not to "move" packets, which costs a lot of memory and time, so we allocate them once and for all.<br>


```

The outer +---Host Machine---------------------------------------------------------------------+
World | |
| +--The GRAPH--------------------------------------+ |
| | | |
| | +-------+ +---------+ |
| | +---------------------------->| VHOST |<------------>| VM | |
| | | +-------+ +---------+ |
| | v | |
+---------+ +---------+ +-------+ +---------+ |
<-------->| NIC |<------>|Switch |<---------------------->| VHOST |<------------>| VM | |
+---------+ +---------+ +-------+ +---------+ |
| | ^ | |
| | | +--------------+ +-------+ +---------+ |
| | +----->| Firewall |<----->| VHOST |<------------>| VM | |
| | +--------------+ +-------+ +---------+ |
| | | |
| +-------------------------------------------------- |
| |
+------------------------------------------------------------------------------------+
```
20 changes: 20 additions & 0 deletions doc/README.md
@@ -0,0 +1,20 @@
# DOCUMENTATION

This documentation aims to provide detailed information about Packetgraph's brick concept, about the implemented technologies/features (with descriptions of the standards) and about each brick. The idea is to explain the purpose of each component, possible further optimizations and the choices made.<br>
All this documentation must be written in ASCII so we can access it through a terminal.<br>
<br>
An overview of the general concept of packetgraph:
* [General concept.](PG_GENERAL_CONCEPT.md)

Detailed brick linking information and schemas are available here:
* [Packetgraph's brick concept.](BRICK_CONCEPT.md)

For brick-specific information and schemas:
* [VHOST brick.](VHOST.md)
* [RXTX brick.](RXTX.md)
* [VTEP brick.](VTEP.md)
* [SWITCH brick.](SWITCH.md)

About our testing architecture:
* `wip`

21 changes: 21 additions & 0 deletions doc/RXTX.md
@@ -0,0 +1,21 @@
# RXTX Brick

## Introduction

The RXTX brick is a monopole single-edge brick.<br>
It is intended mainly for testing and benchmarking purposes.<br>
We use it to create packets and send them through a graph and/or receive them.<br>

## Usage

Here is the constructor:
```
struct pg_brick *pg_rxtx_new(const char *name,
pg_rxtx_rx_callback_t rx,
pg_rxtx_tx_callback_t tx,
void *private_data)
```
As we can see, we give it two callbacks as parameters:

* `pg_rxtx_rx_callback_t rx`: The method that will be called when the brick receives packets.
* `pg_rxtx_tx_callback_t tx`: The method that will be used to send packets.
60 changes: 60 additions & 0 deletions doc/SWITCH.md
@@ -0,0 +1,60 @@
# SWITCH Brick

## Introduction

The switch brick does what a switch does in "real life", as described [here](https://en.wikipedia.org/wiki/Network_switch).<br>
The core feature is that when we receive an incoming packet trying to reach a MAC address, there are two cases:

* We know on which interface we can find it, so we forward it directly through the right interface.

* We don't know where the packet's destination is, so we broadcast it on all interfaces. Then, once we get the answer, we update the MAC table, which links MAC addresses to interfaces. So the next time we will know where to forward the packet.

Another feature is forgetting: if an entry in the MAC table has not been used for a long time, we forget it!<br>
However, it is not working yet!
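
The two cases above can be sketched as a tiny learning switch. This is a simplified model with hypothetical names (`toy_switch`, `toy_forward`, `TOY_FLOOD`). It uses a small linear table for readability, whereas the brick relies on the MAC table described further down, and it leaves out the forget feature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_TABLE_SIZE 16
#define TOY_FLOOD (-1) /* destination unknown: send on all interfaces */

/* Toy model: illustrative names only, not the brick's real structures. */
struct toy_entry {
	uint8_t mac[6];
	int port;
	int used;
};

struct toy_switch {
	struct toy_entry table[TOY_TABLE_SIZE];
};

/* Look up a MAC address, return its entry or NULL if it is unknown. */
static struct toy_entry *toy_find(struct toy_switch *sw, const uint8_t *mac)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++)
		if (sw->table[i].used && !memcmp(sw->table[i].mac, mac, 6))
			return &sw->table[i];
	return NULL;
}

/* Remember that `mac` was seen as a source address on `port`. */
static void toy_learn(struct toy_switch *sw, const uint8_t *mac, int port)
{
	struct toy_entry *e = toy_find(sw, mac);

	for (int i = 0; !e && i < TOY_TABLE_SIZE; i++)
		if (!sw->table[i].used)
			e = &sw->table[i];
	if (!e)
		return; /* table full: this toy model simply stops learning */
	memcpy(e->mac, mac, 6);
	e->port = port;
	e->used = 1;
}

/* Handle one frame: learn the source, then choose the output port. */
static int toy_forward(struct toy_switch *sw, const uint8_t *src,
		       const uint8_t *dst, int in_port)
{
	struct toy_entry *e;

	assert(sw != NULL && src != NULL && dst != NULL);
	toy_learn(sw, src, in_port);
	e = toy_find(sw, dst);
	return e ? e->port : TOY_FLOOD;
}
```

The first frame towards an unknown address returns `TOY_FLOOD`. Once the destination has answered, its address is in the table and later frames go straight to the right port.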

## Basic example: connect 3 VMs to a NIC.

```

The outer +---Host Machine---------------------------------------------------------------------+
World | |
| +--The GRAPH--------------------------------------+ |
| | | |
| | +-------+ +---------+ |
| | | |<------------------>| VHOST |<------------>| VM | |
| | | | +-------+ +---------+ |
| | | | | |
+---------+ | +---------+ | +-------+ +---------+ |
<-------->| NIC |<-->|---|Switch |---|<------------------>| VHOST |<------------>| VM | |
+---------+ | +---------+ | +-------+ +---------+ |
| | | | | |
| | | | +-------+ +---------+ |
| | | |<------------------>| VHOST |<------------>| VM | |
| | WEST SIDE EAST SIDE +-------+ +---------+ |
| | | |
| +-------------------------------------------------- |
| |
+------------------------------------------------------------------------------------+
```

Note: it is always good practice to link all "subnet" devices to one side and the "upper" device to the other (no matter whether it is EAST or WEST!).<br>
Here is the reason:<br>
Basically, if we have a brick modifying packets, such as VTEP or VXLAN, we should isolate it on a side. To manage packets faster, we do not copy them, so be careful not to modify them! They would be modified for all bricks on this side.<br>
Please refer to the [warning section of the brick concept's overview](BRICK_CONCEPT.md) for more information.

## Let's go deeper into the MAC TABLE.

From `src/utils/mac-table.h`:
```
A mac array containing pointers or elements
the idea of this mac table, is that a mac is an unique identifier,
as sure, doesn't need hashing we could just
allocate an array for each possible mac
Problem is that doing so require ~280 TByte
So I've cut the mac in 2 part, example 01.02.03.04.05.06
will now have "01.02.03" that will serve as index of the mac table
and "04.05.06" will serve as the index of the sub mac table
if order to take advantage of Virtual Memory, we use bitmask, so we
don't have to allocate 512 MB of physical ram for each unlucky mac.

```
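
The numbers in that comment can be checked quickly: a MAC address is 48 bits, so a flat array needs 2^48 (about 2.8 x 10^14) slots, which at one byte per slot is roughly 281 TB, in line with the "~280 TByte" above. Splitting the address in two halves of 24 bits gives a first-level table and sub-tables of 2^24 (16,777,216) slots each. The helpers below are a hypothetical sketch of that split. The names and the byte order are assumptions and may differ from what `mac-table.h` actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Number of slots in each level once the 48-bit MAC is cut in two. */
#define TOY_LEVEL_SLOTS (UINT32_C(1) << 24)

/* First three bytes of the MAC: index into the main table.
 * Assumption: bytes are taken in the order they are written.
 */
static uint32_t toy_mac_hi(const uint8_t mac[6])
{
	return ((uint32_t)mac[0] << 16) | ((uint32_t)mac[1] << 8) | mac[2];
}

/* Last three bytes of the MAC: index into the sub-table. */
static uint32_t toy_mac_lo(const uint8_t mac[6])
{
	return ((uint32_t)mac[3] << 16) | ((uint32_t)mac[4] << 8) | mac[5];
}

/* Return 1 if both indexes fit in their 2^24-slot level. */
static int toy_mac_indexes_fit(const uint8_t mac[6])
{
	return toy_mac_hi(mac) < TOY_LEVEL_SLOTS &&
	       toy_mac_lo(mac) < TOY_LEVEL_SLOTS;
}
```

For `01.02.03.04.05.06` this yields `0x010203` as the main index and `0x040506` as the sub-table index.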
48 changes: 48 additions & 0 deletions doc/VHOST.md
@@ -0,0 +1,48 @@
# VHOST Brick

## Introduction

The VHOST brick is used to make the graph communicate with VMs.<br>
The problem with communicating with VMs the "standard way" is that it is really slow.<br>
So here we use the virtio protocol, implemented as vhost in DPDK.<br>
You can find a more detailed description here: https://www.redhat.com/en/blog/hands-vhost-user-warm-welcome-dpdk.<br>


## VHOST overview
```
+---------------------------+
| |
| +-------------+ | +-------------+
| | Graph's side| | | Host's side |
| +-------------+ | +-------------+
| |
| | |
| | +--------+ +-------------+ +---------------+
| edge <---|-| VHOST |<------------>| UNIX SOCKET |<----------->|Virtual-Machine|
| | +--------+ +-------------+ +---------------+
| | | ^ ^
| Side | | |
| ^ ^ ^ ^ ^ ^ ^ ^ | | |
+-|-|-|-|-|-|-|-|-----------+ | |
+-|-|-|-|-|-|-|-|--------------|-------------------------------------------------------|------+
| v v v v v v v v v v |
| Host's hugepage, which is shared memory containing packets. |
| |
+---------------------------------------------------------------------------------------------+
```
As shown above, VHOST uses a Unix socket and a hugepage to communicate via IP.<br>
It manages a queue and reduces memory write/free operations.<br>
It is based on a client(s)/server model, meaning that one server can handle multiple connections through the socket.<br> Only the addresses of packets in the hugepage flow through the socket.<br>

## How to use it

* `pg_vhost_start("/tmp", &error)`: start the vhost driver and set up the socket's folder.
* `pg_vhost_new("vhost-0", flags, &error);`: create the brick.<br>The socket will be named `qemu-vhost-0`.<br>Here are some of the available flags:
* `PG_VHOST_USER_CLIENT`: means that the brick will be the client and the qemu the server.
* `PG_VHOST_USER_DEQUEUE_ZERO_COPY`: means that we will use zero copy. #FIXME: explain more.
* `PG_VHOST_USER_NO_RECONNECT`: disable reconnection after disconnection.

## Current VHOST brick's status

Currently the VHOST brick only works in SERVER mode, which means that if packetgraph crashes, we will need to reboot the VMs...<br>Not a good thing!<br>
However, a PR is in progress to address this issue.<br>
42 changes: 42 additions & 0 deletions doc/VTEP.md
@@ -0,0 +1,42 @@
# VTEP Brick

## Introduction

The VTEP brick is intended to allow us to use the VXLAN protocol.<br>
The VXLAN protocol allows us to create tunnels between multiple LANs through another network.<br>

## VXLAN protocol

The VXLAN (Virtual Extensible LAN) protocol works by encapsulating packets before sending them through another network.<br>
What it tunnels are OSI layer 2 frames.
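
To make the encapsulation more concrete, here is a minimal sketch of the VXLAN header itself as defined by RFC 7348, not as implemented by the VTEP brick (whose code is not shown in this document). The header is 8 bytes: one flags byte whose `0x08` bit marks the VNI as valid, 3 reserved bytes, the 24-bit VNI and one last reserved byte. The original Ethernet frame follows it, and the whole thing travels in a UDP datagram. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VXLAN_HDR_LEN 8
#define VXLAN_FLAG_VNI_VALID 0x08 /* "I" flag of RFC 7348 */

/* Fill an 8-byte VXLAN header carrying the 24-bit `vni`. */
static void toy_vxlan_write(uint8_t hdr[VXLAN_HDR_LEN], uint32_t vni)
{
	assert(vni <= 0xffffff);
	memset(hdr, 0, VXLAN_HDR_LEN); /* reserved fields must be zero */
	hdr[0] = VXLAN_FLAG_VNI_VALID;
	hdr[4] = (vni >> 16) & 0xff;
	hdr[5] = (vni >> 8) & 0xff;
	hdr[6] = vni & 0xff;
}

/* Extract the VNI. Return 0 on success, -1 if the VNI flag is not set. */
static int toy_vxlan_read(const uint8_t hdr[VXLAN_HDR_LEN], uint32_t *vni)
{
	if (!(hdr[0] & VXLAN_FLAG_VNI_VALID))
		return -1;
	*vni = ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
	return 0;
}
```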

## Usage

Our typical use case is making tunnels between virtual networks (or packetgraph's graphs) through the host's network via a NIC brick.<br>
When creating the brick, we tell it which side is the output (the tunnel's network), either East or West.<br>
This brick has an empty poll function because the only way to make packets go through it is by bursting from an input/output of the graph.<br>

Example use case: making a tunnel between virtual networks 1 and 2 through the host's network.<br>
(NIC bricks are described in the [NIC section](NIC.md))<br>
For a better link description between bricks, see [packetgraph's brick concept.](BRICK_CONCEPT.md)<br>
```
Virtual Network 1 | Host's network | Virtual Network 2
| |
| |
| | | | | |
Virtual Network 1 <---| +--------+ | +---------+ +---------+ | +--------+ |---> Virtual Network 1
Virtual Network ... <---|-| VTEP |-|------>| NIC |----------| NIC |<------|-| VTEP |-|---> Virtual Network ...
Virtual Network n <---| +--------+ | +---------+ +---------+ | +--------+ |---> Virtual Network n
| | | | | |
West Side East Side | | West Side East Side
| |
| |


```
With the previous schema, it is as if VN1 and VN2 were the same network.<br>

## Output redirection
The VTEP brick features its own switch, based on the VNI (Virtual Network Identifier) of each packet.<br>
One VTEP can lead to multiple virtual networks, each one identified by its own VNI.<br>
On a virtual network, there can be any kind of packetgraph brick, so why not another VTEP, if we want to encapsulate the already encapsulated network one more time?<br>
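
The VNI-based redirection can be pictured as a lookup from VNI to output edge. The sketch below is a toy model with hypothetical names (`toy_vtep`, `toy_vtep_add_vni`, `toy_vtep_edge_for`) and a fixed-size table. It does not reflect how the VTEP brick stores its VNIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_EDGES 4
#define TOY_NO_EDGE (-1)

/* Toy model: one VNI per edge on the virtual-network side. */
struct toy_vtep {
	uint32_t vni[TOY_MAX_EDGES];
	int in_use[TOY_MAX_EDGES];
};

/* Attach `vni` to the first free edge, return its index or TOY_NO_EDGE. */
static int toy_vtep_add_vni(struct toy_vtep *vtep, uint32_t vni)
{
	assert(vtep != NULL);
	for (int i = 0; i < TOY_MAX_EDGES; i++) {
		if (!vtep->in_use[i]) {
			vtep->vni[i] = vni;
			vtep->in_use[i] = 1;
			return i;
		}
	}
	return TOY_NO_EDGE;
}

/* Find the edge a decapsulated packet with `vni` should be sent to. */
static int toy_vtep_edge_for(const struct toy_vtep *vtep, uint32_t vni)
{
	assert(vtep != NULL);
	for (int i = 0; i < TOY_MAX_EDGES; i++)
		if (vtep->in_use[i] && vtep->vni[i] == vni)
			return i;
	return TOY_NO_EDGE;
}
```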