Merge branch 'Azure:main' into main
mmiele authored Jun 17, 2022
2 parents e4b3b64 + 49d9f2c commit 48ae39c
Showing 11 changed files with 155 additions and 4 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -50,5 +50,5 @@ Any use of third-party trademarks or logos are subject to those third-party's po

<!-- dash icon -->
<div align="center">
<img src="documentation/images/icons/dash-icon-xlarge.png" style="align:center;"/>
<img src="documentation/images/icons/dash-icon-medium.svg" style="align:center;"/>
<div/>
6 changes: 3 additions & 3 deletions documentation/general/design/dash-sonic-hld.md
@@ -219,7 +219,7 @@ DASH_ACL_OUT:{{eni}}:{{stage}}
```

```
key = DASH_ACL_IN:eni:stage ; ENI MAC and state as key; ACL stage can be {1, 2, 3 ..}
key = DASH_ACL_IN:eni:stage ; ENI MAC and stage as key; ACL stage can be {1, 2, 3 ..}
; field = value
acl_group_id = ACL group ID
```
@@ -516,7 +516,7 @@ DASH_VNET:Vnet1: {
DASH_ENI:F4939FEFC47E : {
"eni_id": "497f23d7-f0ac-4c99-a98f-59b470e8c7bd",
"mac_address": "F4939FEFC47E",
"pa_addr": 25.1.1.1,
"underlay_ip": 25.1.1.1,
"admin_state": "enabled",
"vnet": "Vnet1"
}
@@ -604,7 +604,7 @@ For the example configuration above, the following is a brief explanation of loo
d. Mapping table for 10.1.1.1 shall be hit and it takes the action "vnet_encap".
e. Encap action shall be performed and use PA address as specified by "underlay_ip"
2. Packet destined to 10.1.0.1:
a. LPM lookup hits for entry 10.1.0.24/24
a. LPM lookup hits for entry 10.1.0.0/24
b. The action in this case is "vnet" and the routing type for "vnet" is "maprouting", with overlay_ip specified
c. Next lookup shall happen on the "mapping" table for Vnet "Vnet1", but for overlay_ip 10.0.0.6
d. Mapping table for 10.0.0.6 shall be hit and it takes the action "vnet_encap".
1 change: 1 addition & 0 deletions documentation/high-avail/design/README.md
@@ -12,3 +12,4 @@ This folder contains DASH High Avalability and Scale design and architecture doc
| ------------------------------------------------------ | ------------------------------------------ |
| [high-availability-and-scale.md](high-availability-and-scale.md) | DASH High-Availability and Scale design document |
| [xsight-labs-ha-proposal-v1.md](xsight-labs-ha-proposal-v1.md) | Initial HA proposal document |
| [xsight-labs-ha-proposal-new-ideas.md](xsight-labs-ha-proposal-new-ideas.md) | Addendum to the initial HA proposal (preview) |
4 changes: 4 additions & 0 deletions documentation/high-avail/design/images/ha-sync-operations.svg
126 changes: 126 additions & 0 deletions documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md
@@ -0,0 +1,126 @@
# DASH High Availability (HA) proposal preview

By John Carney, Xsight Labs

There has been disagreement among members of the DASH community about the HA
requirements, tradeoffs, and proposed protocols/approaches. Each vendor has
unique architectures and constraints. There are many tradeoffs, including the
fidelity of HA/fault tolerance as well as the bandwidth/processing costs.

## Desirable HA properties and goals

The following are desirable HA properties/goals (there are probably more). Due
to differences in architectures, constraints, and deployment use cases, I
propose that these remain qualitative and not be quantified as strict
requirements of DASH. A DASH buyer may quantify any of these properties as
strict requirements for their particular deployment.

- Minimize or eliminate the possibility of established connections breaking
after a failover. If the endpoints of a connection have reached the
established state prior to a failover, then the connection should not be
black-holed after a failover.

- Minimize the time to remove closed connections to avoid filling the connection
table with zombie connections. If a connection is closed/removed on one DPU,
then the connection should be quickly removed on the peer DPU. Zombie
connections will eventually age out. There should be some tolerance for a small
or bounded number of zombie connections in the connection table, especially
after a failover.

- Minimize the necessity for the endpoints to retransmit packets in order to
"replay" packets that cause state changes and are dropped due to HA transport or
processing constraints.

- Minimize link bandwidth and DPU processing overhead for HA state
synchronization.

Some of the above may represent conflicting goals for a particular HA approach.
For example, one HA approach may be able to minimize/eliminate the possibility
of breaking established connections by consuming more bandwidth for HA. Such
tradeoffs are appropriate in different use cases.

We are now working on a proposal for an HA protocol definition that will provide
HA interoperability while also enabling flexibility for each vendor to achieve
the above properties, given their own architecture, constraints and chosen
tradeoffs. Each vendor can individually quantify and be tested on the merits of
the HA properties described above. DASH should neither resort to a "least common
denominator" approach nor force complexity and HA modes that are too costly or
unimplementable for some vendors. The buyer of a DASH solution can test the
vendor's compliance with the defined HA protocol and decide if the vendor's
tradeoffs and adherence to the desired HA properties meet the requirements for
their use case.

We can publish the proposal with much more detail at a later date; in the
meantime, a **preview is shown below**.

## Proposal preview

For state synchronization there are two roles: a state **sender** and a state
**receiver**. Each DPU implements both roles. There are two types of HA
messages: "state update" and "packet update". These will be defined in more
detail in the proposal.

The receiver must be able to parse and process both types of messages. The
sender may choose to coalesce multiple synchronization messages into a single
state synchronization packet; however, the receiver will advertise the maximum
number of coalesced messages it supports, and the sender must honor this limit.
A receiver can specify this to be 1. The receiver's processing of HA
packets/messages is defined to be **simple and stateless**.
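
The coalescing rule can be illustrated with a minimal Python sketch. The
function name `coalesce`, the `max_per_packet` parameter, and the use of raw
`bytes` for messages are assumptions made for illustration only; the actual
message and packet formats are left to the full proposal.

```python
from typing import List, Sequence


def coalesce(messages: Sequence[bytes], max_per_packet: int) -> List[List[bytes]]:
    """Group sync messages into packets of at most max_per_packet messages.

    max_per_packet stands in for the limit advertised by the receiver
    (hypothetical name; the advertisement mechanism is not defined here).
    """
    if max_per_packet < 1:
        raise ValueError("receiver must accept at least one message per packet")
    return [
        list(messages[i:i + max_per_packet])
        for i in range(0, len(messages), max_per_packet)
    ]


# Five pending messages, receiver advertises a maximum of 2 per packet.
packets = coalesce([b"m1", b"m2", b"m3", b"m4", b"m5"], max_per_packet=2)
# A receiver that advertises 1 gets one message per packet.
single = coalesce([b"a", b"b"], max_per_packet=1)
```

A sender is free to send fewer messages per packet than the advertised limit;
the sketch only shows that the limit is never exceeded.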

The sender of HA state synchronization updates has the full flexibility to be
stateless or stateful. The sender will specify with each state synchronization
packet whether a reply (completion) is requested and a hint of whether the reply
may be truncated. The reply is simply the original HA state synchronization
packet with a reply flag set, possibly truncated (a receiver is allowed to
return the reply untruncated even when truncation is hinted, but this is not
bandwidth-optimal). The sender may optionally include opaque information with
each individual message in the synchronization packet and/or for the
synchronization packet as a whole. When the reply is returned to the sender,
the opaque information can be used in an implementation-specific manner to
accomplish stateless or stateful synchronization operations.
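
As a rough sketch of the reply rule, the following Python models a receiver
that echoes a packet back with a reply flag set. All type and field names
(`SyncMessage`, `SyncPacket`, `reply_requested`, `truncate_hint`, `is_reply`,
`opaque`) are hypothetical, and "truncation" is assumed here to mean dropping
message bodies while keeping the opaque values; the real packet format is not
defined in this preview.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class SyncMessage:
    kind: str            # "state_update" or "packet_update"
    body: bytes          # message payload (format not defined here)
    opaque: bytes = b""  # sender-defined; the receiver never interprets it


@dataclass(frozen=True)
class SyncPacket:
    messages: Tuple[SyncMessage, ...]
    reply_requested: bool = False
    truncate_hint: bool = False
    is_reply: bool = False
    opaque: bytes = b""  # sender-defined, for the packet as a whole


def make_reply(pkt: SyncPacket) -> Optional[SyncPacket]:
    """Build the reply for a received packet, or None if none was requested.

    The receiver keeps no per-packet state: the reply is derived only from
    the packet it just received.
    """
    if not pkt.reply_requested:
        return None
    messages = pkt.messages
    if pkt.truncate_hint:
        # Assumed truncation policy: drop bodies, keep opaque values.
        messages = tuple(replace(m, body=b"") for m in messages)
    return replace(pkt, messages=messages, is_reply=True)


request = SyncPacket(
    messages=(SyncMessage("state_update", b"flow-state", opaque=b"\x01"),),
    reply_requested=True,
    truncate_hint=True,
    opaque=b"\x09",
)
reply = make_reply(request)
```

Because the opaque values come back unchanged, the sender can use them to
locate whatever context it chose to keep (or to keep none at all).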

![ha-sync-operations](images/ha-sync-operations.svg)

Here are some examples of different HA approaches that are possible with this
simple protocol. A vendor may select among these (or other possible) approaches.
A vendor may limit their HA implementation to only the approach(es) that are
possible, feasible, or best for their architecture. The definition of the
receiver behavior is simple and remains independent of, but interoperable with,
any sender approach.

1. The sender may send packets that cause state updates to the receiver and
have them returned for transmission to the endpoint. Packets dropped due to
transport unreliability or exceeding DPU processing limits are retransmitted
by the endpoints.
2. The sender may send a state update message with each state change event to
the receiver without requesting replies. The sender may periodically resend
the entire connection state without requesting replies.
3. With each packet that causes a state change, the sender may buffer/hold the
packet and then send a state change message with an opaque value to the
receiver, requesting a reply. When the opaque value is returned with the
reply, it is associated with the held packet that is then transmitted to the
endpoint. If the reply is not returned in a timely manner, the sender may
drop the held packet and free the buffer, relying on the endpoints to
retransmit dropped packets. Alternatively, the sender may choose to resend
the synchronization message to the receiver, effectively creating a reliable
transport without imposing any impact on the endpoints.
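
Approach 3 can be sketched in Python as follows. This is only an illustration
under stated assumptions: the class `HoldingSender`, its callbacks
`send_sync` and `forward`, the integer opaque values, and the explicit `now`
timestamps are all hypothetical, and the sketch implements the "drop on
timeout" variant rather than resending the synchronization message.

```python
from typing import Callable, Dict, Tuple


class HoldingSender:
    """Hold state-changing packets until the peer acknowledges the sync."""

    def __init__(
        self,
        send_sync: Callable[[int, bytes], None],  # (opaque, state change) to peer
        forward: Callable[[bytes], None],         # release packet to the endpoint
        timeout: float,
    ) -> None:
        self._send_sync = send_sync
        self._forward = forward
        self._timeout = timeout
        self._held: Dict[int, Tuple[bytes, float]] = {}  # opaque -> (packet, deadline)
        self._next_opaque = 0

    def on_state_changing_packet(self, packet: bytes, state_change: bytes, now: float) -> int:
        """Buffer the packet and send a sync message that requests a reply."""
        opaque = self._next_opaque
        self._next_opaque += 1
        self._held[opaque] = (packet, now + self._timeout)
        self._send_sync(opaque, state_change)
        return opaque

    def on_reply(self, opaque: int) -> None:
        """The reply returned the opaque value: release the held packet."""
        held = self._held.pop(opaque, None)
        if held is not None:
            self._forward(held[0])

    def expire(self, now: float) -> int:
        """Drop held packets whose reply is overdue; endpoints will retransmit."""
        expired = [k for k, (_, deadline) in self._held.items() if now >= deadline]
        for k in expired:
            del self._held[k]
        return len(expired)


sent, forwarded = [], []
sender = HoldingSender(lambda o, s: sent.append((o, s)), forwarded.append, timeout=0.5)
first = sender.on_state_changing_packet(b"syn-ack", b"flow1:established", now=0.0)
second = sender.on_state_changing_packet(b"fin", b"flow2:closing", now=0.1)
sender.on_reply(first)            # reply arrives for the first packet only
dropped = sender.expire(now=1.0)  # the second reply never arrived
```

Replacing the drop in `expire` with another call to `send_sync` would give the
alternative described above, where the sender retries instead of relying on
endpoint retransmission.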

In addition to the above, there are many other possible tradeoffs that a sender
may make. The sender may choose to not send certain state change events for a
connection. This may save bandwidth at the expense of some fault tolerance. The
sender may choose to buffer/hold only certain packets and not others. For
example, there may be less value in buffering/holding SYN packets: if the peer
does not learn of the SYN and there is a failover, the SYN-ACK will be dropped
by the peer and the SYN will be retransmitted by the endpoint. Anyone may
contribute a full definition and analysis of any of the sender approaches above,
or other possible approaches/optimizations, as optional "standardized" DASH HA
sender modes. These modes only affect the sender behavior. All sender modes use
the same protocol definition and the receiver behavior will always remain
simple, stateless and interoperable with all sender modes.
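
One such sender-side policy, not holding initial SYN packets, might look like
the sketch below. The function name `should_hold` and the policy itself are
assumptions for illustration; only the TCP flag bit values are standard.

```python
# Standard TCP header flag bits.
TCP_FIN, TCP_SYN, TCP_RST, TCP_ACK = 0x01, 0x02, 0x04, 0x10


def should_hold(tcp_flags: int) -> bool:
    """Return False for an initial SYN (SYN set, ACK clear), True otherwise.

    Hypothetical policy: an initial SYN is forwarded without waiting for the
    peer, since the endpoint retransmits it if state is lost in a failover.
    """
    is_initial_syn = bool(tcp_flags & TCP_SYN) and not (tcp_flags & TCP_ACK)
    return not is_initial_syn


hold_syn = should_hold(TCP_SYN)
hold_syn_ack = should_hold(TCP_SYN | TCP_ACK)
hold_fin_ack = should_hold(TCP_FIN | TCP_ACK)
```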

![ha-state-sync-packet-format](images/ha-state-sync-packet-format.svg)

We can produce the proposal with more detail, including the full packet/message
formats. We are happy to get early feedback and discussion before formalizing
the proposal. I wanted to get this out to you so that you can think about it,
and possibly respond, before the next meeting.
Binary file modified documentation/high-avail/slides/DASH High Availability.pptx
Binary file not shown.
4 changes: 4 additions & 0 deletions documentation/images/icons/dash-icon-large.svg