Merge branch 'Azure:main' into main

reshmaintel · Jun 17, 2022 · 48ae39c
2 parents e4b3b64 + 49d9f2c
commit 48ae39c
Showing 11 changed files with 155 additions and 4 deletions.
diff --git a/README.md b/README.md
@@ -50,5 +50,5 @@ Any use of third-party trademarks or logos are subject to those third-party's po
 
 <!-- dash icon -->
 <div align="center">
-<img src="documentation/images/icons/dash-icon-xlarge.png" style="align:center;"/>
+<img src="documentation/images/icons/dash-icon-medium.svg" style="align:center;"/>
 <div/> 
diff --git a/documentation/general/design/dash-sonic-hld.md b/documentation/general/design/dash-sonic-hld.md
@@ -219,7 +219,7 @@ DASH_ACL_OUT:{{eni}}:{{stage}}
 ```
 
 ```
-key                      = DASH_ACL_IN:eni:stage ; ENI MAC and state as key; ACL stage can be {1, 2, 3 ..}
+key                      = DASH_ACL_IN:eni:stage ; ENI MAC and stage as key; ACL stage can be {1, 2, 3 ..}
 ; field                  = value 
 acl_group_id             = ACL group ID
 ```
@@ -516,7 +516,7 @@ DASH_VNET:Vnet1: {
 DASH_ENI:F4939FEFC47E : { 
     "eni_id": "497f23d7-f0ac-4c99-a98f-59b470e8c7bd",
     "mac_address": "F4939FEFC47E",
-    "pa_addr": 25.1.1.1,
+    "underlay_ip": 25.1.1.1,
     "admin_state": "enabled",
     "vnet": "Vnet1"
 }
@@ -604,7 +604,7 @@ For the example configuration above, the following is a brief explanation of loo
 		d. Mapping table for 10.1.1.1 shall be hit and it takes the action "vnet_encap". 
 		e. Encap action shall be performed and use PA address as specified by "underlay_ip"
 	2. Packet destined to 10.1.0.1:
-		a. LPM lookup hits for entry 10.1.0.24/24
+		a. LPM lookup hits for entry 10.1.0.0/24
 		b. The action in this case is "vnet" and the routing type for "vnet" is "maprouting", with overlay_ip specified
 		c. Next lookup shall happen on the "mapping" table for Vnet "Vnet1", but for overlay_ip 10.0.0.6
 		d. Mapping table for 10.0.0.6 shall be hit and it takes the action "vnet_encap". 
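
The longest-prefix-match step in the walkthrough above can be sketched as follows. This is a minimal illustration, not the DASH pipeline: the route table and the `lpm_lookup` helper are hypothetical, and only the `10.1.0.0/24` prefix and the "vnet" action name come from the text.

```python
import ipaddress

def lpm_lookup(routes, dst):
    """Return (network, action) for the longest prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, action in routes.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the largest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, action)
    return best

# Hypothetical route table (assumption: not the table from the design document).
routes = {
    "0.0.0.0/0": "hypothetical_default",
    "10.1.0.0/16": "hypothetical_wide",
    "10.1.0.0/24": "vnet",
}

match = lpm_lookup(routes, "10.1.0.1")
```

Here `match` is the `10.1.0.0/24` entry with action "vnet", since it is the most specific prefix containing 10.1.0.1. Note that `ipaddress.ip_network("10.1.0.24/24")` raises `ValueError` under its default strict mode because host bits are set, which is consistent with the prefix correction in this hunk.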

diff --git a/documentation/high-avail/design/README.md b/documentation/high-avail/design/README.md
@@ -12,3 +12,4 @@ This folder contains DASH High Avalability and Scale design and architecture doc
 | ------------------------------------------------------ | ------------------------------------------ |
 | [high-availability-and-scale.md](high-availability-and-scale.md) | DASH High-Availability and Scale design document   |
 | [xsight-labs-ha-proposal-v1.md](xsight-labs-ha-proposal-v1.md) | Initial HA proposal document   |
+| [xsight-labs-ha-proposal-new-ideas.md](xsight-labs-ha-proposal-new-ideas.md)|Addendum to the initial HA proposal (preview)|
diff --git a/documentation/high-avail/design/images/ha-state-sync-packet-format.svg b/documentation/high-avail/design/images/ha-state-sync-packet-format.svg
diff --git a/documentation/high-avail/design/images/ha-sync-operations.svg b/documentation/high-avail/design/images/ha-sync-operations.svg
diff --git a/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md b/documentation/high-avail/design/xsight-labs-ha-proposal-new-ideas.md
@@ -0,0 +1,126 @@
+# DASH High Availability (HA) proposal preview
+
+By John Carney, Xsight Labs
+
+There has been disagreement among members of the DASH community about the HA
+requirements, tradeoffs, and proposed protocols/approaches. Each vendor has
+unique architectures and constraints. There are many tradeoffs, including the
+fidelity of HA/fault tolerance as well as the bandwidth/processing costs.
+
+## Desirable HA properties and goals
+
+The following are desirable HA properties/goals (there are probably more). Due
+to differences in architectures, constraints, and deployment use cases, I
+propose that these remain qualitative and not be quantified as strict requirements
+of DASH. A DASH buyer may quantify any of these properties as strict
+requirements for their particular deployment.
+
+- Minimize or eliminate the possibility of established connections breaking
+after a failover. If the endpoints of a connection have gotten into the
+established state prior to a failover, then the connection should not be black
+holed after a failover.
+
+- Minimize the time to remove closed connections to avoid filling the connection
+table with zombie connections. If a connection is closed/removed on one DPU,
+then the connection should be quickly removed on the peer DPU. Zombie
+connections will eventually age out. There should be some tolerance for a small
+or bounded number of zombie connections in the connection table, especially
+after a failover.
+
+- Minimize the necessity for the endpoints to retransmit packets in order to
+"replay" packets that cause state changes and are dropped due to HA transport or
+processing constraints.
+
+- Minimize link bandwidth and DPU processing overhead for HA state
+synchronization.
+
+Some of the above may represent conflicting goals for a particular HA approach.
+For example, one HA approach may be able to minimize/eliminate the possibility
+of breaking established connections by consuming more bandwidth for HA. Such
+tradeoffs are appropriate in different use cases.
+
+We are now working on a proposal for an HA protocol definition that will provide
+HA interoperability while also enabling flexibility for each vendor to achieve
+the above properties, given their own architecture, constraints and chosen
+tradeoffs. Each vendor can individually quantify and be tested on the merits of
+the HA properties described above. DASH should neither resort to a "least common
+denominator" approach nor force complexity and HA modes that are too costly or
+unimplementable for some vendors. The buyer of a DASH solution can test the
+vendor's compliance with the defined HA protocol and decide if the vendor's
+tradeoffs and adherence to the desired HA properties meet the requirements for
+their use case.
+
+We can publish the proposal with much more detail at a later date; in the
+meantime, a **preview is shown below**.
+
+## Proposal preview
+
+For state synchronization there are two roles: a state **sender** and a state
+**receiver**. Each DPU implements both roles. There are two types of HA messages:
+"state update" and "packet update". These will be defined in more detail in the
+proposal.
+
+The receiver must be able to parse and process both types of messages. The
+sender may choose to coalesce multiple synchronization messages into a single
+state synchronization packet; however, the receiver will advertise the maximum
+number of coalesced messages supported. The sender must honor this. A receiver
+can specify this to be 1. The receiver's processing of HA packets/messages is
+defined to be **simple and stateless**.
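
One way a sender might honor the receiver's advertised coalescing limit is sketched below. The `coalesce` helper and the list-of-messages representation are assumptions for illustration; the proposal does not define message or packet formats here.

```python
def coalesce(messages, max_per_packet):
    """Group pending sync messages into packets of at most max_per_packet each.

    max_per_packet stands in for the limit advertised by the receiver
    (hypothetical parameter name); a value of 1 disables coalescing.
    """
    if max_per_packet < 1:
        raise ValueError("receiver must accept at least one message per packet")
    return [
        messages[i:i + max_per_packet]
        for i in range(0, len(messages), max_per_packet)
    ]

packets = coalesce(["m1", "m2", "m3", "m4", "m5"], max_per_packet=2)
```

With a limit of 2, five pending messages are sent as three packets; with a limit of 1, every message travels in its own packet.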
+
+The sender of HA state synchronization updates has the full flexibility to be
+stateless or stateful. The sender will specify with each state synchronization
+packet whether a reply (completion) is requested and a hint of whether the reply
+may be truncated. The reply is simply the original HA state synchronization
+packet with a reply flag set and is possibly truncated (it is allowed, but not
+bandwidth optimized, for the receiver to not truncate the reply when requested).
+The sender may optionally include opaque information with each individual
+message in the synchronization packet and/or for the synchronization packet as a
+whole. When the reply is returned to the sender, the opaque information can be
+used in an implementation-specific manner to accomplish stateless or stateful
+synchronization operations.
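
The reply rule described above could look roughly like this on the receiver side. The `SyncPacket` fields and `make_reply` are hypothetical names, and the sketch assumes that truncation drops the message bodies while keeping the packet-level opaque value; the actual packet format is left to the full proposal.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class SyncPacket:
    messages: tuple               # state-update / packet-update messages
    reply_requested: bool = False
    may_truncate: bool = False    # sender's hint that the reply may be truncated
    is_reply: bool = False
    opaque: bytes = b""           # sender-defined; never interpreted by the receiver

def make_reply(pkt: SyncPacket) -> Optional[SyncPacket]:
    """Echo the packet back with the reply flag set, keeping no per-packet state."""
    if not pkt.reply_requested:
        return None
    # Assumption: a truncated reply carries no message bodies.
    body = () if pkt.may_truncate else pkt.messages
    return replace(pkt, messages=body, is_reply=True)

request = SyncPacket(messages=("m1", "m2"), reply_requested=True,
                     may_truncate=True, opaque=b"\x01")
reply = make_reply(request)
```

The receiver only copies fields and sets a flag, so it needs no memory of earlier packets; any bookkeeping travels in `opaque` and is interpreted by the sender alone.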
+
+![ha-sync-operations](images/ha-sync-operations.svg)
+
+Here are some examples of different HA approaches that are possible with this
+simple protocol. A vendor may select among these (or other possible) approaches.
+A vendor may limit their HA implementation to only the approach(es) that are
+possible, feasible, or best for their architecture. The definition of the
+receiver behavior is simple and remains independent of, but interoperable with,
+any sender approach.
+
+1. The sender may send packets that cause state updates to the receiver and
+    have them returned for transmission to the endpoint. Drops due to
+    transport unreliability or exceeding DPU processing limits are retransmitted
+    by the endpoints.
+2. The sender may send a state update message with each state change event to
+    the receiver without requesting replies. The sender may periodically resend
+    the entire connection state without requesting replies.
+3. With each packet that causes a state change, the sender may buffer/hold the
+    packet and then send a state change message with an opaque value to the
+    receiver, requesting a reply. When the opaque value is returned with the
+    reply, it is associated with the held packet that is then transmitted to the
+    endpoint. If the reply is not returned in a timely manner, the sender may
+    drop the held packet and free the buffer, relying on the endpoints to
+    retransmit dropped packets. Alternatively, the sender may choose to resend
+    the synchronization message to the receiver, effectively creating a reliable
+    transport without imposing any impact on the endpoints.
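
Approach 3 in the list above can be sketched as a small sender-side table keyed by the opaque value. `HoldingSender`, its method names, and the integer opaque values are assumptions for illustration; time is passed in explicitly to keep the example deterministic.

```python
import itertools

class HoldingSender:
    """Holds packets until the matching sync reply arrives or a timeout expires."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._next_opaque = itertools.count(1)
        self._held = {}  # opaque value -> (deadline, held packet)

    def hold(self, packet, now):
        """Buffer a packet and return the opaque value to send with the sync message."""
        opaque = next(self._next_opaque)
        self._held[opaque] = (now + self.timeout, packet)
        return opaque

    def on_reply(self, opaque):
        """Release the held packet for transmission, or None if it is unknown/expired."""
        entry = self._held.pop(opaque, None)
        return entry[1] if entry else None

    def expire(self, now):
        """Drop packets whose reply is overdue; the endpoints are expected to retransmit."""
        late = [k for k, (deadline, _) in self._held.items() if deadline <= now]
        return [self._held.pop(k)[1] for k in late]

sender = HoldingSender(timeout=5)
first = sender.hold("syn-ack", now=0)
second = sender.hold("fin", now=3)
released = sender.on_reply(first)
dropped = sender.expire(now=8)
```

In this run the first packet is released when its reply returns, while the second is dropped at its deadline. A sender that prefers the reliable-transport variant would resend the synchronization message instead of calling `expire`.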
+
+In addition to the above, there are many other possible tradeoffs that a sender
+may make. The sender may choose to not send certain state change events for a
+connection. This may save bandwidth at the expense of some fault tolerance. The
+sender may choose to buffer/hold only certain packets and not others. For
+example, there may be less value in buffering/holding syn packets. If the peer
+does not learn of the syn and there is a failover, the syn-ack will be dropped
+by the peer and the syn will be retransmitted by the endpoint. Anyone may
+contribute a full definition and analysis of any of the sender approaches above, or
+other possible approaches/optimizations, as optional "standardized" DASH HA
+sender modes. These modes only affect the sender behavior. All sender modes use
+the same protocol definition and the receiver behavior will always remain
+simple, stateless and interoperable with all sender modes.
+
+![ha-state-sync-packet-format](images/ha-state-sync-packet-format.svg)
+
+We can produce the proposal with more detail, including the full packet/message
+formats. We are happy to get early feedback and discussion before formalizing
+the proposal. I wanted to get this out to you so that you can think about it,
+and possibly respond, before the next meeting.
diff --git a/documentation/high-avail/slides/DASH High Availability.pptx b/documentation/high-avail/slides/DASH High Availability.pptx
diff --git a/documentation/images/icons/dash-icon-large.svg b/documentation/images/icons/dash-icon-large.svg