Add the telemetry-atomic extension to BGP policy statements #867
Add the telemetry-atomic extension to the BGP policy statements container to reflect that policies are configured as a whole rather than as individual statements. This change also enables scalar-based telemetry to support the ordering of this 'ordered-by user' list.
Major YANG version changes in commit d32f827: |
Compatibility Report for commit d32f827: |
Hi,
Regards, |
|
Currently thinking about placing @rgwilton @jhaas-pfrc wondering if you have any initial thoughts on this, I'm just brainstorming here. |
I'm pretty doubtful about an "atomic" extension here for config. That implies some expected server behavior and I'm not sure we want to introduce that type of concept. Maybe we should describe this in the "description" statement for the list though (i.e. that it is like an atomic 'program' and can't be updated piecemeal, i.e. any sort of "eventual consistency" approach isn't applicable to the entire list). I'd advocate for addressing this at the gNMI protocol level. i.e. adding "insert after" type fields to gNMI (in Set, Get and Notifications). |
I think of this concept as being separable into a config requirement and a telemetry requirement. Focusing on the telemetry requirement, would you say there is a difference between config and state data when you're querying for telemetry? Would this concept impose more difficulties compared to the Addressing this at the gNMI protocol level might be a more disruptive change. Even assuming it is not, then we must design a solution that allows for any client to answer the questions,
In order to know 2 without an atomic flag, we need to add enough information to gNMI to tell the user that all leaf updates have been received. I believe the current preferred solution is to use atomic, as it doesn't run into a scale issue when it's only at the list element level, and furthermore there are no changes required for gNMI. |
So one clarification here is that public/release/models/policy/openconfig-routing-policy.yang Lines 1169 to 1183 in f2a7536
Note that |
@jhaas-pfrc @rgwilton @jsterne @hellt @rolandphung @earies I've summarized the problem for easier consumption, but essentially if you understand the PR then you just need to read the "Request for Comments" and "Important Note" sections below:

Request for Comments
What is the estimated worst-case size of a single BGP policy in terms of the serialized gNMI

Introduction
Current OpenConfig models contain two instances of ordered-by user lists (or “ordered lists” for short):
Note that
This means that when this data is streamed as configuration or telemetry information, the ordering of list elements must be maintained for the view to be meaningful. Maintaining the ordering satisfies two properties:
While it is mostly straightforward for tools such as ygot to support configuring sequential list elements and receiving device updates that respect update ordering, leaf-based gNMI telemetry systems (e.g. gnmi cache, which depends on scalar TypedValue) may not support the ordering of map elements: lists are represented as Go maps, so iteration order is random, and gnmi cache’s coalescing queue may furthermore reorder updates. As a result, it is currently impossible for clients of such a network management system to see meaningful snapshots of, for example, a single BGP policy configuration.

This document contains a request for comments on using telemetry-atomic, or a similar extension, to model a single BGP policy’s set of ordered statements as a way of solving this issue (modelling is described in detail in the sections below). In this solution, the entire set of leaves under the /routing-policy/policy-definitions/policy-definition/statements container subtree will be serialized into a single

Important Note
The whole set of policies as defined by /routing-policy/policy-definitions is an unordered list (they are referenced by import/export/call policies via leaf-lists of leafrefs) and is therefore not subject to this problem.

Document Goal
Without a behavior similar to atomic, metadata must be added to indicate the ordering or completeness of the ordered list data. Since it is preferable from a management tooling perspective to avoid having to process metadata, and changes to gnmi.proto itself can be very disruptive, it is this document’s goal to understand the practical implications of this solution before seriously considering an alternative approach.

Proposed Model
@@ -1022,8 +1031,17 @@ module openconfig-routing-policy {
"Top-level grouping for the policy statements list";
container statements {
+ oc-ext:telemetry-atomic;
description
- "Enclosing container for policy statements";
+ "Enclosing container for policy statements.
+
+ Note: in order to support scalar-based telemetry, policy statements are
+ treated as a whole instead of individually. Per the telemetry-atomic
+ extension, this means both configuration and telemetry must be sent in
+ whole and never as single policy statements. When scalar values are
+ provided, it is expected that when examining the streamed paths in
+ order, that the ordering of every new path key reflects the order in
+ the configuration.";
list statement {
key "name";
Pros
Cons
|
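The ordering problem described in the summary above can be illustrated with a minimal Go sketch. It is not taken from ygot or gnmi cache: the `Statement` type, the helper functions, and the statement names are all hypothetical, chosen only to show that once an ordered-by-user list is stored in a Go map, the configured order cannot be recovered from the map alone, whereas a slice delivered as one unit keeps it.

```go
package main

import (
	"fmt"
	"sort"
)

// Statement is a hypothetical, simplified stand-in for one policy statement.
type Statement struct {
	Name   string
	Action string
}

// sortedNames returns the map keys in lexical order. Go does not define map
// iteration order, so sorting is the only deterministic order available once
// the user-defined order has been lost.
func sortedNames(m map[string]Statement) []string {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// orderedNames returns the names in slice order, i.e. the configured order.
func orderedNames(stmts []Statement) []string {
	names := make([]string, 0, len(stmts))
	for _, s := range stmts {
		names = append(names, s.Name)
	}
	return names
}

func main() {
	// Configured ("ordered-by user") order: evaluation starts at the top.
	configured := []Statement{
		{Name: "reject-bogons", Action: "REJECT_ROUTE"},
		{Name: "accept-customers", Action: "ACCEPT_ROUTE"},
		{Name: "default-reject", Action: "REJECT_ROUTE"},
	}

	// A map keyed by statement name, as a keyed list is commonly represented.
	byName := make(map[string]Statement, len(configured))
	for _, s := range configured {
		byName[s.Name] = s
	}

	fmt.Println("configured order:", orderedNames(configured))
	fmt.Println("order from map:  ", sortedNames(byName))
}
```

In this sketch the map-derived order differs from the configured one, which would change the meaning of a first-match policy. Keeping the statements in a single ordered value is the property the proposal is trying to obtain.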
A few items were mentioned at the Thursday call for May:
To give an easy example, you can update a single policy statement to change a referenced prefix-set in one gNMI operation. If the prefix-set itself includes "add 2M of prefixes", having an atomic operation on the policy statements themselves isn't helpful. What you want is for the totality of the policy to be whole from the perspective of the receiver. This means all indirectly referenced elements.

From a logical standpoint, this would imply something like an "atomic" around all of policy, if we could guarantee that everything that's referenced is present. However, that's unlikely to happen.

Fundamentally, the issue becomes that streaming telemetry in the current form of gNMI is probably the Wrong Tool for monitoring policy, or really anything requiring high levels of consistency to achieve a consistent view of a set of related objects. Small things like individual routes can achieve this through dependency graphs and deeper control of on-the-wire ordering of operations, but gNMI doesn't provide strong guidance about this as best I understand things.

In IETF, there's the concept of "yang patch". Effectively, a flavor of this is what is desired here:
For very large change sets, streamed changes for the diff may be larger than simply retrieving the entire set of covered data from scratch. Personally, I would suggest the following things:
The better answer here in the absence of a diff/patch mechanism is "don't do this". |
Just want to update this PR since it's been pending to save time for future readers:
|
Change Scope
This change is backwards-incompatible for configuration and telemetry below the telemetry-atomic node, and for any telemetry on BGP policy statements on the server side.
This change fixes an ordering issue when configuring or streaming telemetry using scalar TypedValue types in gNMI. The issue is that it is currently not possible to indicate policy statement ordering when data is not streamed together as part of a single SubscribeResponse.
When clients and implementors provide scalar values in gNMI for this container, it is expected that, when examining the streamed paths in order, the ordering of every new path key reflects the order in the configuration.
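A receiver could apply the ordering expectation stated above roughly as in the following Go sketch. It assumes the paths arrive as plain strings in the order they were streamed within one atomic update; the `StatementOrder` function, the regular expression, and the example policy and statement names are hypothetical and are not part of gNMI or any OpenConfig tooling.

```go
package main

import (
	"fmt"
	"regexp"
)

// statementKey matches the "name" key of a statement list entry in a path
// written as .../statements/statement[name=<key>]/...
var statementKey = regexp.MustCompile(`/statement\[name=([^\]]+)\]`)

// StatementOrder walks the paths in streamed order and records each statement
// key the first time it appears. Under the expectation described above, the
// result is the configured order of the statements.
func StatementOrder(paths []string) []string {
	seen := make(map[string]bool)
	var order []string
	for _, p := range paths {
		m := statementKey.FindStringSubmatch(p)
		if m == nil {
			continue // not a path under a statement list entry
		}
		if !seen[m[1]] {
			seen[m[1]] = true
			order = append(order, m[1])
		}
	}
	return order
}

func main() {
	// Hypothetical paths, listed in the order they were streamed.
	prefix := "/routing-policy/policy-definitions/policy-definition[name=export]/statements"
	paths := []string{
		prefix + "/statement[name=reject-bogons]/config/name",
		prefix + "/statement[name=reject-bogons]/actions/config/policy-result",
		prefix + "/statement[name=accept-customers]/config/name",
		prefix + "/statement[name=accept-customers]/actions/config/policy-result",
	}
	fmt.Println(StatementOrder(paths))
}
```

The sketch relies on the whole set of leaves being delivered together. If updates for one policy were split across messages or reordered in transit, first-appearance order would no longer be a reliable signal.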