Replies: 3 comments
- This is really the way to go.
- At first glance, it looks like CUE also enables distributed policy delegation. Because it is based on lattices, you can ensure one never delegates more "rights" than one possesses. I'm not so sure this is possible with Rego. I have to look into this in more detail, though.
- I came across this project while looking for a solution as described in this discussion: https://github.com/k-cloud-labs/kinitiras/tree/main
Originally opened by @mpvl in cuelang/cue#818
CUE is an ideal language for admission control. This discussion is a thought experiment, illustrated by examples, of how CUE could look as a frontend to Gatekeeper (Open Policy Agent), which currently uses Rego.
I have to admit I still don’t fully understand Rego, such as the significance of entry-point names and some of the syntax. So if the mappings are off, that’s the reason. I’ll write up my assumptions about what the Rego means.
Examples
We start with an example where Rego and CUE are almost identical and increase complexity along the way.
Setting restrictions
We take the example of setting a per-user memory limit in https://www.openpolicyagent.org/docs/latest/policy-language/.
We want to set different limits, depending on the type of user. The Rego example:
I’m not entirely certain how `max_memory` works here: whether it can only be assigned once, or whether it unifies the results (like CUE, Rego is a logic programming language). But the intent is clear at least: the value may not be set to both 32 and 4.
Now let’s look at how the same would look in CUE:
In CUE, both the input data and the constraints are defined as plain JSON values (with CUE syntactic sugar applied). The conditional values for `max_memory` are set with an if-clause (a comprehension), which “embeds” the constraint if the condition is true. If both conditions match, there is a conflict (32 != 4). If neither condition matches, `max_memory` remains unspecified.
Note that CUE is a bit more verbose here. A philosophy of CUE is that configuration languages should favor readability over writeability, even more so than programming languages.
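The conflict behavior described above can be sketched with a toy Python model. This is not CUE's evaluator: `TOP` stands in for CUE's unconstrained top value `_`, and the user sets `power_users` and `restricted_users` are hypothetical inputs. Each `if` embeds its limit by unifying it into the current value, so two different concrete limits raise an error instead of one silently overriding the other.

```python
class Conflict(Exception):
    """Raised when two concrete values cannot be unified."""


TOP = object()  # stands in for CUE's top value `_`: nothing is known yet


def unify(a, b):
    """Unify two values: TOP is the identity; equal concrete values unify to themselves."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    if a == b:
        return a
    raise Conflict(f"conflicting values: {a!r} != {b!r}")


def max_memory_for(user, power_users, restricted_users):
    """Embed each conditional limit whose condition holds, like a CUE if-clause."""
    value = TOP
    if user in power_users:  # hypothetical condition, mirroring the Rego example
        value = unify(value, 32)
    if user in restricted_users:
        value = unify(value, 4)
    return value  # TOP means max_memory stays unspecified
```

With `power_users={"alice"}` and `restricted_users={"bob"}`, Alice gets 32, Bob gets 4, anyone else gets `TOP`, and a user in both sets raises `Conflict`.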
Setting restrictions, an alternative
The above example assumed that no information about users is stored anywhere yet: all information was defined in the code snippet. In practice, `max_memory` is likely a field that exists somewhere in an API or database.
Suppose there is a notion of per-user settings, that is, a map of users to user settings (in CUE):
In Rego, the code would look roughly the same: one has to pull the information from this location locally in order to apply the logic.
In CUE, one would approach it differently: instead of pulling values to one place, constraints are pushed out to where the data resides:
The square bracket notation is used in CUE to conditionally apply a constraint to a field if the pattern in the square brackets matches that field. Users of common JSON query languages will note the symmetry with querying: `a.[_]` means "match any field in `a`", while `a: [_]: v` means "apply `v` to any field in `a`". Instead of `_`, CUE allows any kind of pattern, including enums (`"foo" | "bar"`) or regular expressions.
This example shows one big benefit of CUE: the lines under the “general” comment are not just policy. They could easily be moved to an overall configuration, where they can be used to validate configuration outside of admission control, serve as a template to reduce boilerplate in configuration, or drive user-specific OpenAPI generation, to name a few examples.
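To make the "push" model more concrete, here is a minimal Python sketch of a pattern constraint. Every field of a struct whose name fully matches a regular expression gets a constraint unified into it. The function names and the `admin-`/`guest-` naming scheme are hypothetical. The sketch only mirrors the behavior described above: patterns constrain the fields that exist rather than creating new ones, and a value that contradicts a constraint is a conflict.

```python
import re


class Conflict(Exception):
    """Raised when a field already holds a value that contradicts a constraint."""


def unify(a, b):
    """Unify two values: dicts merge field by field, scalars must be equal."""
    if isinstance(a, dict) and isinstance(b, dict):
        merged = dict(a)
        for key, value in b.items():
            merged[key] = unify(merged[key], value) if key in merged else value
        return merged
    if a == b:
        return a
    raise Conflict(f"conflicting values: {a!r} != {b!r}")


def apply_patterns(struct, patterns):
    """Unify each (regex, constraint) pair into every field whose name fully matches.

    Rough analogue of `a: [pattern]: v`: constraints are pushed onto the fields
    that exist; no new fields are created.
    """
    result = {}
    for name, value in struct.items():
        for pattern, constraint in patterns:
            if re.fullmatch(pattern, name):
                value = unify(value, constraint)
        result[name] = value
    return result


# Hypothetical naming scheme: admins get 32, guests get 4.
PATTERNS = [(r"admin-.*", {"max_memory": 32}), (r"guest-.*", {"max_memory": 4})]
```

An enum-style pattern such as `r"foo|bar"` works the same way, because `re.fullmatch` only accepts exactly `foo` or `bar`.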
In general, CUE’s approach has several advantages:
Complete example
I’m a bit mystified by Rego entry points and naming, but I’m assuming the following for this example:
Gatekeeper passes a value to Rego with the following fields at the root:
- `input.request`: the incoming request that needs to be validated
- `namespaces`: some information about namespaces
The Rego
Consider the following Rego (taken from the OPA website):
I’m a bit confused about what `n_patterns_parts` and `n_str_parts` do here. I suspect they could be omitted.
CUE
In CUE, a straightforward translation could be written as
There is no `deny` entry point as in Rego. Instead, the entire CUE definition is treated as a value and matched against the Gatekeeper input, failing if there is a conflict. A field whose name starts with an underscore (`_foo`) is “hidden” and not part of the output.
Here again, the restriction that a `host` must be a valid, allowed host is “pushed” to where `host` is defined. This makes the logic reusable across API definitions, policy definitions and configuration validation. For example, we could define `ValidHost` in a separate package, import it here, and also use it for API definitions, OpenAPI generation, or what have you.
Maximum reuse with the proposed query extension
Using the constructs in proposal #165, this would be simplified further. Suppose we define an `IngressCreator` type:
A policy with maximal reuse (assuming that namespaces can be imported from a package) would then look like
This relies on three proposed language extensions:
- `[field: value]` allows matching field values as well, not just field names.
- `[int]` pattern-matches list values. This makes CUE more symmetric and reads a bit more easily than `[...T]`.
- `.foo(x)` macro shorthand: CUE does not have functions (for good reasons), but a limited form of functions can be simulated using structs. The proposed notation makes this easier without going as far as introducing functions.
The pattern matching in square brackets (`[_: IngressCreator]`) has been extended to match not only field names but also field values. Field names are matched as before, while field values must be an instance of the filter value (here `IngressCreator`).
The `IngressCreator` can now be used elsewhere as a constraint or type. For instance, it could be used in an API:
In other words, using a single language not only allows for a consistent notation across IDLs, data, API definitions and policy specifications, it also allows for reuse across these domains.
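The reuse argument can be illustrated outside of CUE as well. The Python sketch below defines one hypothetical host predicate, `valid_host`, in the spirit of `ValidHost`. It then uses that same function both for an admission check on an Ingress-like request and for validating a plain configuration dict. The wildcard rule (`*.example.com` matches hosts ending in `.example.com`) is an assumption for illustration. It requires a dot boundary, which makes it slightly stricter than a plain suffix check.

```python
def valid_host(host, allowed):
    """True if host equals an allowed entry or matches a '*.' wildcard entry."""
    for pattern in allowed:
        if pattern.startswith("*."):
            # "*.example.com" matches any host ending in ".example.com".
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


def admission_errors(request, allowed):
    """Admission-control use: check every host rule of an Ingress-like request."""
    rules = request["object"]["spec"]["rules"]
    return [
        f"spec.rules[{i}].host: invalid host {rule['host']!r}"
        for i, rule in enumerate(rules)
        if not valid_host(rule["host"], allowed)
    ]


def config_errors(config, allowed):
    """Configuration-validation use: the very same predicate on a plain config dict."""
    return [
        f"{name}.host: invalid host {service['host']!r}"
        for name, service in config.items()
        if not valid_host(service["host"], allowed)
    ]
```

Only `valid_host` encodes the rule. Both callers stay thin, which is the kind of reuse the text argues a single constraint language gives for free.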
Comparison
Return values
The examples above assume that admission control only involves checking for errors. Gatekeeper, however, allows the result of a policy check to be any value.
This can easily be supported in CUE by, say, adopting the convention that `#out` (a definition) contains a result value.
Error messages
In Rego, one typically sees user-defined error messages, which often results in better messages. Note that with CUE's `must` and `error` proposals, the same can be achieved.
CUE’s “push” approach has a big advantage here, though: because the constraints are pushed down to the relevant values, the location of a constraint violation already conveys useful information that can otherwise easily get lost in computation.
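Here is a minimal Python sketch of why pushed-down constraints give useful error locations for free. The constraints live in a schema that mirrors the data, so a checker that walks both can report the path of each violation without any user-written message. The schema format (nested dicts whose leaves are `(predicate, description)` pairs) is hypothetical and far weaker than CUE's.

```python
def check(value, schema, path=""):
    """Walk value alongside schema and return one error string per violation.

    Each constraint sits at the position of the data it constrains, so the
    path of a failing constraint is known without extra bookkeeping.
    """
    where = path or "<root>"
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{where}: expected a struct, got {value!r}"]
        errors = []
        for key, sub_schema in schema.items():
            if key in value:  # absent fields are left unconstrained
                child = f"{path}.{key}" if path else key
                errors.extend(check(value[key], sub_schema, child))
        return errors
    predicate, description = schema
    if predicate(value):
        return []
    return [f"{where}: {value!r} does not satisfy {description}"]
```

For example, with a schema that limits `spec.replicas` to at most 5, the value `{"spec": {"replicas": 10}}` yields `spec.replicas: 10 does not satisfy int & <=5`.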
Embedding policy
In a CUE-only world, it would be possible to write CUE that combines data and policy:
The advantage of this is obvious: as the CUE definition is native, all error checking and validation is preserved.
In practice, one will probably not want to send CUE around. One could imagine writing something like this (`cue.Marshal` and `rego.Marshal` are not yet supported):
or
This still preserves the validation.
Policy types and constraints
Because constraints in CUE are just values, it is possible to define validation rules on top of policy (meta-validation rules, if you will). For instance, suppose that in the above example (embedding CUE) we want to ensure that the policy we pass conforms to the Gatekeeper format for which it is configured. We define the Gatekeeper message as follows:
We can now define a template for the above message that ensures a policy is of the above form:
This “template” can then be applied to
for instance by running `cue export schema.cue instance.cue --out yaml`, to obtain the original example.
High-level observations of benefits of using CUE for Policy
One language for everything
The most obvious benefit to using CUE over Rego is that CUE is more widely deployable. CUE allows defining data (it is a JSON superset after all), APIs, validation rules, and policy.
This benefit goes beyond only having to learn one language. Instead of “embedding Rego in YAML”, one can use a single file to define both. In the end, a policy is just a value to CUE. This way, OPA/Gatekeeper would leverage the wider CUE ecosystem, and vice versa.
It goes further still: because a policy is just a value, CUE can go fully meta, and one can easily define a policy that defines which policies are valid.
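Because a policy is just a value, the same machinery can check both data and policies. The toy Python sketch below uses one function, `conforms`, at both levels. The policy shape (`target_kind`, `constraints`) is a hypothetical format chosen for illustration. It is not Gatekeeper's actual constraint format.

```python
# Hypothetical policy format, chosen for illustration only (not Gatekeeper's).
POLICY_SCHEMA = {"target_kind": str, "constraints": dict}


def conforms(value, schema):
    """True if value is a dict with every schema key present and of the given type."""
    return isinstance(value, dict) and all(
        key in value and isinstance(value[key], expected)
        for key, expected in schema.items()
    )


# A policy is itself plain data: it names its target and maps fields to types.
INGRESS_POLICY = {"target_kind": "Ingress", "constraints": {"host": str, "port": int}}
```

`conforms(INGRESS_POLICY, POLICY_SCHEMA)` checks the policy (the meta level). `conforms(obj, INGRESS_POLICY["constraints"])` checks an object against that policy (the object level).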
CUE’s tooling, such as `cue trim`, is agnostic to what the CUE values represent, so it can equally be used for refactoring policy definitions.
Having a single language also has the benefit that users can import these policies more easily in their (CUE) data or validation files without having to worry about another language. Doing the same for a situation that mandates the use of two languages is inherently more complicated.
Conceptual simplicity
CUE has a simple analogy to the real world: a spreadsheet for JSON data.
In a spreadsheet, one has a matrix of cells. After a successful evaluation, all fields have concrete values. The power of a spreadsheet, however, is that one can express values in terms of other cells. Spreadsheets also allow you to add validation, for instance giving a field a different color based on different values.
Analogously, the concrete-value equivalent in CUE is a JSON file. Continuing the spreadsheet analogy, CUE allows replacing values (cells) with formulas that express them in terms of other cells, and/or adding validation rules to these cells.
In CUE, constraints as well as entire configurations are themselves values. In spreadsheet terms, this is like copy-pasting a matrix of cells into a single cell, which then spreads across many.
People often find the spreadsheet model easier to understand than relational programming (Rego).
Why CUE results in smaller, more natural policy definitions
CUE and Rego both have their roots in logic programming. Rego is a Datalog derivative, which, in turn, is a derivative of Prolog. So both languages benefit from some of the common properties of logic programming languages (as opposed to functional or imperative), like improved composability, omni-directionality, and a sound view on the underlying value lattice.
Conceptually, though, they are quite different.
Some history
CUE’s ancestor languages (e.g. LinGO, used in NLP) were designed in response to the complexities and limitations of using Prolog in a large-scale engineering setting. Instead of relations, they define constraints that map one-to-one to the underlying data representation. The key benefit was that constraints, in one fell swoop, allow for fine-grained validation, templates that remove boilerplate, and logical inference. The result was more modularity, enabling engineering at a scale that seemed impossible with Prolog. It became considerably easier for people to contribute, especially in large-scale engineering settings.
How does Rego work conceptually?
Datalog, on which Rego is based, is essentially a query language. Queries are defined as relations over data.
The following steps roughly describe how to define a constraint on the field values A and B:
How does CUE work conceptually?
CUE is a constraint-based language. In essence, CUE defines a unified continuum of all possible configurations, in which one can define a taxonomy of configuration. An API is like a Go struct, saying which fields exist and which types they must have. A policy, or validation, defines which values of these fields are valid; for instance, that field `min` should be less than field `max`. Such a definition is said to be an instance of such an API. At the other end are concrete configurations. To extend this analogy, these are Go values of the original struct. A value that conforms to a policy is an instance of that policy.
The ability to map constraints directly onto the underlying data means that instead of a “pull data, apply constraint” approach, one simply pushes the constraint onto the data, often without the need for temporary helper variables.
Why CUE will result in more reuse
One obvious reason why CUE allows for more reusability is that it is a more widely-applicable language.
Another reason lies in its constraint-based approach: because the convention is to define constraints in terms of the underlying data and to map them directly onto it, the underlying data acts as a contract for representing such constraints. This means these constraints can automatically be reused wherever the same data types are used.