Using CUE for policy definition and admission control #818

cueckoo · 2021-07-03T10:34:28Z

Originally opened by @mpvl in cuelang/cue#818

CUE is an ideal language for admission control. This discussion is a thought experiment, worked through by example, of how CUE would look as a frontend to Gatekeeper (Open Policy Agent), which currently uses Rego.

I have to admit I still don’t fully understand Rego, like the relevance of the names of entry points and some of the syntax. So if the mappings are off, that’s the reason. I’ll write up my assumptions about what the Rego means.

Examples

We start with an example where Rego and CUE are almost identical and increase complexity along the way.

Setting restrictions

We take the example of setting a per-user memory limit in https://www.openpolicyagent.org/docs/latest/policy-language/.

We want to set different limits, depending on the type of user. The Rego example:

# Define user "bob" for test input.
user := "bob"

# Define two sets of users: power users and restricted users. Accidentally
# include "bob" in both.
power_users := {"alice", "bob", "fred"}
restricted_users := {"bob", "kim"}

max_memory = 32 { power_users[user] }
max_memory = 4 { restricted_users[user] }

I’m not entirely certain how max_memory works here: whether it can only be assigned once, or whether it unifies the results (like CUE, Rego is a logic programming language). But the intent is clear at least: the value may not be set to both 32 and 4.

Now let’s look at how the same would look in CUE:

import "list"

user:             "bob"
power_users:      ["alice", "bob", "fred"]
restricted_users: ["bob", "kim"]

max_memory: int
if list.Contains(power_users, user) {
    max_memory: 32
}
if list.Contains(restricted_users, user) {
    max_memory: 4
}

In CUE, both the input data and constraints are defined as plain JSON values (with CUE syntactic sugar applied). The conditional values for max_memory are set with an if-clause (a comprehension), which “embeds” the constraint if the condition is true. If both conditions match, there is a conflict (32 != 4). If neither condition matches, max_memory remains unspecified.
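To make the three outcomes concrete, here is a minimal sketch that wraps the same logic in a definition and applies it to several users. The names #Limit, alice, kim and carol are introduced only for this illustration and are not part of the original example.

```cue
import "list"

power_users:      ["alice", "bob", "fred"]
restricted_users: ["bob", "kim"]

// #Limit is a hypothetical helper: it derives max_memory from the user name.
#Limit: {
	user:       string
	max_memory: int
	if list.Contains(power_users, user) {
		max_memory: 32
	}
	if list.Contains(restricted_users, user) {
		max_memory: 4
	}
}

alice: #Limit & {user: "alice"} // max_memory: 32
kim:   #Limit & {user: "kim"}   // max_memory: 4
carol: #Limit & {user: "carol"} // no condition matches: max_memory stays int

// Uncommenting the next line makes evaluation fail, because 32 and 4 conflict.
// bob: #Limit & {user: "bob"}
```

Running cue eval on this file should show the values noted in the comments. Because carol's max_memory is not concrete, cue export would reject the file until a value is supplied.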

Note that CUE is a bit more verbose here. A philosophy of CUE is that configuration languages should focus on readability over writeability, even more so than programming languages.

Setting restrictions, an alternative

The above example assumed there is no information kept already about users. All information was defined in the code snippet. In practice, it is likely that max_memory is a field that exists somewhere in an API or database.

Suppose there is a notion of per-user settings, that is, a map from users to user settings (in CUE):

users: [string]: { max_memory: int, ... }

In Rego, the code would look roughly the same: one has to pull the information from this location locally in order to apply the logic.

In CUE, one would approach it differently: instead of pulling values to one place, constraints are pushed out to where the data resides:

// ensure bob exists
user: "bob"
users: "\(user)": _ // assert the user is defined

// general
power_users:      ["alice", "bob", "fred"]
restricted_users: ["bob", "kim"]

users: [or(power_users)]:      max_memory: 32
users: [or(restricted_users)]: max_memory: 4

The square bracket notation is used in CUE to conditionally apply a constraint if the pattern in the square brackets matches a field. Users of common JSON query languages will note the symmetry with querying: a.[_] means match any field in a, while a: [_]: v means apply v to any field in a. Instead of _, CUE allows any kind of pattern, including enums ("foo" | "bar"), or regular expressions.
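As a small illustration of the pattern syntax, the sketch below applies the two limits to a handful of users and adds one regular-expression pattern. The user names alice, kim, carol and svc-backup, the svc- naming scheme, and the limits chosen for them are made up for the example.

```cue
power_users:      ["alice", "bob", "fred"]
restricted_users: ["bob", "kim"]

// Enum patterns: each constraint applies to every field whose name matches.
users: [or(power_users)]:      max_memory: 32
users: [or(restricted_users)]: max_memory: 4

// Regular-expression pattern (hypothetical naming scheme for service accounts).
users: [=~"^svc-"]: max_memory: <=2

users: alice: {} // picks up max_memory: 32
users: kim: {}   // picks up max_memory: 4

users: carol: max_memory:        8 // matches no pattern, so 8 is accepted as is
users: "svc-backup": max_memory: 1 // must satisfy <=2
```

Adding users: bob: {} to this file would trigger the same 32 versus 4 conflict as before, since bob matches both enum patterns.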

This example shows one big benefit of CUE: the lines under the “general” comment are not just policy. They could easily be moved to an overall configuration, where they can be used to validate configuration outside of admission control, used as a template to reduce boilerplate in configuration, or serve as input for user-specific OpenAPI generation, to name a few examples.

In general, CUE’s approach has several advantages:

code closely corresponds to, and looks like, the underlying data
generally is less verbose (see below)
often easier to understand
better reuse across the configuration ecosystem (see below)

Complete example

I’m a bit mystified by Rego entry points and naming, but I’m assuming the following for this example:

Gatekeeper passes a value to Rego with the following values from root:

input.request: the incoming request that needs to be validated
namespaces: some information about namespaces

The Rego

Consider the following Rego (taken from the OPA website):

deny[msg] {
    input.request.kind.kind == "Ingress"
    input.request.operation == "CREATE"
    host := input.request.object.spec.rules[_].host
    not fqdn_matches_any(host, valid_ingress_hosts)
    msg := sprintf("invalid ingress host %q", [host])
}

valid_ingress_hosts = {host |
    allowlist := namespaces[input.request.namespace].metadata.annotations["ingress-allowlist"]
    hosts := split(allowlist, ",")
    host := hosts[_]
}
fqdn_matches_any(str, patterns) {
    fqdn_matches(str, patterns[_])
}
fqdn_matches(str, pattern) {
    pattern_parts := split(pattern, ".")
    pattern_parts[0] == "*"
    str_parts := split(str, ".")
    n_pattern_parts := count(pattern_parts)
    n_str_parts := count(str_parts)
    suffix := trim(pattern, "*.")
    endswith(str, suffix)
}
fqdn_matches(str, pattern) {
    not contains(pattern, "*")
    str == pattern
}

I’m a bit confused about what n_pattern_parts and n_str_parts do here. I suspect they could be omitted.

CUE

In CUE, a straightforward translation could be written as

import "strings"

namespaces: {} // passed by Gatekeeper?
input: request: {
    // Fields are referenced through request so that the references resolve.
    if request.kind.kind == "Ingress" && request.operation == "CREATE" {
        object: spec: rules: [...{ host: _validHost }]
    }
    _annotations: namespaces[request.namespace].metadata.annotations
    // The allowlist is comma-separated, as in the Rego version.
    _allowed:     [ for x in strings.Split(_annotations["ingress-allowlist"], ",") { strings.Trim(x, "*.") }]
    _validHost:   or([ for x in _allowed { strings.HasSuffix(x) }]) // HasSuffix used as validator
}

Unlike in Rego, there is no deny entry point. Instead, the entire CUE definition is treated as a value and matched against the Gatekeeper input, failing if there is a conflict. If a field starts with an underscore (_foo), it is “hidden” and not part of the output.
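The matching and the hidden-field behavior can be tried in isolation with a trimmed-down sketch. It assumes a single fixed suffix, .example.com, in place of the annotation lookup, and a hand-written sample request; both are stand-ins for what Gatekeeper would supply.

```cue
import "strings"

// Hidden field: usable inside this file, omitted from the exported output.
// The fixed suffix is an assumption made for this sketch.
_validHost: strings.HasSuffix(".example.com")

// Policy: every rule of the request must carry a valid host.
input: request: object: spec: rules: [...{host: _validHost}]

// Sample request data (hypothetical), unified with the policy above.
input: request: object: spec: rules: [{host: "shop.example.com"}]
```

Exporting this file should print the request without _validHost. Changing the host to, say, "shop.example.net" makes evaluation fail, and the error points at the host field itself.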

Again here, the restriction that a host must conform to a valid, allowed host is “pushed” to where host is defined. This makes this logic reusable across API, policy definition and configuration validation. For example, we could define ValidHost in a separate package and import it here and also use it for API definition, OpenAPI generation, or what have you.

Maximum reuse with the proposed query extension

Using the constructs in proposal #165, this would be simplified further. Suppose we define an IngressCreator type:

IngressCreator: {kind: kind: "Ingress", operation: "CREATE"}

A policy with maximal reuse (assuming that namespaces can be imported from a package) would then look like

import "acme.com/mytypes"

input: request: [_: mytypes.IngressCreator]: object: spec: 
    rules: [int]: host: mytypes.ValidHost(input.request.namespace)

This relies on three proposed language extensions:

[field: value] allows matching also field values, not just field names.
[int] pattern-matches list values. This makes CUE more symmetric and reads a bit more easily than [...T].
foo(x) macro shorthand: CUE does not have functions (for good reasons), but a limited form of functions can be simulated using structs. The proposed notation makes this easier, without having to introduce actual functions.

In this proposal, the pattern matching in square brackets ([_: IngressCreator]) is extended to match not only field names but also field values. Field names are matched as before, while field values must be an instance of the filter value (here IngressCreator).

Note: If this looks like aspect-oriented programming to you, then you’re correct: the value in the square bracket is the pointcut and the value to the right of it is the advice (the constraint in CUE’s case).

The IngressCreator can now be used elsewhere as a constraint or type. For instance, it could be used in an API:

#MyStruct: {
    ingressCreator: mytypes.IngressCreator
}

In other words, using a single language not only allows for a consistent notation across IDLs, data, API definitions and policy specifications, it also allows for reuse across these domains.

Comparison

Return values

The examples above assume that admission control only involves checking for errors. Gatekeeper, however, allows the result of a policy check to be any value.

This can easily be supported in CUE by, say, adopting the convention of #out (a definition) containing a result value.
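A minimal sketch of what such a convention could look like follows. The name #out comes from the paragraph above, but the fields allowed and reason, and the sample request, are assumptions made for illustration; neither CUE nor Gatekeeper defines them.

```cue
// Hypothetical, trimmed-down request as Gatekeeper might pass it.
input: request: {
	operation: "CREATE"
	namespace: "prod"
}

// #out is only a naming convention assumed here: the host program would
// read this value as the result of the policy check.
#out: {
	allowed: input.request.operation != "DELETE"
	reason:  "operation \(input.request.operation) in namespace \(input.request.namespace)"
}
```

A host program could then look up #out after evaluation, for example with something like cue eval -e '#out' on the command line.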

Error messages

In Rego one typically sees user-defined messages. This often results in better messages. Note that with the must and error proposals of CUE the same can be achieved.

CUE’s “push” approach has a big advantage here though: because the constraints are pushed down to relevant values, the location of the constraint violation already conveys useful information that otherwise can easily get lost in computation.

Embedding policy

In a CUE-only world, it would be possible to write CUE that combines data and policy:

import "acme.com/mytypes"

spec: crd: spec: names: kind: "K8sUniqueServiceSelector"

targets: [{
    target: "admission.k8s.gatekeeper.sh"

    // The policy from the last example.
    cue: input: request: [_: mytypes.IngressCreator]: object: spec: 
         rules: [int]: host: mytypes.ValidHost(input.request.namespace)
}]

The advantage of this is obvious: as the CUE definition is native, all error checking and validation is preserved.

In practice, one will probably not want to send around CUE. One could imagine writing something like this (cue.Marshal and rego.Marshal are not yet supported):

import (
    "encoding/cue"

    "acme.com/mytypes"
)

spec: crd: spec: names: kind: "K8sUniqueServiceSelector"

targets: [{
    target: "admission.k8s.gatekeeper.sh"

    "cue": cue.Marshal(#cue)
    #cue: input: request: [_: mytypes.IngressCreator]: object: spec: 
         rules: [int]: host: mytypes.ValidHost(input.request.namespace)
}]

or

import (
    "encoding/rego"

    "acme.com/mytypes"
)

spec: crd: spec: names: kind: "K8sUniqueServiceSelector"

targets: [{
    target: "admission.k8s.gatekeeper.sh"

    "rego": rego.Marshal(#cue)
    #cue: input: request: [_: mytypes.IngressCreator]: object: spec: 
         rules: [int]: host: mytypes.ValidHost(input.request.namespace)
}]

This still preserves the validation.

Note: converting CUE to Rego should be in the realm of possibilities, as long as no builtins are used (perhaps a small set could be supported). Converting Rego to CUE should also be possible, but will likely be considerably harder.

Policy types and constraints

Because constraints in CUE are just values, it is possible to define validation rules on top of policy (meta validation rules, if you will). For instance, let’s say that in the above example (embedding CUE), we want to ensure that the policy we pass conforms to the Gatekeeper format for which it is configured. We define the Gatekeeper message as follows:

#HostCheck: {
    input: request: #MyRequest // #MyRequest is the type of the underlying request
    namespaces: #Namespace     // #Namespace defines the fields of the namespace db
}

We can now define a template for the above message that ensures a policy is of the above form:

// schema.cue
spec: crd: spec: names: kind: "K8sUniqueServiceSelector"

targets: [...{
    target: "admission.k8s.gatekeeper.sh"
    "rego": rego.Marshal(#cue)
    #cue:   #HostCheck
}]

This “template” can then be applied to

// instance.cue
targets: [{
    #cue: input: request: [_: mytypes.IngressCreator]: object: spec: 
         rules: [int]: host: mytypes.ValidHost(input.request.namespace)
}]

for instance by running cue export schema.cue instance.cue --out yaml, to obtain the original example.

High-level observations of benefits of using CUE for Policy

One language for everything

The most obvious benefit to using CUE over Rego is that CUE is more widely deployable. CUE allows defining data (it is a JSON superset after all), APIs, validation rules, and policy.

This benefit is bigger than just having to learn one language. Instead of “embedding Rego in YAML”, one can use a single file to define both. In the end, a policy is just a value to CUE. This way OPA/Gatekeeper would leverage the wider CUE ecosystem, and vice versa.

But it goes further: because a policy is just a value, CUE can go fully meta. One can easily define a policy that defines what valid policies are.

CUE’s tooling like trim is agnostic to the CUE values. It can equally be used for refactoring policy definitions.

Having a single language also has the benefit that users can import these policies more easily in their (CUE) data or validation files without having to worry about another language. Doing the same for a situation that mandates the use of two languages is inherently more complicated.

Conceptual simplicity

CUE has a simple analogy to the real world: a spreadsheet for JSON data.

In a spreadsheet, one has a matrix of cells. After a successful evaluation, all cells have concrete values. The power of a spreadsheet, however, is that one can express values in terms of other cells. Spreadsheets also allow you to add validation, for instance giving a cell a different color based on its value.

Analogously, the concrete-value equivalent in CUE is a JSON file. Continuing the spreadsheet analogy, CUE allows replacing a value (a cell) with a formula expressing that value in terms of other cells, and adding validation rules to these cells.

In CUE, constraints as well as entire configurations are themselves values. The spreadsheet analogy is that one can copy-paste a matrix of cells into a single cell, which then spreads across many.

The spreadsheet model is often better understood by people than relational programming (Rego).

Why CUE results in smaller, more natural policy definitions

CUE and Rego both have their roots in logic programming. Rego is a Datalog derivative, which, in turn, is a derivative of Prolog. So both languages benefit from some of the common properties of logic programming languages (as opposed to functional or imperative), like improved composability, omni-directionality, and a sound view on the underlying value lattice.

Conceptually, though, they are quite different.

Some history

CUE’s ancestor languages (e.g. LinGO, used in NLP) were designed in response to the complexities and limitations of using Prolog in a large-scale engineering setting. Instead of relations, they define constraints that map one-to-one to the underlying data representation. The key benefit was that constraints, in one fell swoop, allow for fine-grained validation, templates that remove boilerplate, and logical inference. The result was more modularity, which allowed engineering at a scale that seemed impossible with Prolog. It became considerably easier for people to contribute, especially in large-scale engineering settings.

How does Rego work conceptually?

Datalog, on which Rego is based, is essentially a query language. Queries are defined as relations over data.

The following steps roughly describe how to define a constraint on the field values A and B:

query the value for A and assign it to a variable
do the same for B
create an expression that acts as a “barrier” to pass or reject

How does CUE work conceptually?

CUE is a constraint-based language. In essence, CUE defines a unified continuum of all possible configurations, in which one can define a taxonomy of configuration. An API is like a Go struct, saying which fields exist and which types they must have. A policy, or validation, defines what are valid values of these fields, for instance, that field min should be less than field max. Such a definition is said to be an instance of such an API. At the other end are concrete configurations. To extend this analogy, these are Go values of the original struct. A value that conforms to a policy is an instance of that policy.
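The taxonomy described above can be sketched with the min and max fields the paragraph mentions. The names #Range, #ValidRange and r are hypothetical and chosen only for this illustration.

```cue
// API: says which fields exist and which types they must have.
#Range: {
	min: int
	max: int
}

// Policy: an instance of the API that narrows the valid values.
#ValidRange: #Range & {
	min: <max
	max: int
}

// Concrete configuration: an instance of the policy.
r: #ValidRange & {min: 1, max: 10}
```

Swapping the two numbers in r should make evaluation fail at r.min, which is where the constraint lives. No helper variables or separate query steps are involved.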

The ability to map constraints directly onto the underlying data means that instead of having a “pull data, apply constraint” approach, one just pushes the constraint directly, often without the need for temporary helper variables.

Why CUE will result in more reuse

One obvious reason why CUE allows for more reusability is that it is a more widely applicable language.

Another reason lies in its constraint-based approach: because the convention is to define constraints in terms of the underlying data and to map it directly onto them, the underlying data acts as a contract for representing such constraints. This means that, automatically, these constraints can be reused wherever the same data types are used.

gedw99 · 2022-05-23T17:17:54Z

this is really the way to go

mmzeeman · 2022-06-20T21:43:22Z

At first glance it looks like CUE also enables distributed policy delegation. Because it is based on lattices, you can ensure one never delegates more "rights" than one possesses. I'm not so sure this is possible with Rego. I have to look into this in more detail though.

jmalloc · 2023-11-22T08:39:06Z

I came across this project while looking for a solution as described in this discussion: https://github.com/k-cloud-labs/kinitiras/tree/main
