Proposal: core builtin extensions #943

Open
cueckoo opened this issue Jul 3, 2021 · 16 comments
@cueckoo
Collaborator

cueckoo commented Jul 3, 2021

Originally opened by @mpvl in cuelang/cue#943

Definitions

Before we introduce some of the proposed builtins, we formally introduce some as-of-yet undocumented language features.

Functions

We propose that CUE support named-argument functions and calls to “structs” as a shorthand for the common macro pattern (e.g. (s & {_, a: x}).out).

A function argument is now defined as:

	Argument       = [ identifier ":" ] Expression .

Any named argument must be followed by other named arguments.

The expression s(a: x, b: y), where s is a struct, is now a shorthand for s & {_, a: x, b: y}.

Validator

A validator is a special builtin that is evaluated by unifying it with other values whereby the result is one of a few outcomes:

  • pass: returns _ if the validation is successful and making the value with which it was unified more specific does not change this result (or it is a final evaluation).
  • incomplete error: the validation failed, but making the value with which it was unified more specific could change this result
  • fatal error: the validation failed and making the value more specific cannot change this result.

A validator must be run at the last stage of evaluating a node, after a fixed point is reached evaluating all non-validator values, in which case any error is considered a fatal error. A validator may also be run at earlier stages of the evaluation of a node, in which case an incomplete error signifies that the decision on validity must be postponed.

An example of a language-level validator is <10. struct.MinFields and struct.MaxFields are examples of validators of builtin packages.

Validators can be thought of as a Go function that has an error return signature.

Inferred validators

Optional: Builtin functions that have the signature foo(x1, x2, …, xn) bool may be implicitly interpreted as validators of the signature foo(x2, …, xn) error.

The CUE function notation

We define the following signature format for cue functions:

FunctionDecl   = identifier Arguments "::" Expression .   
Arguments      = "(" [ Argument { "," Argument } [ "," ] ] ")" .
Argument       = [ identifier ":" ] Expression .

Either all or none of the arguments should be named.

The following rules apply for calling functions with this signature:

  • An argument with a default value in its expression may be omitted in a call. All other arguments must be present in a call.
  • A call must either have all named, or all unnamed arguments.

These rules could be relaxed later.

Proposed builtins

builtins to replace _|_ (bottom)

Although _|_ is part of the standard CUE idiom, it has several issues:

  • no ability to associate user-defined message to bottom
  • meaning of comparison against bottom is unclear
  • the symbol looks offensive to some

We intend to deprecate the bottom symbol (keeping it around for backwards compatibility) and replace it with builtins that more clearly convey the intent of its usage.

Comparison against bottom is not supported by the spec (arguably), but it is a crucial piece of functionality for many CUE configurations. Its meaning is unclear, however. In many cases, it is used to check whether a reference exists. In some cases, however, the intended meaning is to check that a value is valid. In reality, CUE implements semantics somewhere in between the two cases: it checks the validity of a value, but not recursively.

Note that if any of these builtins return false, they may still be satisfied at a later point in time. Evaluation should take this into account, as usual.

_|_ replacement: error(msg: string | *null) :: _|_

The use of error(msg) replaces the common use of _|_ with the added ability to associate a user message with an error. When used within a disjunction, the error will get eliminated as usual, but upon failure of the disjunction, the user-supplied error is used as an alternative error message.

Comparison to bottom

Uses of comparison against bottom will need to be replaced with one of the following builtins.

isconcrete(expr) :: bool

isconcrete reports whether expr resolves to a concrete value, returning true if it does and false otherwise. It is a fatal error if an expression can never evaluate to true.

Example:

a: {}
b: int

c: isconcrete(a)   // true
d: isconcrete(b)   // false
e: isconcrete(a.b) // false (b could still be defined)
f: isconcrete(b.c) // fatal error (b.c can never be satisfied)

Purpose: replaces if a.foo != _|_ {, where it is checked whether a.foo exists with the purpose of determining whether it is a concrete value.

exists(expr) :: bool (optional)

exists reports whether expr resolves to any value.

Example:

a: {}
b: int

c: exists(a)   // true
d: exists(b)   // true
e: exists(a.b) // false (b could still be defined)
f: exists(b.c) // fatal error (b.c can never be satisfied)

opt?: int
ref:  exists(opt)  // false: an optional field is considered to be non-existing

req!: int
ref:  exists(req)  // false

Purpose: replaces if a.foo != _|_ {, where it is checked whether a.foo exists regardless of concreteness.

validator builtins

must(expr: _, msg: string | *null) :: _

must(expr) passes if expr evaluates to true and fails otherwise.

must can be used to turn arbitrary expressions into constraints. For instance, a: <10 can be written as a: must(a < 10). See Issue #575 for details.

not(expr) :: _

not(expr) passes if unified with a value x for which expr&x fails, and fails otherwise.
See #571 for details.

Examples:

a: not(string) // number | bytes | {...} | [...] | bool | null

numexist(count, ...expr) :: _

numexist(count, ...expr) passes if the number of expressions for which exists(x) evaluates to true unifies with count.

The main purpose of numexist is to indicate mutual exclusivity of fields.

#X: {
    // either foo or bar may be specified by the user
    numexist(<=1, foo, bar)
    foo?: int
    bar?: int
}
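
To make the counting rule concrete, here is a small Go sketch of the idea. It is a model only: numExist and atMost are hypothetical names, a nil pointer stands in for an optional field the user did not set, and a Go predicate stands in for unification with count.

```go
package main

import "fmt"

// numExist counts the fields that exist (non-nil in this model) and reports
// an error if that count does not satisfy the count constraint.
func numExist(count func(int) bool, fields ...*int) error {
	n := 0
	for _, f := range fields {
		if f != nil {
			n++
		}
	}
	if !count(n) {
		return fmt.Errorf("numexist: %d fields exist, which violates the count constraint", n)
	}
	return nil
}

// atMost models the unary comparator <=limit.
func atMost(limit int) func(int) bool {
	return func(n int) bool { return n <= limit }
}

func main() {
	foo, bar := 1, 2
	fmt.Println(numExist(atMost(1), &foo, nil))  // <nil>: only foo is set
	fmt.Println(numExist(atMost(1), nil, nil))   // <nil>: neither is set
	fmt.Println(numExist(atMost(1), &foo, &bar)) // error: both are set
}
```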

numconcrete(count, ...expr) :: _ (optional)

numconcrete(count, ...expr) passes if the number of expressions for which isconcrete(x) evaluates to true unifies with count.

numvalid(count, ...expr) :: _ (optional)

numvalid(count, ...expr) passes if the number of expressions for which isvalid(x) evaluates to true unifies with count.

Builtins related to concrete values

Purpose: combine schema of different instances of the same package that would otherwise fail because there are conflicting definitions.

manifest(x) :: _

manifest evaluates x stripping it of any optional fields and definitions and disambiguating disjunctions after their removal.

Use cases:

  • combine instances that only differ in templates.

Defining ranges

Looking around at other languages, defining range numbers clearly is a hard problem, as it is often not clear from just looking at the syntax, or even wording, whether or not ranges are inclusive.

CUE’s unary comparators provide a possible solution to this issue.

range(from: int, to: int, by: int | *1) :: [...int]

Builtin range returns a stream of values, starting from from (must be concrete), adding by (defaults to 1) as long as unification with to succeeds. It is an error to define a range that never terminates.

Examples:

range(from: 1, to: <10)              // [1, ..., 9]
range(from: 1, to: >=0.5, by: -0.1)  // [1, 0.9, ..., 0.5]
range(from: 1, to: <1)               // []
range(from: 1, to: >=1)              // error("infinite range")

Switching

CUE’s if is not paired with an else. This is partly because if really is a comprehension. But another reason is that the use of else quickly leads to nested conditions. A switch statement is generally more conducive to readability in this case.

A switch statement can be simulated in CUE using lists:

choice: [
    if a { x },
    if b { y },
    z,
][0]

is equivalent to the hypothetical

choice: if a { x } else { if b { y } else { z } }

The issue is that the hidden [0] at the end of the switch is impairing readability.

head

A head builtin could make the above more readable. It would do nothing more than select the first element in a list, but it would signal the intention clearly at the start of the list rather than at the end.

choice: head([
    if a { x },
    if b { y },
    z, // default
])
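
The first-match behavior of this pattern can be sketched in Go as follows. The helpers when, head, and choose are hypothetical: when models a guarded list element that is present only if its condition holds, and head selects the first element that is present. Unlike indexing with [0], this head returns an explicit error when nothing matches.

```go
package main

import (
	"errors"
	"fmt"
)

// when models `if cond { v }` inside a list: it contributes v only if cond holds.
func when[T any](cond bool, v T) []T {
	if cond {
		return []T{v}
	}
	return nil
}

// head returns the first element across the given alternatives, in order.
func head[T any](alternatives ...[]T) (T, error) {
	for _, alt := range alternatives {
		if len(alt) > 0 {
			return alt[0], nil
		}
	}
	var zero T
	return zero, errors.New("head: no alternative matched")
}

// choose mirrors the example: x if a, else y if b, else the default z.
func choose(a, b bool) string {
	v, _ := head(when(a, "x"), when(b, "y"), []string{"z"}) // z is the default
	return v
}

func main() {
	fmt.Println(choose(true, true))   // x
	fmt.Println(choose(false, true))  // y
	fmt.Println(choose(false, false)) // z
}
```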

Package std

We’re considering making all core builtins available under the package std, so that they can be referenced unambiguously and more clearly than using the __ prefix.

import "std"

a: std.range(from: 3, by: -1, to: >0) // [3, 2, 1]
@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @seh in cuelang/cue#943 (comment)

This is so good to see.

One problem to consider with the "Switch" section: You write, more or less, if a {} else if b {} ..., but quite frequently b is !a or not a, which requires restating a. Could let help here to define the result of a once, and express it being both true for the consequent branch and its negation for the alternate branch?

@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @seh in cuelang/cue#943 (comment)

Also, while head is evocative, it does so little that it barely justifies its inclusion. I thought of coalesce as a good name for picking the first suitable item in a sequence that can accommodate "null" or disqualified values. Against that, though, in your "Switch" example, I suppose the list should never wind up with more than one value, as opposed to it being prefixed by any number of "null" values.

@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#943 (comment)

@seh: yes, let could be used here that way, though outside the list. We could perhaps consider allowing let in lists.
Also, one could mimic this behavior with: head([if a {}, {}]), where the second element is the "default", and thus the !a case.

Regarding head: I agree its utility is a bit meager. We did consider a select builtin which I think is close to what you're proposing, where it would pick the first of any valid entry. The main problem with this pattern seems that it will be too easy to ignore potential errors, so it may be a less safe approach. Having said that, it reads quite nice and we have seen configurations where this would have merit. So it is something to consider. It just seemed safer to see how far one would get with this seemingly safer approach.

I'm not sure I understand the point with the null values, but maybe this answers your question.

Do you think adding head is not warranted and using a [...][0] pattern is sufficient?

@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @seh in cuelang/cue#943 (comment)

I was not sure that CUE has the same notion of "null" values that SQL, HCL, Jsonnet, and other languages have, so the semantics of a hypothetical coalesce function might not apply.

I don't think head is warranted without tail (or rest), and perhaps nth. My Lisp is showing. I haven't yet reached for any functions like that, though. I'd rather spend those tokens on set manipulation functions for lists.

Would it be possible to write a CUE "function" that encapsulates your [if a {consequent}, {alternate}][0] technique? It would require at least two inputs; the alternate could be optional. It's not much compression, but might cut down on the "syntactic noise" with those brackets. Yes, I confess that I'm still looking for else.

@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#943 (comment)

@seh: you can do else with the switch approach and I’m not in favor of a dedicated If-else construct, as it encourages bad patterns.

But I see your points otherwise. I guess you could indeed express this as cue macros neatly if we had the call shorthand. head would then be defined as:

head: { #0[0], #0: [...] }

One problem is that the first element cannot have a conflicting definition of #0.

But maybe this is enough for now to just point out the pattern and suggest that people comment the construct:

aSwitch: [ // select first match
   if a { … },
   if b { … },
   c // default
][0]

anIfElse: [ // if then else
   if a { … },
   c // else
][0]

This would not require any additions to the language and we can get some experience to see what works. The query addition may also provide useful patterns that obviates the need for this.

@cueckoo
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#943 (comment)

@seh: in CUE, bottom (incomplete errors, to be more specific) is a bit like null in those languages. null can mean various things, often not compatible with the notion of null here. So it seemed impossible to assign any specific meaning to it.

@myitcv
Member

myitcv commented Jul 29, 2021

Noting that one use case of comparison against _|_ we should explicitly document (I'm not totally clear it is actually covered above) is that of type assertion, as discussed in #1161.

@nyarly

nyarly commented Sep 23, 2021

Perhaps this warrants a new discussion or feature issue, but one thing I've found lacking in cases where I've wanted something like the list-as-choice pattern at the end there comes from FP paradigms doing pattern matching. Specifically, language support for guaranteeing that the options are exhaustive.

Reflecting here that the list comprehension version of this provides that in a roundabout (and at-runtime) way: if all the alternatives fail, the list is empty and the index will be out of bounds. So there's some safety railing there.

But: the user is going to get an "index out of bounds" error, (which is confusing when the cause is that an alternative was overlooked), and it'll be the user of the CUE program and not its author who gets the error.

It would be fantastic to have a language level match operator that could, at parse time, emit something like no alternative matches <16, >20 or something. There may be a correspondingly fantastic level of effort to provide that feature, but it sure would be nice.

@verdverm

The default seems to prevent the out of bounds issue, assuming it is always required. One could use error in the case the default should fail the config and provide a more meaningful message.

It may be useful to know that something like <16 | >20 cannot be validated at parse time and requires the evaluator to do its thing ("runtime" in your message, though I'm not sure that is the most accurate term)

It might be also worth considering that, in many ways, CUE comes from Go and there is value in minimizing language features and syntax.

@verdverm

verdverm commented Jan 6, 2022

What about an operator for subsumes?

like if subsumes(a, b) { "a subsumes b" }

I'm trying something like

t: int

result: [
  if (t & int) == _|_ { "int" },
  if (t & int64) == _|_ { "int64" },
  if (t & int32) == _|_ { "int32" },
  if (t & int8) == _|_ { "int8" },
  "unknown",
][0]

which won't work, I think something like this might

t: int

result: [
  if subsumes(t, int)   { "int" },
  if subsumes(t, int64) { "int64" },
  if subsumes(t, int32) { "int32" },
  if subsumes(t, int8)  { "int8" },
  "unknown",
][0]

The goal of the example is to turn CUE types into a string, maybe there could be a builtin or stdlib package that helps with that in a more targeted way. A subsume builtin might still be useful more generally

@sdboyer

sdboyer commented Jan 6, 2022

Strong +1 to that - a native subsumption operator is a key roadmap item for thema (née scuemata). For now, the necessary enforcement of a subsumption relation has to be done in Go. (Though that doesn't work either because of a panic that i need to post an issue for, once i have a clear reproduction)

@haydenflinner

What's the general status on the extensions discussed here? I'm particularly interested in functions.

@myitcv
Member

myitcv commented Jul 11, 2023

Noting that we should also consider downcasts #454 in scope of new builtins.

@myitcv
Member

myitcv commented Mar 12, 2024

Noting what I think is a tricky edge case here:

#X: {
    // at least one of foo or bar must be specified by the user
    numexist(>0, foo, bar)
    foo?: int
    bar?: int
}

The definition #X itself will be in error in this case.

@vergenzt

vergenzt commented Apr 2, 2024

I came here from https://cuetorials.com/patterns/functions and am especially interested in the functions syntax, but I have yet to have a use case for any of the other proposals.

Should some of these use cases be split out into different issues? I feel like there's a lot being proposed here. It might make it clearer which features are priorities to users if these were separate issues.

@myitcv
Member

myitcv commented May 22, 2024

I have just created #3165 for further discussion regarding the encoding of oneofs in CUE.

cueckoo pushed a commit that referenced this issue Oct 14, 2024
This prepares for both adding new builtins (such as
the proposed numExist et al.) as well as
adjusting some existing ones, like `and`.

This CL is supposed to be a no-op (aside from adding
the functionality) and we separate it out to make
future diffs smaller. We will test RawFunc itself
with the respective builtins.

The issue with `and`, for instance, is that it
"weaves" in partially evaluated expressions into
existing evaluation. In some cases this may lead
to cycles. To prevent this, there needs to be a
back channel from the function to the evaluator.
Only the function can know exactly which cycle
information is needed.

Other uses are functions like `numExists` or any
other builtin that needs to operate on CUE expressions
rather than values.

Issue #943

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I32ef92bfdc2a8318b00801bc067df4a073a10a73
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202442
Reviewed-by: Matthew Sackman <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
vanhtuan0409 pushed a commit to anduintransaction/cue that referenced this issue Oct 15, 2024
7 participants