Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

Open
cueckoo opened this issue Jul 3, 2021 · 12 comments
Labels
Discuss Requires maintainer discussion Proposal roadmap/language-changes Specific tag for roadmap issue #339 x/user/dagger.io Experimental CUE-user-based labels

Comments

@cueckoo
Copy link
Collaborator

cueckoo commented Jul 3, 2021

Originally opened by @mpvl in cuelang/cue#543

Background

Currently, for regular fields, an author of a definition can make backwards-compatible modifications without fear of breaking its user, as long as users do not use this definition in an embedding. This is a great property to have for large-scale engineering.

A backwards compatible change in general is when an API is modified to be more less specific, with the exception for definitions that existing fields may not be removed, unless paired with a catch-all [string]: T or .... This is because, by default, a user may not add a field to a closed struct, so removing a field from the API will break a usage of this definition that specifies this field.

Note that a user may still add fields to an existing definitions by embedding it. In this case, changes to original definition may break its usage. There is a clear distinction, however, between using something as an embedding or not, and being able to give guarantees on the basis or whether or not a definition is used as an embedding is a clear rule. Also vet rules could warn about the use of definitions from outside the current module as an embedding.

The Problem

The problem is that this desirable property does not hold for nested definitions. Consider this definition:

#D: {
    a: int
}

A user of this template could write

foo: #D & {
    #bar: int
}

introducing a new definition in the definition #D. This is allowed because the current closedness rules do not apply to definitions.

Suppose the original definition is now changed to

#D: {
    a: int
   #bar: string
}

This now results in a breakage of foo downstream. This may be fine if both #D and foo are maintained by a single owner, but it is problematic if they are maintained by different organizations. The problem is two-fold: on the one hand the user of #D has no guarantee that the maintainer of #D will not break their usage, while the maintainer of #D cannot introduce a definition in #D without knowing it may break a user.

Note that the same problem does not occur for this snippet

import "acme.com/pkg"

foo: pkg.#D & {
    _#bar: int
}

because hidden definitions are not visible across package boundaries and live in their own namespace (NOTE: not implemented/enforced yet).

Proposal

We propose disallowing adding definitions to closed structs. In the example above, this means that

#D: a: int

foo: #D
foo: #bar: int // proposal makes this illegal

would be illegal and result in a #bar not allowed error.

The user could still write

#D: a: int

foo: {
    #D
    #bar: int
}

resulting in the same issue. However, this makes the behavior consistent with regular field and results in a single guideline upon which non-breakage guarantees can rely.

Note that users can still also write

#D: {
    a: int
}

foo: #D
foo: _#bar: int

to work around the issue as well.

Other less tangible benefits of this proposal is that conceptually there are less scenarios in how fields behave, as it makes the behavior of definitions more like that of regular fields. This may result in simpler model for the user and results in simpler code.

We propose that the ... operator does not apply to definitions.

Impact and transition

This is a breaking change. An automated rewrite is somewhat complex and would likely need to operate on the semantic level. Detecting breakage cases should be fairly straightforward. An automated rewrite will not likely be as straightforward, but seems in the realm of possibilities. It may be that it requires user interaction to decide if an offending inclusion of a definition should be rewritten as a an embedding or a hidden definition. Always rewriting it as an embedding may be the closest to the original semantics, though.

To minimize changes,

foo: #D
foo: #bar: int

could be rewritten as

foo: { #D, #bar: _ }
foo: #bar: int

This would not be too hard if position information of conflicting values is accurate.

It would be good to first implement the scoping rules of hidden fields before rolling this change out.

Discussion

Default constraints

The spec currently allows ...T to be used in structs as well (not yet implemented). It seems to not make sense to apply T to definitions and regular fields. The most likely reason why one would want to add a definition in a certain scope is because it either is a common type for multiple fields (see #limit below) or a type that is part of an expression at various points in the struct's fields. In the first case, ...T would just be applied multiple time. There seem to be very few favorable outcomes for the latter.

As it doesn't make sense to allow ...T to apply to definitions, it seems consistent to not have ... apply for definitions.

Users can use embedding to work around the constraint of adding definitions.

Possible future extensions

Note that what is discussed in this section is NOT a proposal, but rather a discussion on how the remaining limitations could be addressed in the future in a backwards compatible way.

The introduction of definitions may be useful in some cases. For instance, an author of a definition may want to base its definition on another definition, but also introduce a definition with some common patterns for its users. For instance

// amce.com/foo
package foo

#A: {
  a: int
  b: 
}
// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
   #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

Users of #E could write

import "example.com/pkg"

e: pkg.#E & { #limit: <5 }

to collectively tighten the limit of all fields.

To work around the limitations of the new proposal, the author could either use embeddings or a hidden definition. In this case, however, neither are satisfactory. It cannot be hidden, because users of the package are supposed to modify #limit and this it cannot be hidden. Using embedding brings us back to square one, as the author of #A could introduce #limit later on and break #E.

To resolve this issue, we could introduce qualified definitions that live in the namespace of the package in which they are defined. Syntactically, this could look like:

// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
  pkg#limit: <10
  a: pkg#limit
  b: pkg#limit
}

Here pkg#limit is a broadening of the identifier syntax, where a # may be preceded not just by _, but any valid regular CUE identifier. The identifier here either refers to the handle of the current package or an imported package, fully qualifying the namespace of the definition. So in the example, pkg#limit is qualified by the tuple ("example.com/pkg", #limit).

A usage of #E would look like:

import "example.com/pkg"

e: pkg.#E & { pkg#limit: <5 }

This mechanism would allow authors to introduce definitions in a backwards compatible way without worrying about breaking things downstream or being broken by an upstream change.

This mechanism would only apply to definitions and not to regular fields. Regular fields are considered what ends up ultimately in a generated configuration and inherently represent a flat space. Definitions don't have this restriction and can benefit from the scoping results as introduced in this paragraph.

Note the close relation of scoped definitions with hidden definitions: 1) both are allowed to introduce a new definition within a closed struct (must be in the current package in both cases), 2) both have an additional qualifier before the # in a definition name, 3) in both cases the meaning of this qualifier is some fully qualified package path (which may be the current package), 4) the implementation of both relies on qualifying a field based on package names available in the current file during compile time. Note that this is mechanism is almost identical to how exported and unexported fields work in Go.

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal, but serves as an example to allow scoped definitions in a world where closedness rules apply to non-scoped definitions.

@cueckoo cueckoo added Proposal roadmap/language-changes Specific tag for roadmap issue #339 labels Jul 3, 2021
@cueckoo cueckoo added this to the v0.5.0 language changes milestone Jul 3, 2021
@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @seh in cuelang/cue#543 (comment)

So in the example, pkg#limit is qualified by the tuple ("acme.com/foo", #limit).

Should that be ("example.com/pkg", #limit) instead? Isn't #limit being defined in example.com/pkg?

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal,

Earlier you said that this is not a proposal, but here you say it is a proposal. It sounds like a proposal.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#543 (comment)

@seh: yes, good catch (I've changed it in the original).

@seh: the scoped/ qualified definitions (pkg#limit) are not part of the proposal. The actual proposal only involves the disallowing new definitions in closed structs discussed in the "Proposal" section.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#543 (comment)

@seh, I grouped the non-proposal part in discussions under "possible future extensions"

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @eonpatapon in cuelang/cue#543 (comment)

Quite agree with this proposal as if the def is closed (ie not including ... and friends) it shouldn't allow the addition of regular and def fields.

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.


As for the discussion part and the described used case:

#E: foo.#A & {
  #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @myitcv in cuelang/cue#543 (comment)

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @myitcv in cuelang/cue#543 (comment)

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

My take on the proposal is that it does not feel/appear like there is a good rationale/use case that suggests it should apply to definitions, and that therefore this issue has been raised to try and solicit others' intuition/examples to test that hypothesis/position.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

How does the package qualified definitions approach work to your mind?

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @eonpatapon in cuelang/cue#543 (comment)

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

Yes I understood that, so the author of #E could write:

#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

He doesn't need any control on foo.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @myitcv in cuelang/cue#543 (comment)

@eonpatapon apologies, I misinterpreted what you wrote:

It could be declared at the root

as meaning "could be declared on the root type, i.e. foo.#A". On second reading I can see that's not what you meant.

the author of #E could write:

#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

Correct, or use hidden definitions (because sometimes the definition being declared within the scope of #E is significant):

#E: foo.#A & {
    _#limit: <10
    a: _#limit
    b: _#limit
}

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @eonpatapon in cuelang/cue#543 (comment)

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

No not really

How does the package qualified definitions approach work to your mind?

I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented).

My point is if an upstream definition includes ... then any changes to it (adding a regular field or definition field) could potentially brake downstream definitions. So I don't understand why ... would exclude definitions.

If an author writes the definition

#A: {
  a: int
  b: string
}

He is sure that he won't break any downstream usage when modifying #A as it's not possible for user to add any field or definition to #A (as proposed - which I'm ok with)

But if he writes:

#A: {
  a: int
  b: string
  ...
}

He clearly knows there could be some breakage. And excluding definitions in #A downstream doesn't help on that (at least partially).

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#543 (comment)

@eonpatapon

What's the advantage of declaring #limit in #E ?

Sometimes one has to. Maybe a more concrete example would be the following list type:

#ListType: {
  #T: _
  #List: {
    value: #T
    next: #List
  }
}

Users could instantiate ListType by setting #T to create a list type with values of a certain type.
This is a parametrized linked list. Firstly, it needs to be pointed out that #T needs to be defined within ListType: if #T had been defined outside of #ListType, it would not have been possible to instantiate a new #ListType with a more specific type like that.

In general there are all kinds of analogous reasons why a definition has to be scoped within a certain struct.

Now, suppose a users wants to have a list of two values. One could augment the ListType as such:

#ListType2:  {
  #ListType
  #T2
  #List: value2: #T2
}

Still a rather hypothetical example, but you can see perhaps why it is necessary that #T2 be included in #ListType2 and not outside it.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @mpvl in cuelang/cue#543 (comment)

@eonpatapon

"I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented)."

The rationale is based on that it has weird consequences.

Firstly, assume that ...T is implemented. This is already mentioned in the spec. It provides for more consistency between lists and structs and is necessary to model the nuances of JSON Schema in CUE.

Now suppose we would allow ... to open up a struct for definitions. Then I would find it very surprising if ...T did not apply to definitions too. After all, ... is defined as a shorthand of ..._.

If one stops and think about what ...T does, I would find it quite odd if it also applied to definitions. For instance, in the list example one introduces such type most likely to constrain fields within that struct. It does not make sense to have ...T apply to both the definitions and the values in that struct.

Now, even if ...T were to apply to definitions too, given the interplay between [string]: T and ...T, I then would also expect [string]: T to apply to definitions. But that's not possible, because it inherently matches string values, which live in a different namespace. IOW, I would expect [=~"^#"]: T to match regular fields, not definitions. But if ...T can match definitions, I would expect there to be some way to have [string]: T to apply to definitions too.

Anyway, a bit convoluted train of thought, perhaps, but what I'm trying to get at is that allowing ... to apply to definitions results in more warts and questions than not doing so.

All of these complications can be avoided by just not letting ... apply to definitions. The question is whether people see this will create a gap in functionality.

@cueckoo
Copy link
Collaborator Author

cueckoo commented Jul 3, 2021

Original reply by @myitcv in cuelang/cue#543 (comment)

Re-opening because there was a partial revert in https://cue-review.googlesource.com/c/cue/+/8201

@myitcv myitcv added the Discuss Requires maintainer discussion label Jan 13, 2022
@myitcv myitcv added zzz Discuss and removed Discuss Requires maintainer discussion labels Jan 29, 2022
@myitcv myitcv added Discuss Requires maintainer discussion and removed zzz Discuss labels Feb 9, 2022
@myitcv myitcv added the x/user/dagger.io Experimental CUE-user-based labels label Sep 14, 2022
cueckoo pushed a commit that referenced this issue Apr 11, 2023
Open structs or structs with ellipsis should always allow
fields, not just regular ones.
The question is if this is really the case when we disallow
definitions in closed struct by default, but it seems fair
enough to allow this until this is the case as it is consistent
with CUE's behavior.

Fixes #2320
Issue #543

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I25a42948f569e81ddacc74c7467c98a287861bea
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/552331
Unity-Result: CUEcueckoo <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Reviewed-by: Roger Peppe <[email protected]>
@myitcv myitcv added the zGarden label Jun 13, 2023
@mvdan mvdan removed the zGarden label Feb 8, 2024
cueckoo pushed a commit that referenced this issue Apr 4, 2024
This old evaluator allowed new definitions in closed structs.
This is not according to the spec. However, mimic this
behavior in the new evaluator for now.

Issue #2884
Issue #543

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: Ia0c37fb7c915426f13ed091f948749f7c9ac9fbb
cueckoo pushed a commit that referenced this issue Apr 10, 2024
This old evaluator allowed new definitions in closed structs.
This is not according to the spec. However, mimic this
behavior in the new evaluator for now.

Issue #2884
Issue #543

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: Ia0c37fb7c915426f13ed091f948749f7c9ac9fbb
cueckoo pushed a commit that referenced this issue Apr 10, 2024
This old evaluator allowed new definitions in closed structs.
This is not according to the spec. However, mimic this
behavior in the new evaluator for now.

Issue #2884
Issue #543

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: Ia0c37fb7c915426f13ed091f948749f7c9ac9fbb
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1191565
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
Reviewed-by: Daniel Martí <[email protected]>
cueckoo pushed a commit that referenced this issue Apr 30, 2024
The new evaluator computes closedness quite
differently from the old one. This CL adapts the API to
use the new data.

Note that the data the new evaluator generates is
considerably more convenient. That is why the logic
is quite a bit simpler. The logic could probably be simplified
more.

Also note that HasEllipsis seems to be not computed
correclty entirely. Althoug it seems that many tests
improve if we add the check to isClosed or
IsClosedStruct, there are some cases where this causes
errors. This seems to be the case with one of the tests
in TestAllows. As this only manifests itself when structure
sharing is disabled, we just mark this test as todo and
leave the investigation for later.

Issue #3060
Issue #543
Issue #2884

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266
cueckoo pushed a commit that referenced this issue Apr 30, 2024
The new evaluator computes closedness quite
differently from the old one. This CL adapts the API to
use the new data.

Note that the data the new evaluator generates is
considerably more convenient. That is why the logic
is quite a bit simpler. The logic could probably be simplified
more.

Also note that HasEllipsis seems to be not computed
correclty entirely. Althoug it seems that many tests
improve if we add the check to isClosed or
IsClosedStruct, there are some cases where this causes
errors. This seems to be the case with one of the tests
in TestAllows. As this only manifests itself when structure
sharing is disabled, we just mark this test as todo and
leave the investigation for later.

Issue #3060
Issue #543
Issue #2884

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266
cueckoo pushed a commit that referenced this issue Apr 30, 2024
The new evaluator computes closedness quite
differently from the old one. This CL adapts the API to
use the new data.

Note that the data the new evaluator generates is
considerably more convenient. That is why the logic
is quite a bit simpler. The logic could probably be simplified
more.

Also note that HasEllipsis seems to be not computed
correclty entirely. Although it seems that many tests
improve if we add the check to isClosed or
IsClosedStruct, there are some cases where this causes
errors. This seems to be the case with one of the tests
in TestAllows. As this only manifests itself when structure
sharing is disabled, we just mark this test as todo and
leave the investigation for later.

Issue #3060
Issue #543
Issue #2884

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266
cueckoo pushed a commit that referenced this issue Apr 30, 2024
The new evaluator computes closedness quite
differently from the old one. This CL adapts the API to
use the new data.

Note that the data the new evaluator generates is
considerably more convenient. That is why the logic
is quite a bit simpler. The logic could probably be simplified
more.

Also note that HasEllipsis seems to be not computed
correclty entirely. Although it seems that many tests
improve if we add the check to isClosed or
IsClosedStruct, there are some cases where this causes
errors. This seems to be the case with one of the tests
in TestAllows. As this only manifests itself when structure
sharing is disabled, we just mark this test as todo and
leave the investigation for later.

Issue #3060
Issue #543
Issue #2884

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: I98e8c25e0680674e2aaa9f743a49b39920d7b266
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1194080
Reviewed-by: Daniel Martí <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
cueckoo pushed a commit that referenced this issue Oct 11, 2024
This was currently not always allowed due to
an inconsistency in the logic.

Issue #543
Fixes #3491

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: If5404d8adbe0ac13a665579e6d273722d54ef268
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202461
Reviewed-by: Daniel Martí <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
vanhtuan0409 pushed a commit to anduintransaction/cue that referenced this issue Oct 15, 2024
This was currently not always allowed due to
an inconsistency in the logic.

Issue cue-lang#543
Fixes cue-lang#3491

Signed-off-by: Marcel van Lohuizen <[email protected]>
Change-Id: If5404d8adbe0ac13a665579e6d273722d54ef268
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1202461
Reviewed-by: Daniel Martí <[email protected]>
TryBot-Result: CUEcueckoo <[email protected]>
Unity-Result: CUE porcuepine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discuss Requires maintainer discussion Proposal roadmap/language-changes Specific tag for roadmap issue #339 x/user/dagger.io Experimental CUE-user-based labels
Projects
None yet
Development

No branches or pull requests

4 participants