tooling: native embed and export #2031

myitcv · 2022-10-24T00:23:12Z

Two very common use cases are:

1. Using cue import -l "$path" to "place" some JSON/Yaml/... at a given path in a CUE configuration.
2. Using cue export -e "$expr" --out $format to take part of a wider CUE config graph and export it to JSON/Yaml/... ready for consumption by a non-CUE tool/system.

(Here, $path is a shell-like variable syntax used to indicate "some path", similarly for $expr standing in for "some CUE expression").

Note that whilst in some situations both use cases appear in tandem, this is not a requirement. The two use cases are orthogonal.

If use case 1 were a one-off import to CUE, it would be relatively straightforward to run cue import and be done, removing the original JSON/Yaml file. However, in some situations it may be necessary to leave the original JSON/Yaml file as the source of truth, for example if some other process generates/maintains that file. In this scenario, it becomes a burden to have to repeatedly re-run cue import to ensure that the CUE configuration is current with respect to the source JSON/Yaml.

In a similar way, use case 2 is in practice never a one-off command. If CUE is being used to maintain the source of truth for the target tool/system, then cue export will need to be run on every change of the source CUE to ensure that the target tool/system sourcing the output JSON/Yaml reads a current version. The burden in this scenario comes from needing to maintain multiple cue export commands somewhere, either in a bash script/similar, or a cue cmd script. Whilst the cue cmd solution will absolutely work there are two drawbacks:

Writing a cue cmd command is incredibly verbose.
The declaration of intent is a long way from the actual configuration itself (i.e. in a different file).

This issue tracks adding support for a more native embed (use case 1) and export (use case 2) facility.

The word "embed" draws inspiration from Go's embed, and the use of //go:embed directives. It is likely desirable that CUE support a similar directive (comment)-based approach for both use cases 1 and 2.

myitcv · 2022-10-24T00:27:45Z

Previous brainstorming with @mpvl around use case 1 resulted in the following ideas:

// The use of imports to cover use case 1 appears problematic.
// Including for completeness' sake.
import bar "yaml+jsonschema:./foo/bar.yaml" // Is this a url?
import bar "yaml+jsonschema+fs:./**/bar.yaml" // Is this a url?

// An import builtin also seems "wrong"
bar: import("./foo/bar.yaml", encoding: "yaml", interpretation: "jsonschema")
bar: import("./foo/bar.yaml", type: "yaml+jsonschema")

// An approach that uses attributes as directives is attractive. Exploration
// of various different forms below. 

bar: _ @embed(foo/bar.toml)

bar: {
      @embed(string, bar/file.json)
      @embed(bytes, bar/file.json)
      @embed(glob, bar/file.json)
      @embed(fs, bar/file.json)

      @embed(bar/file.json)
      @embed(jsonschema: bar/*.json, mapping: fs) // <pathString>: <value>
      @embed(foo/*.yaml, mapping: flat)  // path/to/files: <value>
      [string]: #Foo
}

bar: _ @embedfs(foo/*.yaml, jsonschema:bar/*.json)
bar: [string]: #Foo

bar: foo: [Filename=string]: _

Use case 2 could similarly be solved via attributes:

xconfig: #Config & {
    @export(json, "xconfig.json")
    
    // ...
}

cedricgc · 2022-10-30T23:05:18Z

I took the time to read about the Go embed package which was enlightening.

One of the main differences between Go and CUE is that a programming language has a set compile step (which is where Go can embed data into a program). With CUE, that timing is not clear, and CUE users may want control over when IO happens.

cedricgc · 2022-10-31T16:31:24Z

Question: One point that is unclear is when the syncing IO happens, since would that not be a violation of CUE hermiticity?

cedricgc · 2022-11-05T20:44:41Z

Question: One point that is unclear is when the syncing IO happens, since would that not be a violation of CUE hermiticity?

Answered by @mpvl:

@cedricgc - I think it is because other files would be parsed in the same way as CUE files themselves
@mpvl - Yes. The set of files must be within the cue.mod purview, just like the cue files, and are thus equally static.

cedricgc · 2022-11-05T21:10:28Z

For the attributes arguments, I think taking protocol schemas would be more flexible

e.g. @embed(json:///bar/file.json) for embedding a JSON graph (assuming cue.mod is considered the root directory)

I think designing it this way allows for more schemes in the future that are based not only on reading from the filesystem but also on a network or database. For example, a scheme to embed data from a remote system, using the query language to pick out the 'subgraph'.

Referring to cue help filetypes, I think this can be compatible with how we treat filetypes on the command line, and can also qualify inputs with multiple tags (e.g. openapi+yaml:// or json+data://)

myitcv · 2022-11-07T05:28:05Z

For the attributes arguments, I think taking protocol schemas would be more flexible

As discussed offline, we definitely want/need to support specifying the filetype in some way. However, using a URI scheme has (to my mind at least) the unfortunate side effect of suggesting we support (module) absolute paths. Using the Go embed approach as a reference, only relative file paths are supported:

The patterns are interpreted relative to the package directory containing the source file.

kghenderson · 2022-11-13T16:03:36Z

personally, i'm also open to an ultra-simplified variant where you can only embed/import to an unevaluated string
this gets around the order evaluation problem and treats the content as just a value which you can process and validate using regular cue constructs. the above i still consider to be an "import" case as opposed to simple, raw "embed"

these should just load and error like any other value, outside of tools (which separates this use case).
this particular case isn't for transforms or data processing.


PackageDoc: string  @embed _about.md 

TemplateText: string  @embed mytext.tmpl

myitcv · 2023-04-14T18:53:33Z

Adding a further note here: this solution should at least consider the case presented in #2346.

myitcv · 2023-04-14T21:42:32Z

Also noting an exchange with @kghenderson in which he observed there is a parallel between the concept of @embed and the read functions in Go's os package, as well as things like environment variables. That's not to say the concept of @embed should be abstracted to something more general, just an observation that we might also want/need something similar for those read functions. Because ultimately, @embed is os.ReadFile().

nyarly · 2023-04-14T22:43:23Z

An @embed directive would be extremely welcome. We're definitely using something like the cue import command above to make this work already, and reducing the processing that has to happen would be ideal.

One use case we have, though, involves extracting data from CRDs - currently we use yq to extract the schema part, and separately pull out GVK stuff and then integrate them. We might be able to get something useful if @embed took a subresource path? e.g.

let schema = @embed(jsonschema+yaml, "./crds/stringsecrets.yaml", "spec.versions.v1alpha1.schema")

or something?

Maybe what's called for is an extra tag - @interpret or something?

let crds = @embed(yaml, "./crds/*.yaml")

for path, crd in crds {
  for version, schema in crd.spec.versions {
    (version): (crd.names.kind): @interpret(jsonschema, schema.openAPIV3Schema)
    (version): (crd.names.kind): {
      apiVersion: "\(crd.spec.group)/\(version)"
      kind: "\(crd.names.kind)"
    }
  }
}

Alternatively, maybe it wants to be encoding/jsonschema -> jsonschema.Parse(...) or something?

The critical thing, I think, is that I have (in this case) JSONSchema embedded in YAML, so to work with it in CUE, I'd want to @embed the YAML to get CUE, extract the JSONSchema, and then process it again to get (different) CUE.

myitcv · 2023-04-18T16:38:47Z

Thanks for the use case, @nyarly. I think that fits with a later iteration of @embed and @export.

I'm tentatively marking an initial version of this proposal as v0.7.0, pending working on a design draft with @mpvl and @rogpeppe this week at KubeCon EU.

The initial version would be bytes only for embed and export. That would mean (@nyarly and others) that any transformation inbound/outbound would need to happen via other fields or let declarations.

The main goals of the design draft for the initial version are:

1. Get something out in a timely fashion, with a goal of v0.7.0 for implementation.
2. Not back ourselves into a corner with respect to future enhancements.

The second point is key. We should not preclude the kind of syntactic sugar that @nyarly imagines above, and that we have speculated about in the original notes in this discussion. Because ultimately it should, for example, be possible and easy for someone to say "embed this file at this point treating its contents as JSON".

myitcv · 2023-04-21T15:44:53Z

Adding an observation here related to cue import --recursive, and issues like #1209. Recursive import is imprecise because there are heuristics that guess whether a string field is YAML, for example. To make such imports precise we could follow a similar approach to the text proto adapter where a schema helps to make precise what is being imported. i.e. at various paths, types etc are specified. This would be similar to cue vet -d. Noting this observation here in case there is any overlap with the embed/export discussion here.

myitcv · 2023-06-14T13:47:15Z

Milestone moved to v0.7.x, in order that we focus on performance and disjunction related changes first in the v0.7 series, and that we get to this whilst people are trying out early alphas of v0.7.0.

phoban01 · 2024-04-10T07:40:21Z

Any further progress to share on this?

myitcv · 2024-05-22T15:44:49Z

@mpvl is currently entirely focussed on performance work as part of the evaluator rewrite. See #2850 for the ~fortnightly updates on progress on that front. When work on that front settles we are going to publish a design doc/proposal on how native embed will work with CUE, along with an experimental implementation.

Exposing ParseFileAndType. For the embedding proposal, file and type are specified separately and do not need to be parsed as on the command line.
Issue #2031
Signed-off-by: Marcel van Lohuizen
Change-Id: Ib02f845d503edf1d78834a1ff2a0c224cc936748
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196716
TryBot-Result: CUEcueckoo
Reviewed-by: Daniel Martí
Unity-Result: CUE porcuepine

@extern

Right now, we only allow one type of extern attribute in a file. At the file level, we define @extern(kind). Fields within the file can then be associated with an @extern attribute that is interpreted as defined by kind. This approach may work for lower-level functionality like support for WASM, but it seems a bit unintuitive for embed. Instead, we suggest that after a file-level @extern(kind) declaration the field attributes take the form @kind(). This is what is implemented here. This has the additional benefit that we could more easily allow different types of extern fields within a single file. Note that the original reason to reuse @extern for field attributes was to avoid a proliferation of attributes. This namespace encroaching is still somewhat mitigated by the @extern(kind) attribute. In the future we can find a different mechanism to define attributes scoped by domain.
Issue #2031
Signed-off-by: Marcel van Lohuizen
Change-Id: I28b1fdd0f0a85c46a544f71bbff40a7772e60873
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196717
Reviewed-by: Daniel Martí
Unity-Result: CUE porcuepine
TryBot-Result: CUEcueckoo

This is a first-stab and partial implementation of the embedding proposal. See the TODO list included in embed.go to see what is outstanding.
Issue #2031
This issue is not closed, as it also refers to the complementary export attribute.
Signed-off-by: Marcel van Lohuizen
Change-Id: Ic296a28aa009509f9a17913c7e5a0794de5a7a35
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196718
Unity-Result: CUE porcuepine
Reviewed-by: Aram Hăvărneanu
Reviewed-by: Daniel Martí
TryBot-Result: CUEcueckoo
Reviewed-by: Roger Peppe

This could later be allowed as an option. This is a security feature. We also disallow hidden files on Windows for the same reason, which is a bit more involved. Note that this results in potentially slightly different behavior under Windows and Unix. This is already the case. For instance, the set of valid filenames is different on the different supported OSes. So we accept this discrepancy in favor of added security. Verified that without adding the new logic, the hidden file that was added to embed.txtar gets included in the output.
Issue #2031
Signed-off-by: Marcel van Lohuizen
Change-Id: Iacff803f4c388d1f2792665ed5adb32f68f00ffa
Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1196775
TryBot-Result: CUEcueckoo
Reviewed-by: Roger Peppe
Unity-Result: CUE porcuepine
Reviewed-by: Daniel Martí
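As a rough illustration of the Unix side of this rule, the Go sketch below treats any path with a dot-prefixed element as hidden and drops it from a list of matches. The names isHidden and visible are hypothetical, and the dot-prefix test is an assumption about what "hidden" means here; the Windows check, which the commit message describes as more involved, is not attempted.

```go
package main

import (
	"fmt"
	"strings"
)

// isHidden reports whether any element of a slash-separated path
// starts with a dot. The special elements "." and ".." are ignored.
// This is an assumed definition of "hidden", following Unix convention.
func isHidden(p string) bool {
	for _, elem := range strings.Split(p, "/") {
		if elem == "." || elem == ".." {
			continue
		}
		if strings.HasPrefix(elem, ".") {
			return true
		}
	}
	return false
}

// visible returns the subset of paths that are not hidden.
func visible(paths []string) []string {
	var out []string
	for _, p := range paths {
		if !isHidden(p) {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	matches := []string{"data/a.json", "data/.secret.json", ".cache/b.json"}
	fmt.Println(visible(matches)) // prints [data/a.json]
}
```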

myitcv · 2024-07-12T10:22:59Z

As part of the v0.10.0-alpha.1 release, we have just published an embed proposal, #3264. Please try it out. We would very much welcome feedback via the linked discussion.

myitcv · 2024-09-05T11:48:14Z

Adding a drive-by comment regarding the hypothesised @export. The shape of such a thing is becoming clearer thanks to the @embed experiment. But also as a result of playing with https://github.com/cue-lang/cuelang.org/blob/master/internal/cmd/writefs/main.go.

Indeed the writefs experiment has flagged one important thing that @export would need to support: writing "code generated by" headers to files, if specified via an option.
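For illustration only, here is a small Go sketch of the header behaviour described above: an export step that optionally prepends a "Code generated by" line to the file contents. The function name withHeader, the shape of the option, and the exact header wording (borrowed from Go's convention for generated files) are assumptions, not the design of @export or of writefs.

```go
package main

import (
	"fmt"
)

// withHeader returns content prefixed by a generated-code comment line.
// commentPrefix is the line-comment marker of the target format, for
// example "#" for YAML. An empty tool name means no header is requested,
// so the content is returned unchanged.
func withHeader(commentPrefix, tool string, content []byte) []byte {
	if tool == "" {
		return content
	}
	header := fmt.Sprintf("%s Code generated by %s. DO NOT EDIT.\n", commentPrefix, tool)
	return append([]byte(header), content...)
}

func main() {
	out := withHeader("#", "cue export", []byte("name: demo\n"))
	fmt.Print(string(out))
}
```

A format without comment syntax, such as JSON, has no natural place for such a header, which is one reason it would need to be an explicit option rather than a default.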

myitcv added the FeatureRequest New feature or request label Oct 24, 2022

myitcv added the Discuss Requires maintainer discussion label Feb 8, 2023

myitcv added the Re-milestone label Apr 13, 2023

myitcv added this to the v0.7.0: performance and disjunction fixes milestone Apr 18, 2023

myitcv removed Discuss Requires maintainer discussion Re-milestone labels Apr 18, 2023

myitcv added zGarden and removed zGarden labels Jun 13, 2023

myitcv modified the milestones: v0.7.0: performance and disjunction fixes, v0.7.x Jun 14, 2023

myitcv mentioned this issue May 22, 2024

cue/load: special '_' package name incorrectly allowed in import path #3167

Closed

myitcv mentioned this issue Jul 5, 2024

allow import declaration to load jsonschema #438

Closed
