all: to what extent should CUE support symlinks? #3300

myitcv · 2024-07-17T15:49:00Z

"Inspired" by #3299, I don't think we have been explicit about the extent to which modules support (or otherwise) symbolic links.

Questions not limited to:

Should module loading follow symlinks at all? We already document that published modules don't support them: https://cuelang.org/docs/reference/modules/#zip-path-size-constraints
Could/should we distinguish between the main module or otherwise?
...

This issue might generate further issues for bugs/implementation.

End goal is that we should be clear in our documentation (exact location TBC) on the extent to which module loading does/doesn't support symbolic links.

See also #3298

seh · 2024-07-17T17:48:32Z

In my case—the basis for the aforementioned #3299—I'm not creating these symbolic links deliberately in my primary source repository. Rather, I'm using my seh/rules_cue Bazel ruleset to evaluate the CUE files within Bazel's actions, and Bazel's normal sandbox preparation behavior involves creating a directory tree full of symbolic links to regular files sitting elsewhere, as an optimization against copying all of those files.

myitcv · 2024-07-18T04:39:56Z

@seh - thanks for adding this context, extremely useful.

rogpeppe · 2024-07-18T12:55:41Z

As a data point, I checked the go command's behaviour in this regard:

Go accepts and reads symlinks locally, including when using a directory replace directive and in vendor directories.
Go silently drops symlinks when consuming from a network source, such as the Go proxy or a remote VCS source

seh · 2024-07-18T13:03:36Z

Regarding support for Bazel's sandboxes, it wouldn't be much trouble to require that callers of the cue tool pass a command-line flag to enable tolerance for symbolic links, if you didn't feel that it was safe or fair to use that as the default behavior. If there was such a command-line flag, I'd have my Bazel ruleset pass it to the cue tool when invoking it inside of Bazel actions.

mvdan · 2024-07-19T12:19:39Z

Should module loading follow symlinks at all?

Perhaps I jumped the gun by approving the patch for #3298, which makes the loader follow symlinks locally. The way that issue was raised before this one, in the format of a bug report, didn't make it clear that it was a slightly open question.

Given that Go does this locally, doing the same seems fine to me. But given that symlinks cannot be published or consumed via a GOPROXY (nor a CUE registry), I think it would also be fine to leave them as unsupported in a consistent way.

mvdan · 2024-07-19T12:22:59Z

Making a decision here should also close #1672, hopefully.

Note that some people in that thread were wondering about symlink support for loading local packages, but it's not clear to me that any of them actually relied on it in the past.

myitcv · 2024-07-19T12:26:48Z

But given that symlinks cannot be published or consumed via a GOPROXY (nor a CUE registry), I think it would also be fine to leave them as unsupported in a consistent way.

With the caveat that we would then need to understand/answer how things could/should work with Bazel. @seh how familiar are you with what Go does with respect to Bazel? Anything we should be learning from there?

seh · 2024-07-19T13:07:02Z

Note that some people in that thread were wondering about symlink support for loading local packages, but it's not clear to me that any of them actually relied on it in the past.

I've been relying on it since first using CUE with Bazel three and a half years ago.

seh · 2024-07-19T13:11:49Z

how familiar are you with what Go does with respect to Bazel? Anything we should be learning from there?

As far as I can tell from inspecting the directory trees created by the rules_go ruleset when I build Go programs with Bazel, every file presented to the Go toolchain is a symbolic link.

If we happen to have Bazel generate any files used as input to compiling or linking action, that artifact might wind up as a regular file sitting within that same sandbox's directory tree, but not necessarily.

myitcv · 2024-07-22T04:56:54Z

My sense is that we keep things working as-is, by first tightening up what we mean by "as-is", and adding any missing tests.

So here is an attempt to define what "as-is" is/should be:

CUE does drops symlinks when writing module zip files (per https://cuelang.org/docs/reference/modules/#zip-path-size-constraints) and when reading them.

That second part of the sentence is worth us making sure with a test. Because it's possible to construct a "rogue" zip file that does include symlinks using the regular Unix zip command. Being really sure how CUE treats such a zip is important.

Note that the rule would also apply in the case of:

cue export $pkg@$version

As such, an embed "read" will only ever happen within the CUE module cache for a zip-based dependency.

That way we can say that if the CUE code did not come from a zip archive (working with a main module on disk, a directory-based replace directive, vendor, cue.mod/{pkg,gen,usr}, ...) then symlinks will work. This should ensure we leave the existing support for Bazel in place, and indeed make it more official via that definition.

@embed directives will never follow symlinks.

mvdan · 2024-07-22T09:59:02Z

by first tightening up what we mean by "as-is", and adding any missing tests.

When we have those tests, we should also verify how older CUE versions behaved, for example v0.5.0.

myitcv · 2024-07-23T09:04:19Z

Following my comment in #3299 (comment), I have updated my comment above to state clearly that:

@embed directives will never follow symlinks.

This brings CUE inline with Go's behaviour with respect to embed.

@seh - please could I ask you to take a look at my conclusion above and raise any concerns you might have from a Bazel perspective?

seh · 2024-07-23T12:07:07Z

please could I ask you to take a look at my conclusion above and raise any concerns you might have from a Bazel perspective?

The treatment of files in modules proposed above sounds like it would work fine with Bazel's sandbox technique, at least for files in modules that come from the same Bazel workspace. That would cover my use of modules thus far.

The restriction on embed may turn out to be onerous, in that one could write a CUE file that embeds a JSON file sitting right next to it on disk, but when Bazel prepares a sandbox for an action that involves these two files as input, it will create symbolic links to both. When the cue tool reads the CUE file—traversing the symbolic link to do so—it will find an @embed attribute mentioning a file that's present on disk as another symbolic link. If the cue tool then refuses to read that file, it appears to be behaving differently within the sandbox. Of course, the cue tool would have refused to read the symbolic link outside of the sandbox too, but that's not what the author intended to arrange.

I can imagine a softer but more complicated rule, something like: If we read the CUE file containing the @embed attribute through a symbolic link, then we'll look for the file mentioned in the @embed attribute via relative path from the target of the CUE file's symbolic link. In other words, for relative paths, the resolution base would be the CUE file's actual path, meaning the final target of any symbolic links that pointed to it.

This adds explicit tests for the behavior of CUE when it encounters symbolic links. Namely: - symbolic links are followed for files that are directly part of the build. - symbolic links are disallowed for embedded files. - symbolic links are silently dropped when publishing modules. Modulo the fact that modules did not exist previously, this matches previous behavior (checked against v0.4.3): the non-module parts of `symlink.txtar` test pass against v0.4.3. For #3300. Signed-off-by: Roger Peppe <[email protected]> Change-Id: Iaf6c4668101f3e91d78a8cab5734efea0a2a9624 Reviewed-on: https://review.gerrithub.io/c/cue-lang/cue/+/1198252 TryBot-Result: CUEcueckoo <[email protected]> Reviewed-by: Daniel Martí <[email protected]> Unity-Result: CUE porcuepine <[email protected]>

rogpeppe · 2024-07-23T17:16:35Z

@seh

The restriction on embed may turn out to be onerous, in that one could write a CUE file that embeds a JSON file sitting right next to it on disk, but when Bazel prepares a sandbox for an action that involves these two files as input, it will create symbolic links to both.

Given that Go seems to have exactly the same rule, do you know how Go manages to avoid that difficulty, by any chance?

seh · 2024-07-23T17:35:58Z

do you know how Go manages to avoid that difficulty, by any chance?

In the rules_go Bazel ruleset, both the go_library and go_binary rules accept an "embedsrcs" attribute to designate the input files that are candidates for being embedded. The rules restrict the nominated dependencies to sit within the same Bazel package's root directory, which is a logical restriction; it's possible to mix both primary source files and files generated by other Bazel targets, so long as the targets produce output files that sit in that same directory.

rules_go prepares a so-called embedcfg file that it then supplies to the Go compiler via the -embedcfg command-line flag. You can see how the compiler uses that file in the readEmbedCfg function.

There's a lot of bundling up the files to be embedded and figuring out their relative paths. It's hard to follow what's going on, but I think it's mainly about working around the restriction on following symbolic links.

mvdan · 2024-07-24T08:46:43Z

With the tests from https://review.gerrithub.io/c/cue-lang/cue/+/1198252 merged, I'm pushing the rest of the work here to alpha.3, as we are releasing alpha.2 today.

rogpeppe · 2024-07-31T10:38:38Z

@seh

do you know how Go manages to avoid that difficulty, by any chance?

In the rules_go Bazel ruleset, both the go_library and go_binary rules accept an "embedsrcs" attribute to designate the input files that are candidates for being embedded. The rules restrict the nominated dependencies to sit within the same Bazel package's root directory, which is a logical restriction; it's possible to mix both primary source files and files generated by other Bazel targets, so long as the targets produce output files that sit in that same directory.

rules_go prepares a so-called embedcfg file that it then supplies to the Go compiler via the -embedcfg command-line flag. You can see how the compiler uses that file in the readEmbedCfg function.

There's a lot of bundling up the files to be embedded and figuring out their relative paths. It's hard to follow what's going on, but I think it's mainly about working around the restriction on following symbolic links.

That's all super helpful, thanks! I'm totally unfamiliar with Bazel so it's hard for me to know how much of this applies only to Go and how much might apply to CUE too, but it seems like we will probably need a similar mechanism to Go's -embedcfg so we can make embed directives work even when the embed targets are in another directory. Or... does Bazel know enough to be able to make symlinks to the embedded files too (and hence just supporting embedded files that are symlinks might be OK) ?

seh · 2024-08-04T17:29:53Z

does Bazel know enough to be able to make symlinks to the embedded files too (and hence just supporting embedded files that are symlinks might be OK) ?

Bazel will create symbolic links to all the files nominated as "inputs" for an action that it's going to run, regardless of where the source files sit relative to each other either in the primary repository or as generated outputs from other actions. So long as we mention that the embedded files need to be inputs to the action that runs cue def or cue export, we'll get symbolic links to them in the sandbox that Bazel creates for the action.

The reason that rules_go wants the embedded source files specified in a separate attribute on the go_library or go_binary rule—that is, "embedsrcs" alongside the usual "srcs"—is so that it can arrange for telling compiler about them differently, assuaging its concerns about reading symbolic links.

Introduce version 0.10.0, establishing it as the new default, removing the prior versions 0.9.0, 0.9.1, and 0.9.2, as none of them could accommodate symbolic links within imported modules that most of Bazel's sandboxing strategies require. Pertinent URLs to related discussion: - https://cuelang.slack.com/archives/C063DSMKS56/p1720620420965109 - cue-lang/cue#3300

myitcv · 2024-10-10T08:52:37Z

I've retitled this issue to the much broader "to what extent should CUE support symlinks?"

With a view to this issue being closed with a holistic explanation of where CUE does/doesn't support symlinks, along with brief rationale for each relevant point.

@rogpeppe is going to follow up with that summary.

We will also create further issues as needs be to capture outstanding bugs/work. e.g. if our aim is to support symlinks in such a way that things work for Bazel, we should have tests that lock that in.

rogpeppe · 2024-10-10T13:07:10Z

@seh Do I have it right that you're happy with CUE's current behaviour with respect to symbolic links?

seh · 2024-10-12T14:32:00Z

Do I have it right that you're happy with CUE's current behaviour with respect to symbolic links?

From what I've read, it sounds workable for use with Bazel. I'd like to try an example involving a mixture of static source and generated files that I can have Bazel run through cue export.

rogpeppe · 2024-10-22T14:10:44Z

For the record, the executive summary of current (and proposed future) behaviour with respect to symlinks is as follows:

We allow and respect symbolic links except that we reject/ignore them whenever they become the contents of a module in a registry.

As such, symbolic links will not be unarchived from the contents of a module archive and will not appear in the module cache.
As another consequence of this behaviour, when we implement directory replace for modules (#2956), symbolic links will function inside the directory holding the replacement module.
Note: this behaviour covers all files within a module, including files embedded with the @embed directive.

myitcv · 2024-10-22T14:15:22Z

Great. Thanks, @rogpeppe. This I think concludes the discussion. We follow up on a possible FAQ for detailed points implied by this conclusion in cue-lang/docs-and-content#187. Publishing such an FAQ to cuelang.org will allow for the more easy surfacing of answers/results. FYI @jpluscplusm

myitcv added the modules Issues related to CUE modules and the experimental implementation label Jul 17, 2024

github-project-automation bot added this to Modules Roadmap Jul 17, 2024

github-project-automation bot moved this to Backlog in Modules Roadmap Jul 17, 2024

myitcv added this to Release v0.10 Jul 17, 2024

github-project-automation bot moved this to Backlog in Release v0.10 Jul 17, 2024

myitcv moved this from Backlog to v0.10.0-alpha.2 in Release v0.10 Jul 17, 2024

rogpeppe mentioned this issue Jul 18, 2024

mod/modzip: symbolic links are silently omitted #3304

Closed

myitcv mentioned this issue Jul 23, 2024

cue/interpreter/embed: embed should not follow symbolic links #3299

Closed

mvdan moved this from v0.10.0-alpha.2 to v0.10.0-alpha.3 in Release v0.10 Jul 24, 2024

mvdan assigned rogpeppe Jul 30, 2024

rogpeppe moved this from v0.10.0-alpha.3 to v0.10.0-rc.1 in Release v0.10 Jul 31, 2024

mvdan mentioned this issue Aug 6, 2024

cue/load: Loader does not ignore irregular files #1672

Closed

mvdan moved this from v0.10.0-rc.1 to v0.10.0 in Release v0.10 Aug 7, 2024

mvdan added this to Release v0.11 Aug 13, 2024

mvdan removed this from Release v0.10 Aug 13, 2024

github-project-automation bot moved this to Backlog in Release v0.11 Aug 13, 2024

mvdan moved this from Backlog to v0.11.0-alpha.1 in Release v0.11 Aug 13, 2024

myitcv changed the title ~~modules: to what extent should modules support symlinks?~~ all: to what extent should CUE support symlinks? Oct 10, 2024

myitcv mentioned this issue Oct 22, 2024

FAQ regarding symlink support in CUE cue-lang/docs-and-content#187

Open

myitcv closed this as completed Oct 22, 2024

github-project-automation bot moved this from Backlog to Done in Modules Roadmap Oct 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

all: to what extent should CUE support symlinks? #3300

all: to what extent should CUE support symlinks? #3300

myitcv commented Jul 17, 2024 •

edited

Loading

seh commented Jul 17, 2024

myitcv commented Jul 18, 2024

rogpeppe commented Jul 18, 2024 •

edited

Loading

seh commented Jul 18, 2024

mvdan commented Jul 19, 2024

mvdan commented Jul 19, 2024

myitcv commented Jul 19, 2024

seh commented Jul 19, 2024

seh commented Jul 19, 2024

myitcv commented Jul 22, 2024 •

edited

Loading

mvdan commented Jul 22, 2024

myitcv commented Jul 23, 2024

seh commented Jul 23, 2024

rogpeppe commented Jul 23, 2024

seh commented Jul 23, 2024

mvdan commented Jul 24, 2024

rogpeppe commented Jul 31, 2024

seh commented Aug 4, 2024

myitcv commented Oct 10, 2024

rogpeppe commented Oct 10, 2024

seh commented Oct 12, 2024

rogpeppe commented Oct 22, 2024

myitcv commented Oct 22, 2024

all: to what extent should CUE support symlinks? #3300

all: to what extent should CUE support symlinks? #3300

Comments

myitcv commented Jul 17, 2024 • edited Loading

seh commented Jul 17, 2024

myitcv commented Jul 18, 2024

rogpeppe commented Jul 18, 2024 • edited Loading

seh commented Jul 18, 2024

mvdan commented Jul 19, 2024

mvdan commented Jul 19, 2024

myitcv commented Jul 19, 2024

seh commented Jul 19, 2024

seh commented Jul 19, 2024

myitcv commented Jul 22, 2024 • edited Loading

mvdan commented Jul 22, 2024

myitcv commented Jul 23, 2024

seh commented Jul 23, 2024

rogpeppe commented Jul 23, 2024

seh commented Jul 23, 2024

mvdan commented Jul 24, 2024

rogpeppe commented Jul 31, 2024

seh commented Aug 4, 2024

myitcv commented Oct 10, 2024

rogpeppe commented Oct 10, 2024

seh commented Oct 12, 2024

rogpeppe commented Oct 22, 2024

myitcv commented Oct 22, 2024

myitcv commented Jul 17, 2024 •

edited

Loading

rogpeppe commented Jul 18, 2024 •

edited

Loading

myitcv commented Jul 22, 2024 •

edited

Loading