Besides files and directories, symbolic links are also an important
entity in the file system. Also, git natively supports symbolic links
as entries in a tree object. Technically, a symbolic link is a string
that can be read via readlink(2). However, symbolic links can also be
followed, and functions to access a file, like open(2), do so by
default. When following a symbolic link, both relative and absolute
names can be used.
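The two ways of accessing a symbolic link can be illustrated with a minimal Python sketch (POSIX only; the file names are made up for illustration). os.readlink returns the stored string, while a plain open follows the link to the file it refers to.

```python
import os
import tempfile


def readlink_vs_follow() -> tuple[str, str]:
    """Create a relative symlink and access it in both ways (POSIX only)."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "data.txt"), "w") as f:
            f.write("hello")
        link = os.path.join(root, "link")
        # The link itself only stores the (relative) string "data.txt".
        os.symlink("data.txt", link)
        # readlink(2): read the stored string without following the link.
        stored = os.readlink(link)
        # open(2): follows the link by default and reads the target file.
        with open(link) as f:
            content = f.read()
        return stored, content


print(readlink_vs_follow())  # ('data.txt', 'hello')
```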
Compilers usually follow symlinks for all inputs. Archivers (like
tar(1) and package-building tools) usually read the link in order to
package the link itself, rather than the file referred to (if any). For
a generic build system, it is desirable not to have to make assumptions
about the intention of the program called (and hence the way it deals
with symlinks). This, however, has the consequence that only symbolic
links themselves can properly model symbolic links.
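Python's tarfile module shows the archiver behavior described above: by default, tar.add stores a symlink as a symlink member, and only with dereference=True is the referred-to file archived instead. The helper name and the libfoo file names are illustrative only.

```python
import io
import os
import tarfile
import tempfile


def archived_member_kind(dereference: bool) -> str:
    """Archive a symlink with tarfile and report what ended up in the archive."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "libfoo.so.3.1.4"), "wb") as f:
            f.write(b"\x7fELF")
        os.symlink("libfoo.so.3.1.4", os.path.join(root, "libfoo.so.3"))

        buf = io.BytesIO()
        # dereference=False (the default) archives the link itself, like tar(1).
        with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
            tar.add(os.path.join(root, "libfoo.so.3"), arcname="libfoo.so.3")

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("libfoo.so.3")
    if member.issym():
        return f"symlink -> {member.linkname}"
    if member.isfile():
        return f"file ({member.size} bytes)"
    return "other"


print(archived_member_kind(dereference=False))  # symlink -> libfoo.so.3.1.4
print(archived_member_kind(dereference=True))   # file (4 bytes)
```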
From a build-system perspective, a root should be self-contained; in
fact, the target-level caching assumes that the git tree identifier
entirely describes a git-tree root. For this to be true, such a root
has to be both self-contained and independent of its (assumed) location
in the file system. In particular, we can neither allow absolute
symbolic links (as they, depending on the assumed location, might point
out of the root), nor relative symbolic links that go upwards (via a
../ reference) too far.
As with source roots, we understand action directories as self-contained and independent of their location in the file system. Therefore, we have to impose the same restrictions there as well, i.e., neither absolute symbolic links nor relative symbolic links going up too far.
Allowing all relative symbolic links that don't point outside the action directory, however, adds a further layer of complication to the definition of actions: a string might be allowed as a symlink target in some places in the action directory, but not in others; in particular, from the mere information that an artifact is a relative symlink, we can't tell whether it can be safely placed at a particular location in an action. The same holds for trees about which we only know that they might contain relative symbolic links.
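A minimal sketch of this location dependence, assuming a purely lexical check (symlinks in intermediate path components are not considered) and a hypothetical helper stays_within_root: the same target string ../a is harmless for a link at sub/link but escapes the root for a link at the top level.

```python
def stays_within_root(link_path: str, target: str) -> bool:
    """Lexically check whether a symlink at link_path (a normalized path
    relative to the root) with the given target string resolves inside
    the root. Symlinks in intermediate components are not considered."""
    if target.startswith("/"):
        return False  # absolute: meaning depends on where the root is placed
    # Depth of the directory containing the link, below the root.
    depth = len([c for c in link_path.split("/")[:-1] if c not in ("", ".")])
    for component in target.split("/"):
        if component in ("", "."):
            continue
        if component == "..":
            depth -= 1
            if depth < 0:
                return False  # went above the root
        else:
            depth += 1
    return True


# The same string is fine in one place but not in another.
print(stays_within_root("sub/link", "../a"))  # True
print(stays_within_root("link", "../a"))      # False
```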
It can be desirable to use system libraries or tools as dependencies. A
typical use case, but not the only one, is packaging a tool for a
distribution. An obvious approach is to declare a system directory as a
root of a repository (providing the needed target files in a separate
root). As it turns out, however, those system directories do contain
symbolic links, e.g., shared libraries pointing to the specific version
(like libfoo.so.3 as a symlink pointing to libfoo.so.3.1.4) or
detours through /etc/alternatives.
In order to more easily support building the tool itself against
pre-installed dependencies with the respective directories containing
symbolic links, or tools (like protoc) being symbolic links (e.g., to
the specific version), repositories can specify, in the "copy"
attribute of the "local_bootstrap" parameter, a list of files and
directories to be copied as part of the bootstrapping process to a fresh,
clean directory serving as root; during this copying, symlinks are
followed.
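The effect of this copying can be approximated with Python's shutil, whose copytree(..., symlinks=False) and copy2 both follow symlinks. The helper copy_following_symlinks is hypothetical and only illustrates the semantics; it is not the actual bootstrap code, and the file names (such as protoc-3.21) are made up.

```python
import os
import shutil
import tempfile


def copy_following_symlinks(src_root: str, dst_root: str, entries: list[str]) -> None:
    """Copy the given files and directories (relative to src_root) into a
    fresh dst_root, replacing every symlink by a copy of what it points to."""
    for entry in entries:
        src = os.path.join(src_root, entry)
        dst = os.path.join(dst_root, entry)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if os.path.isdir(src):  # os.path.isdir follows symlinks
            # symlinks=False: links inside the tree are copied as their targets.
            shutil.copytree(src, dst, symlinks=False)
        else:
            # copy2 follows symlinks by default (follow_symlinks=True).
            shutil.copy2(src, dst)


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(src, "bin"))
    with open(os.path.join(src, "bin", "protoc-3.21"), "w") as f:
        f.write("#!/bin/sh\n")
    os.symlink("protoc-3.21", os.path.join(src, "bin", "protoc"))
    dst = os.path.join(tmp, "root")
    copy_following_symlinks(src, dst, ["bin/protoc"])
    print(os.path.islink(os.path.join(dst, "bin", "protoc")))  # False
```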
For cases where we simply have no need for special entries, all the existing
roots have "ignore-special" versions. In such a root (regardless of
whether it is file-based or git-tree-based), everything that is not a
file or a directory is treated as absent. For any compile-like tasks,
the effect of symlinks can be modeled by appropriate staging.
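For a file-based root, the ignore-special view can be sketched as a traversal that reports only regular files and directories, examined without following links. The function name ignore_special_entries is hypothetical.

```python
import os


def ignore_special_entries(root: str) -> tuple[list[str], list[str]]:
    """Return (files, directories) below root, as relative paths; symlinks,
    FIFOs, sockets, device nodes, etc. are treated as absent."""
    files: list[str] = []
    dirs: list[str] = []

    def walk(rel: str) -> None:
        with os.scandir(os.path.join(root, rel)) as it:
            for entry in sorted(it, key=lambda e: e.name):
                path = f"{rel}/{entry.name}" if rel else entry.name
                # follow_symlinks=False: a link to a file or directory is
                # neither a file nor a directory here, so it is skipped.
                if entry.is_dir(follow_symlinks=False):
                    dirs.append(path)
                    walk(path)
                elif entry.is_file(follow_symlinks=False):
                    files.append(path)

    walk("")
    return files, dirs
```

Since the whole tree has to be walked to decide what is visible, this also illustrates why such a view cannot be obtained in constant time.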
As certain entries have to be ignored, source trees can only be obtained
by traversing the respective tree; in particular, the TREE reference
is no longer constant time on those roots, even if they are git-tree
based.
Nevertheless, for git-tree roots, the effective tree is a function of
the git-tree of the root, so git-tree-based ignore-special roots are
content fixed and hence eligible for target-level caching.
Relative non-upwards symbolic links, a restricted form of symlinks, exist as first-class objects. That is, a new artifact type (besides blobs and trees) for relative non-upwards symbolic links has been introduced. Like any other artifact, they can be freely placed into the inputs of an action, as well as into the artifacts, runfiles, or provides map of a target. Artifacts of this new type can be defined:
- as a source-symlink reference, as well as implicitly as part of a source tree,
- as a symlink output of an action, as well as implicitly as part of a tree output of an action, and
- explicitly in the rule language from a string through a new SYMLINK constructor function.
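A hedged Python model of the three artifact types, with a hypothetical symlink_artifact function standing in for the SYMLINK constructor. It assumes that "non-upwards" means the target is relative and, lexically, never climbs above the directory containing the link. Unlike the location-dependent check for arbitrary relative links, this property depends only on the string, so such a link can be placed anywhere.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Blob:
    content: bytes
    executable: bool = False


@dataclass(frozen=True)
class Tree:
    entries: tuple = ()  # (name, artifact) pairs


@dataclass(frozen=True)
class Symlink:
    target: str  # the string readlink(2) would return


Artifact = Union[Blob, Tree, Symlink]


def is_non_upwards(target: str) -> bool:
    """True iff target is relative and, lexically, never leaves the
    directory containing the link."""
    if not target or target.startswith("/"):
        return False
    depth = 0
    for component in target.split("/"):
        if component == "..":
            depth -= 1
            if depth < 0:
                return False
        elif component not in ("", "."):
            depth += 1
    return True


def symlink_artifact(target: str) -> Symlink:
    """Hypothetical stand-in for the rule language's SYMLINK constructor."""
    if not is_non_upwards(target):
        raise ValueError(f"not a relative non-upwards symlink: {target!r}")
    return Symlink(target)
```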
While existing as separate artifacts in order to properly stage them, (relative non-upwards) symbolic links are, in many aspects, simple files with elevated permissions. As such, they locally use the existing file CAS. Remotely, the existing execution protocol already allows the handling of symbolic links via corresponding Protobuf messages; therefore, no extensions are needed.
Additionally, the built-in rules are extended with a "symlink" target,
allowing the generation of a symlink with a given non-upwards target path.
Finally, to be as flexible as possible in handling external repositories with
(possibly) upwards symbolic links, we allow filesystem directories and archives
to also be imported as partially or completely resolved git-trees.
In a partially resolved tree, all relative upwards symbolic links confined to
the tree get resolved, i.e., replaced by a copy of the entry they point to, if
existing, or removed otherwise. This of course leaves relative non-upwards
symbolic links in the git-tree, as they are supported objects.
Alternatively, in a completely resolved tree, all relative symbolic links
confined to the tree (whether upwards or not) get resolved, resulting in a
git-tree free of all symbolic links.
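Both resolution modes can be sketched on a flat, in-memory tree mapping normalized paths to files or symlinks. This is an illustration under simplifying assumptions, not the just-mr implementation: links are resolved lexically, only links to files (possibly via other links) are handled while directory targets are out of scope, and links leaving the tree are rejected with an error, which is a choice made for this sketch.

```python
import posixpath

# A flat tree: normalized relative path -> ("file", bytes) or ("symlink", str).


def _is_upwards(target: str) -> bool:
    norm = posixpath.normpath(target)
    return norm == ".." or norm.startswith("../")


def _follow(tree: dict, path: str, max_hops: int = 40):
    """Follow symlinks starting at path; return the file content, or None if dangling."""
    for _ in range(max_hops):
        entry = tree.get(path)
        if entry is None:
            return None  # dangling link: nothing to copy
        kind, value = entry
        if kind == "file":
            return value
        if value.startswith("/"):
            raise ValueError(f"absolute symlink at {path!r} is not supported")
        path = posixpath.normpath(posixpath.join(posixpath.dirname(path), value))
        if path == ".." or path.startswith("../"):
            raise ValueError(f"symlink leaves the tree: {value!r}")
    raise ValueError("too many levels of symbolic links")


def resolve_tree(tree: dict, complete: bool) -> dict:
    """Partially (complete=False) or completely (complete=True) resolve a tree."""
    result = {}
    for path, (kind, value) in tree.items():
        if kind == "symlink" and value.startswith("/"):
            raise ValueError(f"absolute symlink at {path!r} is not supported")
        if kind == "file" or (not complete and not _is_upwards(value)):
            result[path] = (kind, value)  # files, and kept non-upwards links
            continue
        content = _follow(tree, path)
        if content is not None:
            result[path] = ("file", content)  # replaced by a copy of the target
        # otherwise the dangling link is removed
    return result


tree = {
    "lib/libfoo.so.3.1.4": ("file", b"ELF"),
    "lib/libfoo.so.3": ("symlink", "libfoo.so.3.1.4"),  # non-upwards
    "bin/foo": ("symlink", "../lib/libfoo.so.3"),       # upwards, confined
    "bin/stale": ("symlink", "../lib/libbar.so"),       # upwards, dangling
}
# Partial: keeps lib/libfoo.so.3 as a link, copies into bin/foo, drops bin/stale.
print(resolve_tree(tree, complete=False))
# Complete: additionally replaces lib/libfoo.so.3 by a copy of the library.
print(resolve_tree(tree, complete=True))
```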
For reasons already described, absolute symbolic links are never supported.
As this process acts directly at the repository level, the resulting roots
remain cacheable and their trees accessible in constant time. Moreover, to
increase the chances of cache hits in just-mr, not only the resulting
resolved trees are stored, but also the original, unresolved ones.