urls and sha256 attributes for llvm archive download
Current strategy for detecting OS name and matching it to an LLVM
distribution archive has many corner cases. Moreover, people might have
their own URLs where they host these archives.

Having an explicit dict of URLs and sha256 values, keyed by OS name, version
and arch, will help people work around these corner cases and try new LLVM
releases without waiting for an update to this repository. Note that these
keys differ from the keys of the `toolchain_roots` attribute.

While this method does have an extra setup step for each new OS type
that the user's workspace needs to support, this approach is more
flexible.

If we notice that people use this feature more than the auto-inferred URLs,
or that the llvm_release_name.py script is out of date, we may keep only this
feature and delete the other way of getting archives.

Additionally, this fixes #125: people with that use case can now use the
`urls` attribute or the new convenience aliases.
Siddhartha Bagaria committed Mar 7, 2022
1 parent dd9e6a6 commit c728048
Showing 10 changed files with 301 additions and 104 deletions.
64 changes: 47 additions & 17 deletions README.md
@@ -62,11 +62,8 @@ build --incompatible_enable_cc_toolchain_resolution
 ## Basic Usage

 The toolchain can automatically detect your OS and arch type, and use the right
-pre-built binary distribution from llvm.org. The detection is currently
-based on host OS and is not perfect, so some distributions, docker based
-sandboxed builds, and remote execution builds will need toolchains configured
-manually through the `distribution` attribute. We expect the detection logic to
-grow through community contributions. We welcome PRs! :smile:
+pre-built binary LLVM distribution. See the section on "Bring Your Own LLVM"
+below for more options.

 See in-code documentation in [rules.bzl](toolchain/rules.bzl) for available
 attributes to `llvm_toolchain`.
@@ -132,12 +129,34 @@ and instead rely on the `--incompatible_enable_cc_toolchain_resolution` flag.

 #### Bring Your Own LLVM

-The LLVM toolchain archive is downloaded and extracted as a separate repository
-with the suffix `_llvm`. Alternatively, you can also specify your own
-repositories for each host os-arch pair through the `toolchain_roots`
-attribute. Each of these repositories is typically configured through
-`local_repository` or `http_archive` (with `build_file` attribute as
-`@com_grail_bazel_toolchain//toolchain:BUILD.llvm_repo`).
+The following mechanisms are available for using an LLVM toolchain:
+
+1. Host OS information is used to find the right pre-built binary distribution
+   from llvm.org, given the `llvm_version` attribute. The LLVM toolchain
+   archive is downloaded and extracted as a separate repository with the suffix
+   `_llvm`. The detection is not perfect, so you may have to use other options
+   for some host OS type and versions. We expect the detection logic to grow
+   through community contributions. We welcome PRs.
+2. You can use the `urls` attribute to specify your own URLs for each OS type,
+   version and architecture. For example, you can specify a different URL for
+   Arch Linux and a different one for Ubuntu. Just as with the option above,
+   the archive is downloaded and extracted as a separate repository with the
+   suffix `_llvm`.
+3. You can also specify your own bazel package paths or local absolute paths
+   for each host os-arch pair through the `toolchain_roots` attribute. Note
+   that the keys here are different and less granular than the keys in the `urls`
+   attribute. When using a bazel package path, each of the values is typically
+   a package in the user's workspace or configured through `local_repository` or
+   `http_archive`; the BUILD file of the package should be similar to
+   `@com_grail_bazel_toolchain//toolchain:BUILD.llvm_repo`. If using only
+   `http_archive`, maybe consider using the `urls` attribute instead to get more
+   flexibility if you need.
+4. All the above options rely on host OS information, and are not suited for
+   docker based sandboxed builds or remote execution builds. Such builds will
+   need a single distribution version specified through the `distribution`
+   attribute, or URLs specified through the `urls` attribute with an empty key, or
+   a toolchain root specified through the `toolchain_roots` attribute with an
+   empty key.

 #### Sysroots

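The archive-selection order that the README diff above describes for options 1, 2 and 4 can be summarised as a small decision function. This is a minimal Python sketch, not the rule's Starlark code: `choose_archive_source` and the injected `detect_release_name` callable are hypothetical stand-ins for `download_llvm` and the `llvm_release_name.py` helper. `toolchain_roots` (option 3) is left out because it points at existing packages or paths rather than at an archive to download.

```python
from typing import Callable, Dict, List, Tuple, Union

Source = Tuple[str, Union[List[str], str]]


def choose_archive_source(
    urls: Dict[str, List[str]],
    host_key: str,
    distribution: str,
    detect_release_name: Callable[[], str],
) -> Source:
    """Hypothetical model of how the `_llvm` repository picks its archive.

    1. A `urls` entry for the exact host key, else the empty-key default.
    2. Otherwise the `distribution` attribute; "auto" means the release
       name is detected from the host OS.
    """
    if urls:
        found = urls.get(host_key, urls.get("", []))
        if found:
            return ("urls", found)
    if distribution == "auto":
        return ("distribution", detect_release_name())
    return ("distribution", distribution)
```

Under option 4 (Docker or remote execution), an empty-key entry in `urls` makes the first branch succeed whatever host key is detected.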
@@ -191,12 +210,23 @@ The toolchain is tested to work with `rules_go`, `rules_rust`, and

 The LLVM distribution also provides several tools like `clang-format`. You can
 depend on these tools directly in the bin directory of the distribution. When
-using the auto-configured download (not using `toolchain_roots`), the
-distribution is available in the repo with the suffix `_llvm` appended to the
-name you used for the `llvm_toolchain` rule. For example,
-`@llvm_toolchain_llvm//:bin/clang-format` is a valid and visible target in the
-quickstart example above.
-
+not using the `toolchain_roots` attribute, the distribution is available in the
+repo with the suffix `_llvm` appended to the name you used for the
+`llvm_toolchain` rule. For example, `@llvm_toolchain_llvm//:bin/clang-format`
+is a valid and visible target in the quickstart example above.
+
+When using the `toolchain_roots` attribute, there is currently no single target
+that you can reference, and you may have to alias the tools you want with a
+`select` clause in your workspace.
+
+As a convenience, some targets are aliased appropriately in the configuration
+repo (as opposed to the LLVM distribution repo) for you to use and will work
+even when using `toolchain_roots`. If your repo is named `llvm_toolchain`, then
+they can be referenced as:
+
+- `@llvm_toolchain//:omp`
+- `@llvm_toolchain//:clang-format`
+- `@llvm_toolchain//:llvm-cov`

 ## Prior Art

4 changes: 4 additions & 0 deletions WORKSPACE
@@ -25,6 +25,10 @@ load("@com_grail_bazel_toolchain//toolchain:rules.bzl", "llvm_toolchain")
 llvm_toolchain(
     name = "llvm_toolchain",
     llvm_version = "12.0.0",
+    # To test (manually) if the URLs feature works to fetch an archive.
+    sha256 = {"ubuntu-20.04-x86_64": "a9ff205eb0b73ca7c86afc6432eed1c2d49133bd0d49e47b15be59bbf0dd292e"},
+    strip_prefix = {"ubuntu-20.04-x86_64": "clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-20.04"},
+    urls = {"ubuntu-20.04-x86_64": ["https://github.com/llvm/llvm-project/releases/download/llvmorg-12.0.0/clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-20.04.tar.xz"]},
 )

 load("@llvm_toolchain//:toolchains.bzl", "llvm_register_toolchains")
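The WORKSPACE change pins the archive with a `sha256` entry keyed like its `urls` entry. For an archive you host yourself, one way to obtain that value is to hash the file locally. The sketch below uses Python's standard `hashlib` and reads the file in chunks so a large LLVM archive is not loaded into memory at once; the function name `file_sha256` and the 1 MiB chunk size are illustrative choices, not part of the repository.

```python
import hashlib
from pathlib import Path


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest goes into the `sha256` dict under the same key used in `urls`.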
10 changes: 10 additions & 0 deletions toolchain/BUILD.toolchain.tpl
@@ -38,4 +38,14 @@ cc_import(
     shared_library = "%{llvm_repo_label_prefix}lib/libomp.%{host_dl_ext}",
 )

+alias(
+    name = "clang-format",
+    actual = "%{llvm_repo_label_prefix}bin/clang-format",
+)
+
+alias(
+    name = "llvm-cov",
+    actual = "%{llvm_repo_label_prefix}bin/llvm-cov",
+)
+
 %{cc_toolchains}
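The new aliases point at `%{llvm_repo_label_prefix}bin/...`, so one alias in the configuration repository can resolve into either the downloaded `_llvm` repository or a user-supplied toolchain root, depending on what the placeholder is replaced with. The Python sketch below only illustrates that substitution step; `expand_template`, `ALIAS_TEMPLATE` and the `@my_llvm//:` prefix are hypothetical. Bazel offers `repository_ctx.template` for this kind of placeholder replacement, but how this repository computes the prefix is not shown in the diff.

```python
from typing import Dict

# One alias from BUILD.toolchain.tpl, copied verbatim.
ALIAS_TEMPLATE = """\
alias(
    name = "clang-format",
    actual = "%{llvm_repo_label_prefix}bin/clang-format",
)
"""


def expand_template(template: str, substitutions: Dict[str, str]) -> str:
    """Replace each %{key} placeholder with its value (plain string replace)."""
    for key, value in substitutions.items():
        template = template.replace("%{" + key + "}", value)
    return template


# Prefix for the auto-downloaded archive, matching the README example
# `@llvm_toolchain_llvm//:bin/clang-format`.
downloaded = expand_template(
    ALIAS_TEMPLATE, {"llvm_repo_label_prefix": "@llvm_toolchain_llvm//:"}
)
```

With the README's default repository name, `downloaded` points the alias at `@llvm_toolchain_llvm//:bin/clang-format`; only the prefix changes when a toolchain root is used instead, which is why `@llvm_toolchain//:clang-format` keeps working in both setups.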
80 changes: 53 additions & 27 deletions toolchain/internal/common.bzl
@@ -57,6 +57,59 @@ def arch(rctx):
         fail("Failed to detect machine architecture: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
     return exec_result.stdout.strip()

+def os_arch_pair(os, arch):
+    return "{}-{}".format(os, arch)
+
+_supported_os_arch = [os_arch_pair(os, arch) for (os, arch) in SUPPORTED_TARGETS]
+
+def supported_os_arch_keys():
+    return _supported_os_arch
+
+def check_os_arch_keys(keys):
+    for k in keys:
+        if k and k not in _supported_os_arch:
+            fail("Unsupported {{os}}-{{arch}} key: {key}; valid keys are: {keys}".format(
+                key = k,
+                keys = ", ".join(_supported_os_arch),
+            ))
+
+def canonical_dir_path(path):
+    if not path.endswith("/"):
+        return path + "/"
+    return path
+
+def pkg_path_from_label(label):
+    if label.workspace_root:
+        return label.workspace_root + "/" + label.package
+    else:
+        return label.package
+
+def attr_dict(attr):
+    # Returns a mutable dict of attr values from the struct. This is useful to
+    # return updated attribute values as return values of repository_rule
+    # implementations.
+
+    tuples = []
+    types = []
+    for key in dir(attr):
+        if not hasattr(attr, key):
+            fail("key %s not found in attributes" % key)
+        val = getattr(attr, key)
+
+        # Make mutable copies of frozen types.
+        typ = type(val)
+        if typ == "dict":
+            val = dict(val)
+        elif typ == "list":
+            val = list(val)
+        elif typ == "builtin_function_or_method":
+            # Functions can not be compared.
+            continue
+
+        tuples.append((key, val))
+
+    return dict(tuples)
+
 # Tries to figure out if a tool supports newline separated arg files (i.e.
 # `@file`).
 def _tool_supports_arg_file(rctx, tool_path):
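The helpers moved earlier in `common.bzl` validate keys of the form `{os}-{arch}` (the granularity the README describes for `toolchain_roots`), with the empty key allowed as a catch-all. Below is a direct Python transliteration for trying the behaviour outside Bazel. `SUPPORTED_TARGETS` is defined elsewhere in the repository and is not part of this diff, so the list here is an assumed example, and Starlark's `fail` is modelled as raising `ValueError`.

```python
from typing import Iterable, List, Tuple

# Assumed example values; the real SUPPORTED_TARGETS is defined outside this diff.
SUPPORTED_TARGETS: List[Tuple[str, str]] = [
    ("linux", "x86_64"),
    ("linux", "aarch64"),
    ("darwin", "x86_64"),
    ("darwin", "aarch64"),
]


def os_arch_pair(os_name: str, arch: str) -> str:
    return "{}-{}".format(os_name, arch)


_SUPPORTED_OS_ARCH = [os_arch_pair(o, a) for (o, a) in SUPPORTED_TARGETS]


def check_os_arch_keys(keys: Iterable[str]) -> None:
    """Raise ValueError (Starlark: fail) for unsupported non-empty keys."""
    for k in keys:
        # Empty keys are skipped, so "" can act as a catch-all entry.
        if k and k not in _SUPPORTED_OS_ARCH:
            raise ValueError(
                "Unsupported {{os}}-{{arch}} key: {key}; valid keys are: {keys}".format(
                    key=k,
                    keys=", ".join(_SUPPORTED_OS_ARCH),
                )
            )


def canonical_dir_path(path: str) -> str:
    """Append a trailing slash unless the path already ends with one."""
    if not path.endswith("/"):
        return path + "/"
    return path
```

Passing a `urls`-style key such as `ubuntu-20.04-x86_64` raises here, which is the difference in key granularity between the two attributes that the commit message mentions.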
@@ -157,30 +210,3 @@ host_tools = struct(
     tool_supports = _check_host_tool_supports,
     get_and_assert = _get_host_tool_and_assert_supports,
 )
-
-def os_arch_pair(os, arch):
-    return "{}-{}".format(os, arch)
-
-_supported_os_arch = [os_arch_pair(os, arch) for (os, arch) in SUPPORTED_TARGETS]
-
-def supported_os_arch_keys():
-    return _supported_os_arch
-
-def check_os_arch_keys(keys):
-    for k in keys:
-        if k and k not in _supported_os_arch:
-            fail("Unsupported {{os}}-{{arch}} key: {key}; valid keys are: {keys}".format(
-                key = k,
-                keys = ", ".join(_supported_os_arch),
-            ))
-
-def canonical_dir_path(path):
-    if not path.endswith("/"):
-        return path + "/"
-    return path
-
-def pkg_path_from_label(label):
-    if label.workspace_root:
-        return label.workspace_root + "/" + label.package
-    else:
-        return label.package
82 changes: 64 additions & 18 deletions toolchain/internal/llvm_distributions.bzl
@@ -13,7 +13,7 @@
 # limitations under the License.

 load("@bazel_tools//tools/build_defs/repo:utils.bzl", "read_netrc", "use_netrc")
-load("//toolchain/internal:common.bzl", _python = "python")
+load("//toolchain/internal:common.bzl", _attr_dict = "attr_dict", _python = "python")

 # If a new LLVM version is missing from this list, please add the shasum here
 # and send a PR on github. To compute the shasum block, you can use the script
@@ -223,20 +223,44 @@ def _get_auth(ctx, urls):

     return {}

-def download_llvm_preconfigured(rctx):
+def download_llvm(rctx):
+    urls = []
+    if rctx.attr.urls:
+        urls, sha256, strip_prefix, key = _urls(rctx)
+    if not urls:
+        urls, sha256, strip_prefix = _distribution_urls(rctx)
+
+    res = rctx.download_and_extract(
+        urls,
+        sha256 = sha256,
+        stripPrefix = strip_prefix,
+        auth = _get_auth(rctx, urls),
+    )
+
+    updated_attrs = _attr_dict(rctx.attr)
+    if not sha256 and key:
+        # Only using the urls attribute can result in no sha256.
+        # Report back the sha256 if the URL came from a non-empty key.
+        updated_attrs["sha256"].update([(key, res.sha256)])
+
+    return updated_attrs
+
+def _urls(rctx):
+    key = _host_os_key(rctx)
+
+    urls = rctx.attr.urls.get(key, default = rctx.attr.urls.get("", default = []))
+    if not urls:
+        print("llvm archive urls missing for host OS key '%s' and no default provided; will try 'distribution' attribute" % (key))
+    sha256 = rctx.attr.sha256.get(key, "")
+    strip_prefix = rctx.attr.strip_prefix.get(key, "")
+
+    return urls, sha256, strip_prefix, key
+
+def _distribution_urls(rctx):
     llvm_version = rctx.attr.llvm_version

     if rctx.attr.distribution == "auto":
-        exec_result = rctx.execute([
-            _python(rctx),
-            rctx.path(rctx.attr._llvm_release_name),
-            llvm_version,
-        ])
-        if exec_result.return_code:
-            fail("Failed to detect host OS version: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
-        if exec_result.stderr:
-            print(exec_result.stderr)
-        basename = exec_result.stdout.strip()
+        basename = _llvm_release_name(rctx, llvm_version)
     else:
         basename = rctx.attr.distribution

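`download_llvm` now returns a mutable copy of the rule's attributes, with the downloaded archive's checksum filled in when URLs were supplied without a `sha256`. In Bazel, a repository rule implementation may return such a dict, and a returned value that differs from the declared attributes is reported as a suggested reproducible form of the rule. The Python sketch below models only that bookkeeping, under the assumption that plain dicts stand in for `rctx.attr` and for the result of `download_and_extract`; `report_back_sha256` is a hypothetical name, and `attr_dict` here is a Python model of the Starlark helper of the same name, not the helper itself.

```python
from typing import Any, Dict


def attr_dict(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Copy attribute values so dicts and lists can be mutated; drop callables.

    Models common.bzl's attr_dict, which copies frozen dict/list values and
    skips builtin functions because they cannot be compared.
    """
    copied: Dict[str, Any] = {}
    for key, value in attrs.items():
        if callable(value):
            continue
        if isinstance(value, dict):
            value = dict(value)
        elif isinstance(value, list):
            value = list(value)
        copied[key] = value
    return copied


def report_back_sha256(
    attrs: Dict[str, Any], key: str, configured_sha256: str, computed_sha256: str
) -> Dict[str, Any]:
    """Return updated attributes, recording the computed checksum if none was set."""
    updated = attr_dict(attrs)
    # Same condition as download_llvm: no checksum configured and a non-empty key.
    if not configured_sha256 and key:
        updated["sha256"].update([(key, computed_sha256)])
    return updated
```

Because the sha256 dict is copied before it is updated, the caller's original attributes stay unchanged.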
@@ -252,9 +276,31 @@ def download_llvm_preconfigured(rctx):
             urls.append(pattern.format(llvm_version = llvm_version, basename = basename))
     urls.append("{0}{1}".format(_llvm_distributions_base_url[llvm_version], url_suffix))

-    rctx.download_and_extract(
-        urls,
-        sha256 = _llvm_distributions[basename],
-        stripPrefix = basename[:(len(basename) - len(".tar.xz"))],
-        auth = _get_auth(rctx, urls),
-    )
+    sha256 = _llvm_distributions[basename]
+
+    strip_prefix = basename[:(len(basename) - len(".tar.xz"))]
+
+    return urls, sha256, strip_prefix
+
+def _host_os_key(rctx):
+    exec_result = rctx.execute([
+        _python(rctx),
+        rctx.path(rctx.attr._os_version_arch),
+    ])
+    if exec_result.return_code:
+        fail("Failed to detect host OS name and version: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
+    if exec_result.stderr:
+        print(exec_result.stderr)
+    return exec_result.stdout.strip()
+
+def _llvm_release_name(rctx, llvm_version):
+    exec_result = rctx.execute([
+        _python(rctx),
+        rctx.path(rctx.attr._llvm_release_name),
+        llvm_version,
+    ])
+    if exec_result.return_code:
+        fail("Failed to detect host OS LLVM archive: \n%s\n%s" % (exec_result.stdout, exec_result.stderr))
+    if exec_result.stderr:
+        print(exec_result.stderr)
+    return exec_result.stdout.strip()
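After this change `_distribution_urls` only computes what to download: candidate URLs, the checksum from the built-in `_llvm_distributions` table, and a `strip_prefix` equal to the archive name without its `.tar.xz` extension. The Python sketch below reproduces the parts visible in the diff: each `alternative_llvm_sources` pattern is formatted with `llvm_version` and `basename`, the default base URL is appended after them, and the extension is sliced off. Mirror handling and the construction of `url_suffix` sit in lines the diff does not show, so `base_url` and `url_suffix` are plain parameters here; the function name and all example URLs are hypothetical.

```python
from typing import List, Tuple

ARCHIVE_EXT = ".tar.xz"


def distribution_urls(
    llvm_version: str,
    basename: str,
    alternative_llvm_sources: List[str],
    base_url: str,
    url_suffix: str,
) -> Tuple[List[str], str]:
    """Return (candidate URLs, strip_prefix) for a named LLVM release archive."""
    urls: List[str] = []
    # Each user-supplied pattern may reference {llvm_version} and {basename}.
    for pattern in alternative_llvm_sources:
        urls.append(pattern.format(llvm_version=llvm_version, basename=basename))
    # The default base URL is appended after the alternative sources.
    urls.append("{0}{1}".format(base_url, url_suffix))
    # Same slicing as the Starlark code: drop the ".tar.xz" extension.
    strip_prefix = basename[: len(basename) - len(ARCHIVE_EXT)]
    return urls, strip_prefix
```

For `clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-20.04.tar.xz`, the computed `strip_prefix` is the same `clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-20.04` value that the WORKSPACE change sets by hand for the `urls` path.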
6 changes: 4 additions & 2 deletions toolchain/internal/repo.bzl
@@ -18,7 +18,7 @@ load(
 )
 load(
     "//toolchain/internal:llvm_distributions.bzl",
-    _download_llvm_preconfigured = "download_llvm_preconfigured",
+    _download_llvm = "download_llvm",
 )

 def llvm_repo_impl(rctx):
@@ -33,9 +33,11 @@ def llvm_repo_impl(rctx):
         executable = False,
     )

-    _download_llvm_preconfigured(rctx)
+    updated_attrs = _download_llvm(rctx)

     # We try to avoid patches to the downloaded repo so that it is easier for
     # users to bring their own LLVM distribution through `http_archive`. If we
     # do want to make changes, then we should do it through a patch file, and
     # document it for users of toolchain_roots attribute.
+
+    return updated_attrs
23 changes: 23 additions & 0 deletions toolchain/rules.bzl
@@ -34,6 +34,24 @@ _common_attrs = {

 _llvm_repo_attrs = dict(_common_attrs)
 _llvm_repo_attrs.update({
+    "urls": attr.string_list_dict(
+        mandatory = False,
+        doc = ("URLs to LLVM pre-built binary distribution archives, keyed by host OS " +
+               "release name and architecture, e.g. darwin-x86_64, darwin-aarch64, " +
+               "ubuntu-20.04-x86_64, etc. May also need the `strip_prefix` attribute. " +
+               "Consider also setting the `sha256` attribute. An empty key is " +
+               "used to specify a fallback default for all hosts. This attribute " +
+               "overrides `distribution`, `llvm_version`, `llvm_mirror` and " +
+               "`alternative_llvm_sources` attributes if the host OS key is present."),
+    ),
+    "sha256": attr.string_dict(
+        mandatory = False,
+        doc = "The expected SHA-256 of the file downloaded as per the `urls` attribute.",
+    ),
+    "strip_prefix": attr.string_dict(
+        mandatory = False,
+        doc = "The prefix to strip from the extracted file from the `urls` attribute.",
+    ),
     "distribution": attr.string(
         default = "auto",
         doc = ("LLVM pre-built binary distribution filename, must be one " +
@@ -77,6 +95,11 @@ _llvm_repo_attrs.update({
         allow_single_file = True,
         doc = "Python module to output LLVM release name for the current OS.",
     ),
+    "_os_version_arch": attr.label(
+        default = "//toolchain/tools:host_os_key.py",
+        allow_single_file = True,
+        doc = "Python module to output OS name and ",
+    ),
 })

 _llvm_config_attrs = dict(_common_attrs)
5 changes: 4 additions & 1 deletion toolchain/tools/BUILD.bazel
@@ -12,4 +12,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-exports_files(["llvm_release_name.py"])
+exports_files([
+    "llvm_release_name.py",
+    "host_os_key.py",
+])
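The contents of `host_os_key.py` are not shown on this page. Judging from the key examples in the `urls` documentation (`darwin-x86_64`, `darwin-aarch64`, `ubuntu-20.04-x86_64`), it prints a key built from the OS name, the OS version where relevant, and the machine architecture. The sketch below is a guess at such a helper, not the script from the commit. It assumes Linux distributions are identified by `ID` and `VERSION_ID` from `/etc/os-release`, that macOS keys carry no version (matching the examples), and that Apple Silicon's `arm64` maps to `aarch64`. The parsing is kept as pure functions so it can be tried without a real host.

```python
from typing import Dict, Optional


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines in the style of /etc/os-release, dropping quotes."""
    fields: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"').strip("'")
    return fields


def host_os_key(system: str, machine: str, os_release_text: Optional[str] = None) -> str:
    """Build an illustrative host key such as 'ubuntu-20.04-x86_64'.

    system and machine are expected in the form returned by platform.system()
    and platform.machine().
    """
    if system == "Darwin":
        # Documented macOS keys carry no version (darwin-x86_64, darwin-aarch64).
        # Assumption: Apple Silicon reports "arm64" and is keyed as "aarch64".
        arch = "aarch64" if machine == "arm64" else machine
        return "darwin-{}".format(arch)
    if system == "Linux" and os_release_text is not None:
        fields = parse_os_release(os_release_text)
        parts = [fields.get("ID", "linux"), fields.get("VERSION_ID", ""), machine]
        # Skip empty parts, e.g. rolling releases without VERSION_ID.
        return "-".join(p for p in parts if p)
    raise ValueError("unsupported host: {} {}".format(system, machine))
```

On a real machine, the arguments would come from `platform.system()`, `platform.machine()` and the contents of `/etc/os-release`.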