remote/correctness: Bazel doesn't track system libraries. #4558
Comments
Is there a workaround for this currently? |
@BenTheElder Yep, exactly. It's not just compilers, though; it's really any tool from your system that your build uses without you explicitly specifying it. So you might want to include more information than just your compiler. |
Would the use of toolchains help with this? |
Our naive proposal is to allow each toolchain to add a toolchain identifier into the cache key(s). The risk is that a bug (or a design flaw) in the toolchain might cause incorrect caching, but, on the other hand, it leaves the door open to toolchains that want to cache across platforms. E.g., a Java 7 compiler on Mac and Linux outputs essentially the same files, so caching those might be safe even if the binaries are not bit identical. |
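To make the proposal concrete, here is a minimal sketch of a cache key that mixes in a toolchain identifier. It assumes a simplified key built from the command line, environment, and input digests; this is not Bazel's actual key computation, and `action_key` and `toolchain_id` are hypothetical names.

```python
import hashlib
from typing import Mapping, Optional, Sequence


def action_key(
    argv: Sequence[str],
    env: Mapping[str, str],
    input_digests: Mapping[str, str],
    toolchain_id: Optional[str] = None,
) -> str:
    """Hypothetical cache key: command line + environment + inputs + toolchain ID."""
    h = hashlib.sha256()

    def feed(value: str) -> None:
        # Length-prefix each field so adjacent fields cannot run together.
        data = value.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)

    feed(str(len(argv)))
    for arg in argv:
        feed(arg)
    feed(str(len(env)))
    for name in sorted(env):  # sorted: the key must not depend on dict order
        feed(name)
        feed(env[name])
    feed(str(len(input_digests)))
    for path in sorted(input_digests):
        feed(path)
        feed(input_digests[path])
    # The proposed extra component. A toolchain whose outputs are known to be
    # platform independent (e.g. Java bytecode) could report the same ID on
    # every platform to allow cross-platform cache hits.
    feed(toolchain_id or "")
    return h.hexdigest()
```

A gcc toolchain on Linux and one on macOS would report different IDs and never share entries, while two Java 7 toolchains could deliberately report a shared ID. As the comment above notes, a toolchain that reports a shared ID but actually produces different outputs reintroduces incorrect caching.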
I forgot to follow up here: setting |
@BenTheElder It’s my understanding that this would be a bug in Bazel, correct? Something that works for sure with every action is |
I don't have a great reproducer currently. In https://github.com/kubernetes/test-infra, I modify the bazelrc to point at a local copy of bazel-remote-cache from within a Debian + Bazel Docker container, and then I build again from another container after upgrading GCC to gcc-7. When doing this, I noticed stale headers from gcc-4.9 being included when compiling https://github.com/google/protobuf as a dependency, despite setting an @buchgr where is |
See #3320. :-/ |
@BenTheElder Oh, I apologize for the misinformation; it was my understanding that Essentially, you can pass it the following protobuf:
This is passed through to the computation of the action key for remote caching. |
When I build Bazel from source (bazel-0.10.0), I run into the system environment variable problem:
This happens because on my system the "cp" command is under ~/bin/ rather than /bin (and I do not have root privileges). I have tried "--action_env=PATH", but it does not work. |
The current design does not make it easy to cache platform-independent outputs (e.g., Java bytecode) across platforms: we include the full command line and environment variables in the key, and these often differ (e.g., Windows command lines contain backslashes where Linux uses forward slashes). At the same time, if all the input files are platform-independent and the command lines and environment variables happen to be identical (which could happen for Linux and macOS with the right flags), the cache currently returns a hit. That is a correctness issue, because we can't guarantee that the outputs are actually platform-independent; it would happen even for an action involving gcc if the command line, environment variables, and inputs were identical. I believe we actually have a bug report for that somewhere.
Our rough plan for solving this is to add a toolchain identifier to the cache key to prevent such cache hits (this isn't a fully baked proposal yet), and I expect that we'll let users override those identifiers or provide identifiers that happen to be identical across platforms (handwave).
Getting cache hits across Windows and Linux/macOS is a more complex topic that we haven't really started to look into yet. We'll certainly need to figure something out about paths, but there are more problems. The first that comes to mind is line endings: they make source files differ bit for bit, so either we can't use a straightforward hash, or we have to require users to follow a consistent line-ending convention across platforms. Others are the case sensitivity (or insensitivity) of file systems and differences in command-line flags (e.g., Windows usually uses /flag, whereas Linux usually uses -flag or --flag). |
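The line-ending problem is easy to see in a few lines. This sketch contrasts a plain SHA-256 digest with a hypothetical text-mode digest that normalizes CRLF to LF first; neither `raw_digest` nor `normalized_digest` is a Bazel API.

```python
import hashlib


def raw_digest(content: bytes) -> str:
    # Straightforward hash: sensitive to every byte, including CR characters.
    return hashlib.sha256(content).hexdigest()


def normalized_digest(content: bytes) -> str:
    # Hypothetical text-mode hash: map CRLF (and any lone CR) to LF first, so
    # the same source checked out on Windows and on Linux hashes identically.
    normalized = content.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return hashlib.sha256(normalized).hexdigest()


unix_src = b"int main() {\n  return 0;\n}\n"
windows_src = b"int main() {\r\n  return 0;\r\n}\r\n"
```

Here `raw_digest` gives different results for the two checkouts while `normalized_digest` gives the same one. Normalizing is only safe for files known to be text, since it would make some genuinely different binary files hash alike; that is why the alternative of requiring a consistent line-ending convention is also on the table.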
Following back up on this after a while: FWIW, we've been working around this for Kubernetes for some time now by hashing the toolchains ourselves (since we run in Debian containers, we can do this pretty easily) and using that hash to key our cache.
So far this has worked well enough as a stopgap. |
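A sketch of that kind of workaround, assuming you can enumerate the tool files the build depends on; `toolchain_key` and `file_sha256` are hypothetical helpers, not the script Kubernetes actually uses.

```python
import hashlib
from typing import Iterable


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file so large compiler binaries are not read into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def toolchain_key(paths: Iterable[str]) -> str:
    """Combine per-file digests of the given tools into one short cache key.

    Sorting makes the key independent of the order the paths are listed in.
    POSIX paths cannot contain a NUL byte, so it is a safe separator.
    """
    h = hashlib.sha256()
    for path in sorted(paths):
        h.update(path.encode("utf-8") + b"\0")
        h.update(file_sha256(path).encode("ascii") + b"\n")
    return h.hexdigest()[:16]
```

For example, `toolchain_key(["/usr/bin/gcc", "/usr/bin/ld"])`. Hashing only the `gcc` driver would miss its system headers and internal binaries, so the list has to be chosen carefully, or something broader such as the container image can be hashed instead.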
@buchgr For now I will try `--remote_http_cache=https://${cache_host}/${extra_cache_key}`, inspired by @BenTheElder's link. |
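One way to generate that flag, as a sketch: it assumes the HTTP cache server treats the URL path prefix as part of the storage location, so different keys land in disjoint namespaces. `remote_cache_rc_line` is a hypothetical helper, and `cache.example.com` is a placeholder host.

```python
import re

# Characters allowed in a URL path segment without percent-encoding.
_SAFE_KEY = re.compile(r"[A-Za-z0-9._~-]+")


def remote_cache_rc_line(cache_host: str, extra_cache_key: str) -> str:
    """Return a .bazelrc line that scopes the HTTP cache by an extra key."""
    if not _SAFE_KEY.fullmatch(extra_cache_key):
        raise ValueError(f"cache key is not URL-path safe: {extra_cache_key!r}")
    return f"build --remote_http_cache=https://{cache_host}/{extra_cache_key}"
```

`remote_cache_rc_line("cache.example.com", "abc123")` returns `build --remote_http_cache=https://cache.example.com/abc123`. Newer Bazel releases accept HTTP URLs in `--remote_cache` instead of `--remote_http_cache`, so the flag name may need adjusting.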
It’s not obvious to me that this issue is specific to the remote cache; wouldn’t it affect the local cache as well? If not, why not? |
@HackAttack Correct. However, the local cache affects only a single user, while the remote cache can do quite a bit of harm.
Could you elaborate on this? Which actions do not consume this and which do? |
To be honest, after more than a year I don't really remember, and other details have changed significantly since then. I no longer have much activity related to this. We've been reasonably happy with the distinct cache locations, FWIW. I think recent efforts are focused on using GCP RBE, which uses the build container image as a key, IIRC. |
In some client setups, untracked local files can be used by an action without being included in the Action message, which causes action cache collisions: bazelbuild/bazel#4558 Ideally this should be fixed on the client side (either in the client, or in the build configuration), but it is not always easy to do in practice. As a workaround, this patch adds a setting to mangle ActionCache keys with the instance name provided by the client (if it is not empty), to produce a new ActionCache key. Clients are then able to specify a different instance name whenever a change is made that could affect these untracked inputs. The instance name value could be something like the hash of the compiler version. This allows multiple ActionCache items to exist in the cache, without requiring a change to the on-disk storage format. This feature is disabled by default, since it would cause cache invalidations for existing users. Fixes buchgr#15.
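A minimal sketch of the key mangling described in that patch, assuming hex SHA-256 action keys. The concatenation scheme and the name `mangle_ac_key` are assumptions, not bazel-remote's actual code.

```python
import hashlib


def mangle_ac_key(ac_key: str, instance_name: str, enabled: bool) -> str:
    """Hypothetical re-keying of an ActionCache entry by instance name.

    When the feature is disabled or the client sends no instance name, the
    key is returned unchanged, so existing cache entries stay valid.
    """
    if not enabled or not instance_name:
        return ac_key
    h = hashlib.sha256()
    h.update(ac_key.encode("ascii"))
    h.update(b"\0")  # separator between the original key and the instance name
    h.update(instance_name.encode("utf-8"))
    return h.hexdigest()
```

A client would then change its instance name (for example via Bazel's `--remote_instance_name` flag) whenever an untracked input such as the compiler changes, and the server would store the new results under fresh keys without changing its on-disk format.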
I've recently been seeing action_env consumed by some cc_library actions and not by others. See #12059 |
This allows for manually invalidating prior cache results when there are incompatible changes that Bazel doesn't handle. For example, changing the C++ compiler version. See bazelbuild/bazel#4558
Since Bazel can consume anything from the system by default, it would seem natural to add some kind of extra input to all build actions that represents the complete state of the system, for example the hash of a Docker container or the commit SHA of the scripts that provision the system. This is similar to what was suggested above, with using However, this falls down when using remote execution, because that extra input is specified by the machine invoking Bazel rather than by the remote execution service. The machine invoking Bazel has no way of knowing which environment the remote execution service will use to execute the action. |
The solution we're using in rules_ll is to wrap the entire build environment in Nix and generate remote execution toolchains from that. We then replicate the RBE environment locally and can seamlessly switch between RBE and local builds while reusing the same cache. Upsides:
Downsides:
Implementation:
|
Does this only apply to the toolchain, or does it include external libraries as well? |
Bazel currently does not track tools outside a workspace. This can be a problem if, for example, an action uses a compiler from /usr/bin/. Two users with different compilers installed will then wrongly share cache hits: their outputs differ, but their actions have the same hash.
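The failure can be made concrete with a toy model of an action hash that, as a simplification (not Bazel's real algorithm), covers only the command line and the declared inputs. `toy_action_hash` and the stand-in compiler contents are hypothetical.

```python
import hashlib
from typing import Mapping, Sequence


def toy_action_hash(argv: Sequence[str], inputs: Mapping[str, bytes]) -> str:
    """Toy action hash over the command line and the *declared* inputs only."""
    h = hashlib.sha256()
    h.update(f"{len(argv)}\0".encode())
    for arg in argv:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


argv = ["/usr/bin/gcc", "-c", "foo.c", "-o", "foo.o"]
src = {"foo.c": b"int f(void) { return 1; }\n"}

# Stand-ins for the contents of /usr/bin/gcc on two different machines.
gcc_on_machine_a = b"<gcc 4.9 binary>"
gcc_on_machine_b = b"<gcc 7 binary>"

# /usr/bin/gcc appears only as a string in argv, so both machines compute
# the same hash and would share a cache entry despite different compilers.
key_a = toy_action_hash(argv, src)
key_b = toy_action_hash(argv, src)
assert key_a == key_b

# Declaring the compiler as an input makes its contents part of the hash.
tracked_a = toy_action_hash(argv, {**src, "/usr/bin/gcc": gcc_on_machine_a})
tracked_b = toy_action_hash(argv, {**src, "/usr/bin/gcc": gcc_on_machine_b})
assert tracked_a != tracked_b
```

The second pair of hashes differs only because the compiler's bytes were declared; the workarounds in this thread (extra cache keys, instance names, hermetic toolchains) are different ways of getting that information into the key.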