Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobuf repo recompilation sensitivity #7095

Open
pcj opened this issue Jan 11, 2019 · 43 comments
Open

Protobuf repo recompilation sensitivity #7095

pcj opened this issue Jan 11, 2019 · 43 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Server Issues for serverside rules included with Bazel type: bug

Comments

@pcj
Copy link
Member

pcj commented Jan 11, 2019

As discussed on the mailing list https://groups.google.com/forum/#!topic/bazel-discuss/UKLK7Y-jsW4 the protobuf repo is notoriously sensitive to recompilation. Presumably something in the environment used in the action key for protobuf C++ compilation leads to cache invalidation and hence recompiling protobuf from source.

Given protobuf libs are often base dependencies for many others, leads to degraded user experience.

Repro steps include taking any workspace that has a proto dependency and source ~/.bashrc, watch protobuf recompile.

Observed up to and including bazel 0.20.0.

/assign @hlopko

@hlopko
Copy link
Member

hlopko commented Jan 11, 2019

Can you double check that:

git clone https://github.com/bazelbuild/rules_cc
cd rules_cc
bazel test //tools/migration/...
source ~/.bashrc
bazel test //tools/migration/...

results in recompilation? What is the minimal content of the ~/.bashrc file that triggers the issue? What platform are you on?

@hlopko hlopko self-assigned this Jan 11, 2019
@hlopko hlopko added under investigation P2 We'll consider working on this in future. (Assignee optional) team-Rules-CPP Issues for C++ rules labels Jan 11, 2019
@pcj
Copy link
Member Author

pcj commented Jan 11, 2019

Looks like it's the PATH. Any changes to PATH will trigger recompilation.

Perhaps this is working as intended, or perhaps we can split it and use a set to dedup entries to normalize the path to the minimal necessary set.

Is it possible to exclude PATH here and rely solely on the toolchain?

@pcj
Copy link
Member Author

pcj commented Jan 11, 2019

$ uname -a
Linux bina 4.13.0-46-generic #51-Ubuntu SMP Tue Jun 12 12:36:29 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

@hlopko
Copy link
Member

hlopko commented Jan 11, 2019

Hmmm then #6648 should actually help you (afaik). Can you try with bazel 0.21?

@hlopko
Copy link
Member

hlopko commented Jan 11, 2019

or with 0.20 and with --incompatible_strict_action_env

@pcj
Copy link
Member Author

pcj commented Jan 11, 2019

Yes! Tested on bazel 0.21.0 and indeed no longer recompiles with PATH change. I guess I will go ahead and close this. The migration path from 0.19.2 to 0.20.0 and 0.21.0 is a little challenging (for this same reason), but clearly beneficial. Thanks for your help @hlopko.

@pcj pcj closed this as completed Jan 11, 2019
@hlopko
Copy link
Member

hlopko commented Jan 11, 2019

Awesome, thanks Paul. Can you pls share this solution on the mailing list?

@buchgr buchgr reopened this Mar 12, 2019
@hlopko
Copy link
Member

hlopko commented Mar 12, 2019

@pcj What toolchain do you use? Was it the Bazel's autoconfigured C++ toolchain?

@hlopko
Copy link
Member

hlopko commented Mar 12, 2019

Also, on what platforms do you observe this?

@pcj
Copy link
Member Author

pcj commented Mar 12, 2019

@hlopko I had observed it on linux & darwin before bazel 0.21.0. Now that I'm on 0.21 I'm really not seeing this anymore.

For most projects I'm not using a special toolchain, so yes, the autoconfigured one.

@buchgr
Copy link
Contributor

buchgr commented Mar 12, 2019

I believe that I found the root cause. It looks like this issue is not specific to C++, but that almost all actions (spawn actions, c++ compile, c++ link, java compile) in Bazel declare a dependency on PATH independent of whether these actions actually use them. It's probably just most obvious with c++ actions as they typically take quite long.

@keith
Copy link
Member

keith commented Mar 12, 2019

That would make sense with our solution to this which was to launch bazel with env -i and only pass very limited variables (and a static PATH variable)

@meisterT
Copy link
Member

@buchgr friendly ping

@buchgr
Copy link
Contributor

buchgr commented May 16, 2019

@meisterT what's up? :)

@meisterT
Copy link
Member

I have heard from bazel users that this is still a pain point. So I wanted to check in if you are working on a fix and when we can expect it to land?

@buchgr
Copy link
Contributor

buchgr commented May 16, 2019

Yes I am working on a proposal to fix it. I don't want to make promises on when it will land. 1-2 quarters.

@sudoforge
Copy link

I currently suffer from the protobuf targets rebuilding (a dependency of bazelbuild/buildtools (specifically, buildifier), but cannot reproduce the issue by re-sourcing my shell's rc file. To be honest, I'm really not sure how to debug this further other than "it happens every 5-10 minutes of inactivity".

@meisterT
Copy link
Member

meisterT commented May 2, 2022

@stevekuznetsov Can you write up repro instructions for cockroachdb w. gazelle please?

@stevekuznetsov
Copy link

@meisterT pinning down a direct reproducer is tricky. Cross-linked to an issue on the Cockroach project where devs talk about other targets getting recompiled, with some theories about why.

@aaliddell
Copy link
Contributor

I've tried to trace this cache fragility before and thought it might be protocolbuffers/protobuf#6886. However, IIRC that didn't solve it. When I've triggered this cache issue before with execution logging turned on, the action hashes between builds were identical but it was still rebuilding, suggesting it is not exclusively an env related issue.

One consideration may be that protobuf is treated somewhat special by Bazel, in that there are native flags and rules that reference it. As such, this may be why protobuf behaves unlike any other external repo with regard to caching. Since the execution logs suggest there should be no rebuild, is there a way of tracing deeper within Bazel as to why it made a decision to rebuild?

@pcj
Copy link
Member Author

pcj commented May 5, 2022

Presumably this would all go away if the proto rules were starlarkified as has been discussed for several years. Any update from the bazel team on this?

@meisterT
Copy link
Member

meisterT commented May 5, 2022

cc @comius

@arran-fathom
Copy link

Any chance this can be bumped to a P1 please? It affects multiple people and has a direct impact on workflow because all builds and tests for those people are significantly slowed.

In my case I think it may also be the root cause of Go IntelliSense bindings slowing down because they rely on @io_bazel_rules_go//go/tools/gopackagesdriver to source proto-generated files. This grinds my workflow to a halt.

@dt
Copy link

dt commented Nov 17, 2022

I spent a little time this morning looking into why I was seeing very frequent protoc recompilations in cockroachdb/cockroach development and I believe that in my case might also be down to PATH changes as described above, but what was surprising to me is that this is a result of using the rules_go recommended editor configuration for my editor (VS Code).

My editor is configured using the recommended configuration, which includes both a) pointing gopls at bazel and gopackagesdriver and b) setting goroot to ${workspaceFolder}/bazel-${workspaceFolderBasename}/external/go_sdk.

Configuring the goroot seems more correct than letting vscode go invoke system go, so that when it runs go mod tidy or whatever, it is using the exact same version and installation of the sdk as bazel is when building. It appears to do so by prepending the configured path to PATH when invoking all tools. In isolation, this seems good/correct.

However this means that when gopls is invoking gopackagesdriver and thus bazel, it is doing so with this modified PATH variable. Thus the bazel invocations in my workspace -- both my manual, direct runs and the background, gopls executions -- are constantly switching back and forth, between my usual PATH and the modified PATH used by VSCode Go to invoke tools due to the overridden Go SDK.

Simulating this by hand, by running PATH=_bazel/path/to/sdk:$PATH bazel build some/tiny/pkg followed by a plain bazel build some/tiny/pkg and then back again causes the same observed effect, of invalidating protoc.

I guess there's not really new information here, insofar as protoc's sensitivity to PATH changes is already covered above, but what wasn't obvious to me at least when I started poking at this was that my PATH was changing so often, unbeknownst to me, just as a result of using the recommended editor configuration.

@sudoforge
Copy link

sudoforge commented Nov 17, 2022

it wasn't obvious to me going in here that my PATH was changing so often as a result of using the recommended configuration.

yeah, this is really a bit of hidden knowledge that you learn through trial like you, but due to the sensitivity of PATH changes here, it's best to always execute bazel in the same environment. i've come to the conclusion that this means either focusing on

A) only executing bazel in the same environment locally, e.g. always via VS Code, or always via your shell; or
B) setting up RBE and always relying on that

@dt
Copy link

dt commented Nov 17, 2022

only executing bazel in the same environment locally, e.g. always via VS Code, or always via your shell

For what it's worth, the executions "via VS Code" here aren't voluntary ie me electing to trigger a build in my editor versus from my terminal -- they are the result of using the recommended editor configuration which changes its code navigation/inspection (via gopls) to be backed by bazel and gopackagedriver (which seems good!). These invocations happen "out of sight" and in the background so it wasn't obvious to me, at first, that they were what was thrashing PATH.

@pdeva
Copy link

pdeva commented Dec 24, 2022

i dont thonk the vscode config is the rooti ssue. we use vscode too, but point it to manually installed Go. so no changes to PATH or anything. yet merely afterbrunning gazelle to update a minor unrelated dependency rebuilds the entire protobuf stuff and thus the entire rest of the repo for some reason.

@fmeum
Copy link
Collaborator

fmeum commented Dec 24, 2022

@pdeva If you can minimize your repo to a reproducer you can share, I can investigate the rebuilds.

@dt
Copy link

dt commented Dec 28, 2022

Just one last update from my digging, just in case it can save anyone else some time if they're in a similar situation:

I dug a bit more and realized that in our repo (cockroachdb/cockorach), we had developed particular sensitivity to PATH changes earlier this year when a developer added action_env=PATH to our dev config. It looks like this was motivated by new arm-based Macs having a different default location for homebrow-installed tools, including those needed by foreign_cc, now under /opt instead of /usr/local, only the latter of which is in the hardcoded default used by foreign_cc. Using action_env=PATH to let the user's existing configuration of platform-specific paths probably seemed correct/preferred at the time and in isolation, given we didn't realize tools would be invoking bazel with customized PATHs.

I have since changed our common dev configuration back to using a hardcoded PATH, based the default from foriegn_cc just with the addition of the new /opt homebrew location, and have not since heard additional reports of "spurious" protoc rebuilds 🤞.

Again, this one is probably pretty obvious in hindsight -- having a tool config that constantly changes PATH and a build configuration that makes compilation depend on PATH implies constant recompilation -- but in isolation each of the steps involved to arrive at that situation seemed innocuous enough at the time. I just wanted to spell it out on the off chance it saves anyone else a little debugging time.

@nioan-aux
Copy link

nioan-aux commented Oct 4, 2024

I'm wondering if there are any updates on this. It's affecting our build as well.

Would it be possible to implement a work around to expose a flag that allows for a pre-build version of the grpc compiler to be used instead of rebuilding it? Pantsbuild is using this approach.

This has been open for 5+ years, and it seems quite tricky to reproduce and find the root cause for. A workaround might be a good interim solution

mbland added a commit to mbland/rules_scala that referenced this issue Oct 4, 2024
I solved the rest of the build failures from bazelbuild#1618. Now multiple jobs
are failing due to hanging scalac worker jobs as described in the PR
description.

- Proto Toolchainisation Design Doc
  https://docs.google.com/document/d/1CE6wJHNfKbUPBr7-mmk_0Yo3a4TaqcTPE0OWNuQkhPs/edit

- bazelbuild/bazel: Protobuf repo recompilation sensitivity
  bazelbuild/bazel#7095

- bazelbuild/rules_proto: Implement proto toolchainisation
  bazelbuild/rules_proto#179

- rules_proto 6.0.0 release notes mentioning Protobuf Toolchainisation
  https://github.com/bazelbuild/rules_proto/releases/tag/6.0.0

It occurred to me to try adopting Proto Toolchainisation to see if that
might resolve the issue. I've got it successfully generating the
`@io_bazel_rules_scala_protoc` repo and registering toolchains for
`@rules_proto//proto:toolchain_type`. However, it's currently dying
because there's no toolchain registered for
'@rules_java//java/proto:toolchain_type'.

```txt
bazel build //src/...
ERROR: .../src/protobuf/io/bazel/rules_scala/BUILD:4:14:
  in @@_builtins//:common/java/proto/java_proto_library.bzl%bazel_java_proto_aspect
  aspect on proto_library rule
  //src/protobuf/io/bazel/rules_scala:diagnostics_proto:

Traceback (most recent call last):
  File "/virtual_builtins_bzl/common/java/proto/java_proto_library.bzl",
    line 53, column 53, in _bazel_java_proto_aspect_impl
  File "/virtual_builtins_bzl/common/proto/proto_common.bzl",
    line 364, column 17, in _find_toolchain
Error in fail: No toolchains registered for '@rules_java//java/proto:toolchain_type'.
ERROR: Analysis of target '//src/protobuf/io/bazel/rules_scala:diagnostics_proto' failed
ERROR: Analysis of target '//src/java/io/bazel/rulesscala/scalac:scalac' failed; build aborted: Analysis failed
```
mbland added a commit to mbland/rules_scala that referenced this issue Oct 4, 2024
I solved the rest of the build failures from bazelbuild#1618. Now multiple jobs
are failing due to hanging scalac worker jobs as described in the PR
description.

- Proto Toolchainisation Design Doc
  https://docs.google.com/document/d/1CE6wJHNfKbUPBr7-mmk_0Yo3a4TaqcTPE0OWNuQkhPs/edit

- bazelbuild/bazel: Protobuf repo recompilation sensitivity
  bazelbuild/bazel#7095

- bazelbuild/rules_proto: Implement proto toolchainisation
  bazelbuild/rules_proto#179

- rules_proto 6.0.0 release notes mentioning Protobuf Toolchainisation
  https://github.com/bazelbuild/rules_proto/releases/tag/6.0.0

It occurred to me to try adopting Proto Toolchainisation to see if that
might resolve the issue. I've got it successfully generating the
`@io_bazel_rules_scala_protoc` repo and registering toolchains for
`@rules_proto//proto:toolchain_type`. However, it's currently dying
because there's no toolchain registered for
'@rules_java//java/proto:toolchain_type'.

```txt
bazel build //src/...
ERROR: .../src/protobuf/io/bazel/rules_scala/BUILD:4:14:
  in @@_builtins//:common/java/proto/java_proto_library.bzl%bazel_java_proto_aspect
  aspect on proto_library rule
  //src/protobuf/io/bazel/rules_scala:diagnostics_proto:

Traceback (most recent call last):
  File "/virtual_builtins_bzl/common/java/proto/java_proto_library.bzl",
    line 53, column 53, in _bazel_java_proto_aspect_impl
  File "/virtual_builtins_bzl/common/proto/proto_common.bzl",
    line 364, column 17, in _find_toolchain
Error in fail: No toolchains registered for '@rules_java//java/proto:toolchain_type'.
ERROR: Analysis of target '//src/protobuf/io/bazel/rules_scala:diagnostics_proto' failed
ERROR: Analysis of target '//src/java/io/bazel/rulesscala/scalac:scalac' failed; build aborted: Analysis failed
```
@fzakaria
Copy link
Contributor

fzakaria commented Oct 23, 2024

For those curious, here are two compact execution logs.
All I did was change PATH before running Bazel.
(I added a fake /bin2/)

> bazel-bin/src/tools/execlog/parser \
  --log_path=exec_protoc1.log \
  --log_path=exec_protoc2.log \
  --output_path=/tmp/exec1.log.txt \
  --output_path=/tmp/exec2.log.txt

I looked at the output.

> diff /tmp/exec1.log.txt /tmp/exec2.log.txt | head
67c67
<   value: "/Users/fzakaria/Library/Caches/bazelisk/downloads/sha256/7dc88df4842a5b49830c55b174551445587ae60f027a67d0f847dbb35f1f40c6/bin:/Users/fzakaria/.pyenv/shims:/Users/fzakaria/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/homebrew/bin:/Applications/iTerm.app/Contents/Resources/utilities:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/access-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/cloud-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/connect-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/k8saas-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/devprod-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/traffic-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/ksql-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/orchestrator-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/kafka-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/servicefoundations-beta:/Users/fzakaria/.zsh/plugins/powerlevel10k"
---
>   value: "/Users/fzakaria/Library/Caches/bazelisk/downloads/sha256/7dc88df4842a5b49830c55b174551445587ae60f027a67d0f847dbb35f1f40c6/bin:/Users/fzakaria/.pyenv/shims:/Users/fzakaria/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/opt/homebrew/bin:/Applications/iTerm.app/Contents/Resources/utilities:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/access-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/cloud-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/connect-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/k8saas-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/devprod-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/traffic-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/ksql-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/orchestrator-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/kafka-ga:/Users/fzakaria/code/github.com/confluentinc/cc-dotfiles/bin/servicefoundations-beta:/Users/fzakaria/.zsh/plugins/powerlevel10k:/bin2/"
568,569c568
< runner: "disk cache hit"
< cache_hit: true
---
> runner: "darwin-sandbox"
573c572

The environment variable is part of the action.
Is there some special passing of PATH for CC actions?

Is there a reason why PATH would be part of the action itself and not the digest of the tools itself for instance.

exec_protoc1.log
exec_protoc2.log

@fmeum
Copy link
Collaborator

fmeum commented Oct 24, 2024

The compilation action for C++ uses the environment determined by --action_env, which also includes PATH by default.

For the auto-detected C++ toolchain, it's pretty difficult to know which PATH entries are needed to run any of the toolchain binaries on the host. If that could be figured out, the relevant PATH entries could be set by the toolchain instead and then the C++ actions wouldn't need to inherit the full PATH.

@fzakaria
Copy link
Contributor

So we can probably close this then?

@fmeum
Copy link
Collaborator

fmeum commented Oct 24, 2024

I would say yes as there is nothing about this that's specific to protobuf. It would be great to continue the discussion about why PATH is set on C++ compilation actions in a more narrowly scoped, new issue.

@DeCarabas
Copy link

I will say that when I was investigating and debugging this I had a hermetic toolchain. Every compiler binary I needed was fetched by bazel and I provided the exact path to each tool as required.

I understand needing PATH when you're using a system compiler but...

(I agree that this probably isn't directly related to protobuf but please carry this forward into future discussions...)

@fzakaria
Copy link
Contributor

@fmeum @DeCarabas -- A correct approach I've been thinking about is for the ActionKey to use the digest of the tools it uses and not the environment variables;

So if you find clang from PATH, I don't necessarily care about how it was found, but instead that it's the exact same compiler. (Might need to extend this to the whole transitive graph of the binary with ldd)

This feels more systemic to how Bazel tracks system dependencies that can make reliance of the system a lot simpler.
That way if two machines are deployed with the same clang binary installation, they can get same caching behavior?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Server Issues for serverside rules included with Bazel type: bug
Projects
None yet
Development

No branches or pull requests