Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ctx.actions.symlink + building without the bytes misbehaves on Windows #21747

Open
hauserx opened this issue Mar 20, 2024 · 3 comments
Open

ctx.actions.symlink + building without the bytes misbehaves on Windows #21747

hauserx opened this issue Mar 20, 2024 · 3 comments
Labels
area-Windows Windows-specific issues and feature requests P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@hauserx
Copy link
Contributor

hauserx commented Mar 20, 2024

Description of the bug:

The issue seems to be that if built normally, the builder_reset\builder.exe created in rules_go below through ctx.actions.symlink (link) is a regular file when run without cache, but is JUNCTION when taken from remote cache.

Looks like junctions for files do not work well under Windows.

Below reproduction on hello world example using rules_go: https://github.com/hauserx/rules-go-startup

After regular build:

> dir C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset
2024-03-20  15:57         5,992,448 builder.exe

After taking from remote cache:

> dir C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset
2024-03-20  15:48    <JUNCTION>     builder.exe [C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder.exe]

>bazel clean
Starting local Bazel server and connecting to it...
INFO: Starting clean.

>bazel build :hello --remote_cache=<remote cache>
Target //:hello up-to-date:
  bazel-bin/hello_/hello.exe
INFO: Build completed successfully, 11 total actions

>bazel clean
INFO: Starting clean.

>bazel build :hello --remote_cache=<remote cache>
INFO: 11 processes: 4 remote cache hit, 7 internal.
INFO: Build completed successfully, 11 total actions

>bazel build :hello
INFO: Analyzed target //:hello (0 packages loaded, 0 targets configured).
ERROR: C:/b/external/io_bazel_rules_go/BUILD.bazel:42:7: GoStdlib external/io_bazel_rules_go/stdlib_/pkg failed: (Exit -1): builder.exe failed: error executing GoStdlib command (from target @@io_bazel_rules_go//:stdlib) bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset\builder.exe stdlib -sdk external/go_sdk -installsuffix windows_amd64 -out ... (remaining 5 arguments skipped)
Action failed to execute: java.io.IOException: ERROR: src/main/native/windows/process.cc(202): CreateProcessW("C:\b\execroot\_main\bazel-out\x64_windows-opt-exec-ST-13d3ddad9198\bin\external\go_sdk\builder_reset\builder.exe" stdlib -sdk external/go_sdk -installsuffix windows_amd64 -out bazel-out/x64_windows-fastbuild/bin/external/io_bazel_rules_go/stdlib_ -package std -gcflags ""): Access is denied.
 (error: 5)
Target //:hello failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 26.594s, Critical Path: 25.80s
INFO: 3 processes: 2 internal, 1 local.
ERROR: Build did NOT complete successfully

Which category does this issue belong to?

Core

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

No response

Which operating system are you running Bazel on?

Windows

What is the output of bazel info release?

release 7.1.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

Below bug describes some non-deterministic behavior with same symptoms.
#19018
In contrary this bug is fully reproducible and seems to be caused by build without bytes behavior.

Any other information, logs, or outputs that you want to share?

No response

@github-actions github-actions bot added the team-Core Skyframe, bazel query, BEP, options parsing, bazelrc label Mar 20, 2024
@hauserx
Copy link
Contributor Author

hauserx commented Mar 20, 2024

There is some code where bazel is creating junctions if cannot find target file:

// Still Create a dangling junction if the target doesn't exist.

@coeuvre coeuvre assigned tjgq and unassigned sgowroji and iancha1992 Mar 20, 2024
@coeuvre coeuvre added team-Remote-Exec Issues and PRs for the Execution (Remote) team and removed team-Core Skyframe, bazel query, BEP, options parsing, bazelrc labels Mar 20, 2024
@coeuvre coeuvre added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 26, 2024
@laszlocsomor
Copy link
Contributor

laszlocsomor commented Jun 28, 2024

Thanks for the detailed bug report!

I suspect the fix is simply:

-      // Still Create a dangling junction if the target doesn't exist.
-      if (!target.toFile().exists() || target.toFile().isDirectory()) {
-        WindowsFileOperations.createJunction(link.toString(), target.toString());
+      if (createSymbolicLinks) {
+        WindowsFileOperations.createSymlink(link.toString(), target.toString());
       } else {
-        if (createSymbolicLinks) {
-          WindowsFileOperations.createSymlink(link.toString(), target.toString());
+        // Still Create a dangling junction if the target doesn't exist.
+        if (!target.toFile().exists() || target.toFile().isDirectory()) {
+          WindowsFileOperations.createJunction(link.toString(), target.toString());
         } else {
           Files.copy(target, link);
         }

...but I had no time yet to try it out and implement a test.

I believe this would fix the createSymbolicLinks=true case. There's no hope for the other case; junctions are inherently different from symlinks (junctions can only point to directories, and they are always absolute), creating a dangling junction instead of a dangling symlink was just enough to fix #2474 (in 6c07525).

luke-flow added a commit to luke-flow/rules_go that referenced this issue Jul 1, 2024
When using remote cache, Windows symlinks
which should be executable may break, see:
bazelbuild/bazel#21747

Instead of creating symlinks, copy the bins instead
until the issue is resolved.
@tjgq tjgq changed the title Bazel fails on windows when switching from remote cache on to off (junctions for files) ctx.actions.symlink + building without the bytes misbehaves on Windows Jul 19, 2024
@tjgq
Copy link
Contributor

tjgq commented Jul 19, 2024

The issue here is that, when building without the bytes, we create the symlink before the file it points to exists (it will be downloaded later if the symlink is ever consumed as an action input, or is itself materialized as an output).

The createSymbolicLink implementation for Windows creates a junction if the target path doesn't exist, which will fail later because junctions can only point at directories, not files. As Laszlo points out above, we could make it work for the --windows_enable_symlinks case, but that's not a complete solution since not everyone is able or willing to set this flag (it requires administrator privileges).

Possible solutions, in rough order of complexity:

  1. Always fall back to making a copy on Windows if the symlink target doesn't exist.
  2. Disable "building without the bytes" for symlink artifacts (i.e., the target of the symlink is always downloaded, even with --remote_download_minimal. This must happen transitively since symlinks can point to other symlinks.
  3. Plant a dangling junction initially, but replace it with a symlink or copy when the target is later downloaded (might be too complex).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-Windows Windows-specific issues and feature requests P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
None yet
Development

No branches or pull requests

6 participants