Race condition creating symlinks on Windows results in a junction to empty directory being created instead of a symlink to a file #19018

Closed
carpenterjc opened this issue Jul 21, 2023 · 6 comments
Labels
area-Windows Windows-specific issues and feature requests P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Rules-CPP Issues for C++ rules type: bug

Comments

@carpenterjc

Description of the bug:

We are using a cc_import to add a dependency to a cc_binary that is produced by a custom Starlark rule elsewhere in the repository.

This executable is then used and run by a cc_test.

Occasionally, Bazel creates the "symlink" to the file before the custom rule has written it. When this happens, Bazel creates a directory junction instead of a symlink to the file.

Sometimes what is supposed to be a symlink (or a copy, if symlinks are disabled) shows up as a directory junction instead. This breaks things, because you cannot interact with a directory junction that points at a file.
When you're in this state, bazel build doesn't fix the problem. Only a bazel clean or manually removing the junction fixes it.
It appears to be a race condition, because it doesn't happen consistently.
Code inspection of the bazel symlink implementation for Windows shows that it creates a directory junction either if the target exists and is a directory, or if the target doesn't exist -- even if symlinks are disabled on Windows.
This implies the bug: if the core.symlink action is scheduled ahead of the action which generates the link target, it will always produce the wrong result on Windows.

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

This happens rarely and we believe it's a race condition.

Which operating system are you running Bazel on?

Windows 11

What is the output of bazel info release?

release 6.2.0

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/windows/WindowsFileSystem.java#L80-L106

That code says:
If the target doesn't exist, or if it is a directory, create a junction. We don't check the symlink flag for this case.
Else (the target exists and is a file):
    If the symlink flag is enabled, create a symlink.
    Otherwise copy it (non-atomically, by the way).
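
As a rough illustration of the branching described above, here is a minimal Python sketch. It only models the decision as summarized in this issue and is not Bazel's actual code; the names `choose_link_kind` and `demo`, and the use of `gflags.h` as the example file, are hypothetical.

```python
import os
import tempfile


def choose_link_kind(target_path: str, symlinks_enabled: bool) -> str:
    """Model of the branching summarized above (not Bazel's actual code)."""
    # A missing target and a directory target take the same branch,
    # and the symlink flag is not consulted on this branch.
    if not os.path.exists(target_path) or os.path.isdir(target_path):
        return "junction"
    # From here on the target exists and is a regular file.
    return "symlink" if symlinks_enabled else "copy"


def demo() -> list:
    """Evaluate the model before and after the target file is written."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "gflags.h")  # hypothetical generated file
        before = choose_link_kind(target, symlinks_enabled=True)
        with open(target, "w") as f:
            f.write("// generated\n")
        after = choose_link_kind(target, symlinks_enabled=True)
    return [before, after]


if __name__ == "__main__":
    print(demo())
```

In this model the outcome for the same path depends only on whether the target file has been written yet, which is the ordering dependence this issue is about.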

Any other information, logs, or outputs that you want to share?

No response

@iancha1992 iancha1992 added the area-Windows Windows-specific issues and feature requests label Jul 21, 2023
@buildbreaker2021 buildbreaker2021 added P3 We're not considering working on this, but happy to review a PR. (No assignee) and removed untriaged labels Aug 16, 2023
@carpenterjc
Author

Possibly related: #2474

@carpenterjc
Author

Another related issue: #12018
We have found that this also happens when creating _virtual_headers for generated files. We have seen that gflags.h (generated by https://github.com/gflags/gflags/blob/master/bazel/gflags.bzl#L32) sometimes doesn't exist yet when the symlink is created, so you end up with gflags.h being a junction.

@tjgq
Contributor

tjgq commented Sep 21, 2023

@carpenterjc Do you happen to have a repro you can share (even if it's not 100% deterministic and needs to be run a few times to hit the bug)? Please also include the flags you're invoking Bazel with.

@carpenterjc
Author

@tjgq I have a reproduction that makes Bazel on Windows produce junctions instead of header file symlinks.

diff --git a/./.bazelignore b/./.bazelignore
new file mode 100644
index 00000000000..38b453a9094
--- /dev/null
+++ b/./.bazelignore
@@ -0,0 +1 @@
+bazelcache*
\ No newline at end of file
diff --git a/./.bazelrc b/./.bazelrc
new file mode 100644
index 00000000000..644c02ec081
--- /dev/null
+++ b/./.bazelrc
@@ -0,0 +1,2 @@
+build --disk_cache=bazelcache
+build --remote_download_minimal
\ No newline at end of file
diff --git a/./.bazelversion b/./.bazelversion
new file mode 100644
index 00000000000..f9da12e1184
--- /dev/null
+++ b/./.bazelversion
@@ -0,0 +1 @@
+6.3.2
\ No newline at end of file
diff --git a/./.gitignore b/./.gitignore
new file mode 100644
index 00000000000..5c0fc640476
--- /dev/null
+++ b/./.gitignore
@@ -0,0 +1 @@
+bazel*
\ No newline at end of file
diff --git a/./BUILD b/./BUILD
new file mode 100644
index 00000000000..ee6185db48e
--- /dev/null
+++ b/./BUILD
@@ -0,0 +1,9 @@
+
+
+cc_binary(
+    name = "issue",
+    srcs = ["issue.cpp"],
+    deps = [
+        "@com_github_gflags_gflags//:gflags",
+    ],
+)
diff --git a/./WORKSPACE b/./WORKSPACE
new file mode 100644
index 00000000000..6c359e68332
--- /dev/null
+++ b/./WORKSPACE
@@ -0,0 +1,7 @@
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+http_archive(
+    name = "com_github_gflags_gflags",
+    url = "https://github.com/gflags/gflags/archive/refs/tags/v2.2.2.tar.gz",
+    strip_prefix = "gflags-2.2.2",
+    sha256 = "34af2f15cf7367513b352bdcd2493ab14ce43692d2dcd9dfc499492966c64dcf",
+)
\ No newline at end of file
diff --git a/./issue.cpp b/./issue.cpp
new file mode 100644
index 00000000000..9987febf3d7
--- /dev/null
+++ b/./issue.cpp
@@ -0,0 +1,9 @@
+// This file is generated then symlinked as a virtual header
+// #include "change.h"
+
+#include <gflags/gflags.h>
+
+int main()
+{
+    return 0;
+}
\ No newline at end of file
diff --git a/./provoke_bad_symlinking.txt b/./provoke_bad_symlinking.txt
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/./reproduce.bat b/./reproduce.bat
new file mode 100644
index 00000000000..abcbfd59810
--- /dev/null
+++ b/./reproduce.bat
@@ -0,0 +1,14 @@
+REM Build first to populate the cache
+bazel clean
+rmdir bazelcache /q /s
+
+bazel run issue %*
+rem list any junctions
+dir /b /al /s bazel-bin\external\com_github_gflags_gflags
+REM On large CI systems when the build is very busy we see this cause issues
+
+bazel clean
+bazel build ... %*
+bazel run issue %*
+rem list any junctions which should be files in this example - if you see files here it has reproduced
+dir /b /al /s bazel-bin\external\com_github_gflags_gflags
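
For anyone who would rather check for stray reparse points from a script than with `dir /b /al /s`, a minimal Python sketch follows. It is an assumption-laden helper and not part of the repro: the name `find_reparse_points` is hypothetical, and it relies on `st_file_attributes`, which Python only populates on Windows, so on other platforms it returns an empty list.

```python
import os
import stat


def find_reparse_points(root: str) -> list:
    """Return paths under root that carry the Windows reparse-point attribute.

    Junctions and symlinks are both reparse points, so this roughly mirrors
    what `dir /al /s` lists. Hypothetical helper, not part of Bazel.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # st_file_attributes exists only on Windows; default to 0 elsewhere.
            attrs = getattr(os.lstat(path), "st_file_attributes", 0)
            if attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT:
                found.append(path)
    return sorted(found)


if __name__ == "__main__":
    for p in find_reparse_points(r"bazel-bin\external\com_github_gflags_gflags"):
        print(p)
```

Like `dir /al`, this does not distinguish a junction from a symlink; it only narrows down which entries to inspect.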

@carpenterjc
Author

If you add some extra dependencies and a manual target, you can reproduce the compiler trying to open one of these junctions.

diff --git a/./.bazelignore b/./.bazelignore
new file mode 100644
index 00000000000..38b453a9094
--- /dev/null
+++ b/./.bazelignore
@@ -0,0 +1 @@
+bazelcache*
\ No newline at end of file
diff --git a/./.bazelrc b/./.bazelrc
new file mode 100644
index 00000000000..644c02ec081
--- /dev/null
+++ b/./.bazelrc
@@ -0,0 +1,2 @@
+build --disk_cache=bazelcache
+build --remote_download_minimal
\ No newline at end of file
diff --git a/./.bazelversion b/./.bazelversion
new file mode 100644
index 00000000000..f9da12e1184
--- /dev/null
+++ b/./.bazelversion
@@ -0,0 +1 @@
+6.3.2
\ No newline at end of file
diff --git a/./.gitignore b/./.gitignore
new file mode 100644
index 00000000000..5c0fc640476
--- /dev/null
+++ b/./.gitignore
@@ -0,0 +1 @@
+bazel*
\ No newline at end of file
diff --git a/./BUILD b/./BUILD
new file mode 100644
index 00000000000..ea4354d6ada
--- /dev/null
+++ b/./BUILD
@@ -0,0 +1,24 @@
+
+
+cc_binary(
+    name = "issue",
+    srcs = ["issue_b.cpp"],
+    deps = [
+        ":issue1",
+    ],
+)
+
+cc_library(
+    name = "issue1",
+    srcs = ["issue.cpp"],
+    deps = [
+        "@com_github_gflags_gflags//:gflags",
+    ],
+)
+
+
+cc_library(
+    name = "issue2",
+    srcs = ["issue2.cpp"],
+    tags = ["manual"],
+)
\ No newline at end of file
diff --git a/./WORKSPACE b/./WORKSPACE
new file mode 100644
index 00000000000..6c359e68332
--- /dev/null
+++ b/./WORKSPACE
@@ -0,0 +1,7 @@
+load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
+http_archive(
+    name = "com_github_gflags_gflags",
+    url = "https://github.com/gflags/gflags/archive/refs/tags/v2.2.2.tar.gz",
+    strip_prefix = "gflags-2.2.2",
+    sha256 = "34af2f15cf7367513b352bdcd2493ab14ce43692d2dcd9dfc499492966c64dcf",
+)
\ No newline at end of file
diff --git a/./issue.cpp b/./issue.cpp
new file mode 100644
index 00000000000..9987febf3d7
--- /dev/null
+++ b/./issue.cpp
@@ -0,0 +1,9 @@
+// This file is generated then symlinked as a virtual header
+// #include "change.h"
+
+#include <gflags/gflags.h>
+
+int main()
+{
+    return 0;
+}
\ No newline at end of file
diff --git a/./issue2.cpp b/./issue2.cpp
new file mode 100644
index 00000000000..770f0dcff42
--- /dev/null
+++ b/./issue2.cpp
@@ -0,0 +1 @@
+#include <gflags/gflags.h>
\ No newline at end of file
diff --git a/./issue_b.cpp b/./issue_b.cpp
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/./provoke_bad_symlinking.txt b/./provoke_bad_symlinking.txt
new file mode 100644
index 00000000000..e69de29bb2d
diff --git a/./reproduce.bat b/./reproduce.bat
new file mode 100644
index 00000000000..96d579f5465
--- /dev/null
+++ b/./reproduce.bat
@@ -0,0 +1,18 @@
+REM Build first to populate the cache
+bazel clean
+rmdir bazelcache /q /s
+
+bazel run issue %*
+rem list any junctions
+dir /b /al /s bazel-bin\external\com_github_gflags_gflags
+REM On large CI systems when the build is very busy we see this cause issues
+
+bazel clean
+bazel build ... %*
+bazel run issue %*
+rem list any junctions which should be files in this example - if you see files here it has reproduced
+dir /b /al /s bazel-bin\external\com_github_gflags_gflags
+
+bazel build //:issue2 %*
+rem list any junctions which should be files in this example - if you see files here it has reproduced
+dir /b /al /s bazel-bin\external\com_github_gflags_gflags
\ No newline at end of file

@tjgq
Contributor

tjgq commented Jul 19, 2024

I'm going to close this as a duplicate of #21747 since the repro strongly suggests it's the same issue (i.e., you need a disk or remote cache and build without the bytes to reproduce it; if you can repro outside of these conditions, please reopen).

@tjgq tjgq closed this as not planned Won't fix, can't repro, duplicate, stale Jul 19, 2024