Race conditions with local action deduplication #23288
Comments
@bazel-io fork 7.3.1
…lbuild#23069)" This reverts commit ffe1df5. Causes race conditions, see bazelbuild#23288 for details.
The second issue is fixed by #23296. The first one is more difficult to solve: the Java compilation action runs two spawns and modifies or deletes outputs in multiple cases, both before falling back to the second spawn and after each spawn to rewrite the […]

@tjgq Do you happen to know how remote cache uploads interact with this logic? Either they are robust against this kind of race, in which case local execution deduplication could reuse the same logic, or they are potentially affected by it as well.
@bazel-io fork 7.3.1
@iancha1992 Please consider this a (very) soft blocker for 7.3.1 only: it would be great to get in if ready in time, but there is no need to block the release for it.
I would not expect remote uploads to be robust; the upload logic assumes that, once an output file is created, it will not disappear for the remainder of the build. This is also a problem for async uploading. At some point I'd like to find a general solution, but I think for the time being we should special-case our way out of it (as you're already doing in #23307).
This fixes failures such as the following when a spawn, e.g. a reduced Java compilation spawn, doesn't create an optional output:

```
java.io.FileNotFoundException: /private/var/tmp/_bazel_fmeum/507738cfc7e6cde00e4a0230e9aa0722/execroot/_main/bazel-out/_tmp/actions/remote/175.tmp (No such file or directory)
    at com.google.devtools.build.lib.unix.NativePosixFiles.lstat(Native Method)
    at com.google.devtools.build.lib.unix.UnixFileSystem.statInternal(UnixFileSystem.java:212)
    at com.google.devtools.build.lib.unix.UnixFileSystem.stat(UnixFileSystem.java:201)
    at com.google.devtools.build.lib.vfs.Path.stat(Path.java:290)
    at com.google.devtools.build.lib.vfs.FileSystemUtils.moveFile(FileSystemUtils.java:456)
    at com.google.devtools.build.lib.remote.RemoteExecutionService.moveOutputsToFinalLocation(RemoteExecutionService.java:878)
    ...
```

Also clean up temporary files in case of an exception.

Work towards #23288

Closes #23296.

PiperOrigin-RevId: 665744936
Change-Id: I89a409c7a6b28b2a5fa532bdb233dca9bc5bde73
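To illustrate the idea described in this commit message, here is a minimal sketch, not Bazel's actual code. It uses plain `java.nio.file` rather than Bazel's `vfs` classes, and the class `OptionalOutputMover` and the method `moveExisting` are hypothetical names. It assumes that a temporary file which was never created corresponds to an optional output that should be skipped, and that leftover temporary files should be removed even if a move fails.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class OptionalOutputMover {
  /**
   * Moves each temporary file to its final location and returns how many were moved.
   * Temporary files that do not exist (optional outputs that the spawn did not
   * produce) are skipped instead of causing a FileNotFoundException.
   */
  static int moveExisting(Map<Path, Path> tmpToFinal) throws IOException {
    int moved = 0;
    try {
      for (Map.Entry<Path, Path> entry : tmpToFinal.entrySet()) {
        Path tmp = entry.getKey();
        if (!Files.exists(tmp)) {
          continue; // Assumption: a missing temp file means an optional output.
        }
        Files.move(tmp, entry.getValue(), StandardCopyOption.REPLACE_EXISTING);
        moved++;
      }
      return moved;
    } finally {
      // Cleanup: successfully moved temp files are already gone, so this only
      // deletes leftovers, e.g. after one of the moves above threw.
      for (Path tmp : tmpToFinal.keySet()) {
        Files.deleteIfExists(tmp);
      }
    }
  }
}
```

Checking for existence before moving is only one way to tolerate a missing optional output; catching the resulting exception for outputs known to be optional would be another.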
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race.

This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions. For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse.

Fixes bazelbuild#22501
Fixes bazelbuild#23288
Work towards bazelbuild#21578

Closes bazelbuild#23307 (no longer needed)
Closes bazelbuild#23382.

PiperOrigin-RevId: 668101364
Change-Id: Ice75dbe14a7dd46e02ecb096d2b2a30940216356
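The "register atomically, then wait for all reusers" scheme described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the commit: the class `OutputReuseGate` and its methods are hypothetical names, and a plain monitor stands in for whatever synchronization Bazel actually uses. The original execution calls `closeAndAwaitReusers()` before anything may modify the outputs; each deduplicated execution calls `tryRegister()` first and `doneReusing()` once it has finished copying the outputs.

```java
// Hypothetical class, for illustration only.
final class OutputReuseGate {
  private int pendingReusers = 0;
  private boolean closed = false;

  /**
   * Called by a deduplicated execution before it reuses the outputs.
   * Returns false if the gate is already closed, in which case the caller
   * must not touch the outputs and has to execute the spawn itself.
   */
  synchronized boolean tryRegister() {
    if (closed) {
      return false;
    }
    pendingReusers++;
    return true;
  }

  /** Called by a registered execution once it has finished reusing the outputs. */
  synchronized void doneReusing() {
    pendingReusers--;
    if (pendingReusers == 0) {
      notifyAll();
    }
  }

  /**
   * Called by the original execution. Rejects further registrations, then
   * blocks until every registered execution has called doneReusing().
   */
  synchronized void closeAndAwaitReusers() {
    closed = true;
    try {
      while (pendingReusers > 0) {
        wait();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for reusers", e);
    }
  }
}
```

Because registration and closing take the same lock, a deduplicated execution either registers before the gate closes, and is then waited for, or observes the closed gate and never reads outputs that may be about to change.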
A fix for this issue has been included in Bazel 7.4.0 RC1. Please test out the release candidate and report any issues as soon as possible.
While testing path mapping on Bazel itself with 7.3.0, I just encountered the following exceptions:
and
Originally posted by @fmeum in #22658 (comment)