Make --experimental_remote_cache_async not experimental and on by default #21578

Closed
brentleyjones opened this issue Mar 5, 2024 · 7 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: feature request

Comments

@brentleyjones
Contributor

Description of the feature request:

It would be nice if the --experimental_remote_cache_async flag were no longer experimental and were on by default. If that's not possible now, the blocking issues should be identified and fixed so that the flip can happen.

Which category does this issue belong to?

Remote Execution

What underlying problem are you trying to solve with this feature?

Better build times when using a remote cache.

Which operating system are you running Bazel on?

No response

What is the output of bazel info release?

No response

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@github-actions github-actions bot added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Mar 5, 2024
@tjgq
Contributor

tjgq commented Mar 6, 2024

I'd like this to happen at some point before the Bazel 8 release, but I'm not treating it as a priority at the moment.

@tjgq tjgq added P2 We'll consider working on this in future. (Assignee optional) untriaged and removed untriaged P2 We'll consider working on this in future. (Assignee optional) labels Mar 6, 2024
@tjgq
Contributor

tjgq commented Mar 6, 2024

(Moving it back to untriaged so we explicitly discuss it at the next team meeting.)

@meisterT
Member

related: #19273

@coeuvre
Member

coeuvre commented Mar 12, 2024

I am aware of some open issues with this flag; we need to address them before making it the default. As @tjgq said, it's not a priority at the moment, but we should make this happen before Bazel 8.

@tjgq tjgq added P2 We'll consider working on this in future. (Assignee optional) and removed untriaged labels Mar 12, 2024
@tjgq
Contributor

tjgq commented Jun 17, 2024

A known issue is that async uploading requires spawn outputs to not be moved/deleted until they've been uploaded. One situation where this isn't upheld is #22501. Test attempts are another (each attempt produces the test outputs at their expected paths, but then Bazel moves them into a per-attempt path before running the next attempt).
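
To make the hazard concrete, here is a minimal Python sketch of the race. It is a toy model, not Bazel's code: the `upload` function, the file names, and the per-attempt directory are all hypothetical, and the "upload" only reads the file. An event forces the background upload to start after the output has been moved, which is the unlucky interleaving described above.

```python
import os
import shutil
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def upload(path, may_start=None):
    """Hypothetical stand-in for a cache upload: reads the output file."""
    if may_start is not None:
        may_start.wait()  # simulate an upload that gets scheduled late
    with open(path, "rb") as f:
        return f.read()


def run_attempt(async_upload):
    """Produce an output, upload it, then move it to a per-attempt path."""
    workdir = tempfile.mkdtemp()
    try:
        output = os.path.join(workdir, "test.log")
        with open(output, "wb") as f:
            f.write(b"attempt 1")
        attempt_dir = os.path.join(workdir, "attempt_1")
        os.mkdir(attempt_dir)

        if not async_upload:
            uploaded = upload(output)  # blocking: finishes before the move
            shutil.move(output, os.path.join(attempt_dir, "test.log"))
            return uploaded

        may_start = threading.Event()
        with ThreadPoolExecutor(max_workers=1) as pool:
            future = pool.submit(upload, output, may_start)
            # The move happens while the upload is still pending.
            shutil.move(output, os.path.join(attempt_dir, "test.log"))
            may_start.set()
            try:
                return future.result()
            except FileNotFoundError:
                return None  # the upload lost the race
    finally:
        shutil.rmtree(workdir)


print(run_attempt(async_upload=False))  # b'attempt 1'
print(run_attempt(async_upload=True))   # None
```

In the blocking case the read completes before the move; in the background case the file is already gone when the upload opens it.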

@tjgq
Contributor

tjgq commented Aug 20, 2024

As @fmeum found out in #23307, Java reduced classpath actions are another situation where files can disappear after spawn execution (causing problems for async uploading).

My current plan for 8.0.0 is to default to async uploading except for actions that move/delete their outputs. In the longer term I'd like to find a more definitive solution that lets us remove the sync code path entirely.
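
A rough Python sketch of what "async by default, except for actions that move/delete their outputs" could look like. Everything here is an assumption for illustration: the `Action` and `CacheUploader` classes, the `may_modify_outputs_after_execution` field, and `fake_upload` are hypothetical names, and none of this is Bazel's implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Action:
    """Hypothetical action model; the field names are illustrative only."""
    name: str
    may_modify_outputs_after_execution: bool = False


class CacheUploader:
    def __init__(self, upload_fn):
        self._upload_fn = upload_fn
        self._pool = ThreadPoolExecutor(max_workers=4)

    def upload_outputs(self, action, outputs):
        """Return a Future; it is already complete for opted-out actions."""
        if action.may_modify_outputs_after_execution:
            future = Future()
            try:
                # Blocking upload on the caller's thread.
                future.set_result(self._upload_fn(outputs))
            except Exception as e:
                future.set_exception(e)
            return future
        # Default path: upload in the background.
        return self._pool.submit(self._upload_fn, outputs)

    def close(self):
        self._pool.shutdown(wait=True)


def fake_upload(outputs):
    # Hypothetical upload: just report which thread did the work.
    return threading.current_thread().name


uploader = CacheUploader(fake_upload)
test_action = Action("test attempt", may_modify_outputs_after_execution=True)
compile_action = Action("compile")
sync_thread = uploader.upload_outputs(test_action, ["test.log"]).result()
async_thread = uploader.upload_outputs(compile_action, ["a.o"]).result()
uploader.close()

print(sync_thread == threading.current_thread().name)   # True
print(async_thread == threading.current_thread().name)  # False
```

Returning a Future in both cases keeps the caller's code path the same; only the opted-out actions pay the cost of waiting.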

copybara-service bot pushed a commit that referenced this issue Aug 27, 2024
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race.

This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions.

For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse.
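
One way to picture the register-then-wait behavior is the toy Python model below. It is a sketch under stated assumptions, not the commit's code: `ReuseTracker` and its method names are hypothetical, and refusing registrations once the wait has finished is an assumption made here to keep the model race-free.

```python
import threading


class ReuseTracker:
    """Toy model: counts executions still reusing the first one's outputs."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0
        self._closed = False

    def register(self):
        """Atomically join as a reuser; False if the upload already went ahead."""
        with self._cond:
            if self._closed:
                return False
            self._pending += 1
            return True

    def reuse_done(self):
        """Mark one registered reuser as finished and wake any waiter."""
        with self._cond:
            self._pending -= 1
            self._cond.notify_all()

    def wait_for_reuse(self):
        """Block until every registered reuser is done, then stop accepting more."""
        with self._cond:
            while self._pending > 0:
                self._cond.wait()
            self._closed = True


events = []
tracker = ReuseTracker()
registered = [tracker.register(), tracker.register()]


def upload():
    tracker.wait_for_reuse()  # the upload waits for all registered reusers
    events.append("upload")


uploader_thread = threading.Thread(target=upload)
uploader_thread.start()
for i in (1, 2):
    events.append(f"reuse {i}")
    tracker.reuse_done()
uploader_thread.join()
late = tracker.register()

print(events)  # ['reuse 1', 'reuse 2', 'upload']
print(late)    # False
```

Because registration and the final check share one lock, a reuser either registers before the upload proceeds or is told it is too late; it can never slip in between.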

Fixes #22501
Fixes #23288
Work towards #21578
Closes #23307 (no longer needed)

Closes #23382.

PiperOrigin-RevId: 668101364
Change-Id: Ice75dbe14a7dd46e02ecb096d2b2a30940216356
copybara-service bot pushed a commit that referenced this issue Sep 3, 2024
*** Reason for rollback ***

Made //src/test/shell/bazel/remote:remote_build_event_uploader_test flaky in CI, for reasons that are still unclear.

*** Original change description ***

Flip --remote_cache_async.

Moves uploads to a disk or remote cache to the background by default. Some actions (specifically, those that modify spawn outputs after execution) are opted out, and still upload in a blocking manner. See ActionExecutionMetadata#mayModifySpawnOutputsAfterExecution and overrides.

This is technically not an incompatible change because (in the absence of bugs) it doesn't affect user-visible behavior: asynchronous uploads cannot affect the behavior of hermetic actions.

Fixes #21578.

***

PiperOrigin-RevId: 670587546
Change-Id: I03eec8e55c16028854d5275694ae94ca13db796f
@tjgq
Contributor

tjgq commented Sep 3, 2024

Reopening since we had to roll back the flag flip in b3cc928. I expect to roll forward again in time for Bazel 8.

@tjgq tjgq reopened this Sep 3, 2024
fmeum added a commit to fmeum/bazel that referenced this issue Sep 19, 2024
When an action may modify a spawn's outputs after execution, the upload of outputs to the cache and reuse for deduplicated actions need to happen synchronously directly after spawn execution to avoid a race.

This commit implements this for cache uploads by marking all actions with this property and simply disabling async upload for all spawns executed by such actions.

For output reuse, all executions deduplicated against the first one register atomically upon deduplication and cause the cache upload to wait for all of them to complete reuse.

Fixes bazelbuild#22501
Fixes bazelbuild#23288
Work towards bazelbuild#21578
Closes bazelbuild#23307 (no longer needed)

Closes bazelbuild#23382.

PiperOrigin-RevId: 668101364
Change-Id: Ice75dbe14a7dd46e02ecb096d2b2a30940216356