[bug] bazel client failure: Directory.getFilesList() is null #1299

Closed
luxe opened this issue Mar 28, 2023 · 4 comments · Fixed by #1379

@luxe
Contributor

luxe commented Mar 28, 2023

On latest buildfarm, the bazel client prints Cannot invoke "build.bazel.remote.execution.v2.Directory.getFilesList()" because the return value of "java.util.Map.get(Object)" is null. Here is the full bazel client stacktrace:

java.io.IOException: com.google.devtools.build.lib.remote.ExecutionStatusException: UNKNOWN: Cannot invoke "build.bazel.remote.execution.v2.Directory.getFilesList()" because the return value of "java.util.Map.get(Object)" is null
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:235)
	at com.google.devtools.build.lib.remote.RemoteExecutionService.executeRemotely(RemoteExecutionService.java:1258)
	at com.google.devtools.build.lib.remote.RemoteSpawnRunner.lambda$exec$2(RemoteSpawnRunner.java:268)
	at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
	at com.google.devtools.build.lib.remote.RemoteSpawnRunner.exec(RemoteSpawnRunner.java:243)
	at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:245)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:146)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:108)
	at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
	at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.beginTestAttempt(StandaloneTestStrategy.java:440)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy.access$200(StandaloneTestStrategy.java:84)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy$StandaloneTestRunnerSpawn.beginExecution(StandaloneTestStrategy.java:672)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.beginIfNotCancelled(TestRunnerAction.java:921)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.beginExecution(TestRunnerAction.java:888)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:946)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:937)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:907)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1076)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1031)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:492)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.google.devtools.build.lib.remote.ExecutionStatusException: UNKNOWN: Cannot invoke "build.bazel.remote.execution.v2.Directory.getFilesList()" because the return value of "java.util.Map.get(Object)" is null
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.handleStatus(GrpcRemoteExecutor.java:71)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.getOperationResponse(GrpcRemoteExecutor.java:83)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$2(GrpcRemoteExecutor.java:194)
	at com.google.devtools.build.lib.remote.Retrier.execute(Retrier.java:244)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:125)
	at com.google.devtools.build.lib.remote.RemoteRetrier.execute(RemoteRetrier.java:114)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.lambda$executeRemotely$3(GrpcRemoteExecutor.java:140)
	at com.google.devtools.build.lib.remote.util.Utils.refreshIfUnauthenticated(Utils.java:525)
	at com.google.devtools.build.lib.remote.GrpcRemoteExecutor.executeRemotely(GrpcRemoteExecutor.java:138)
	... 32 more

This message seems to be coming from buildfarm. Bazel would report it here:
https://github.com/bazelbuild/bazel/blob/a691e974d2e4c5fa4a469e1321b18d15ac7e9cfa/src/main/java/com/google/devtools/build/lib/remote/GrpcRemoteExecutor.java#L71

There are a few places in buildfarm where we call Directory.getFilesList(), but I'm trying to see why the directory would be null in the first place, and how this would be forwarded back to the client.

@luxe luxe added the bug label Mar 28, 2023
@werkt
Member

werkt commented Mar 29, 2023

You're not seeing anything in serverside logs from this?

@werkt
Member

werkt commented Mar 29, 2023

I would expect the log line to be prefixed with "error occurred during execution", which is logged for any exception save for one thrown during watchExecution or execute on ShardInstance, both of which don't have any codepaths that I can see to getFilesList().

@werkt
Member

werkt commented May 10, 2023

@luxe have you seen this since?

@werkt
Member

werkt commented Jun 18, 2023

This happens when we have to revisit a directory with descendant directories that are either empty (zero size on digest) or were missing from the directories index in the previous pass (some fully replicated dir in the input tree). Both of these need to be guarded against: the normal default directory instance for the empty dir, and a failed precondition for the extra path to a missing directory.
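
To illustrate the two guards described above, here is a minimal sketch. It is not buildfarm's actual code: `Digest`, `Directory`, `listFiles`, and the string-keyed index are hypothetical stand-ins for the REAPI messages and the directories index. A zero-size digest resolves to an empty default directory, and a digest absent from the index is recorded as a missing path (the material for a failed precondition) instead of being dereferenced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirectoryLookupSketch {
  /** Hypothetical stand-in for the REAPI Digest message. */
  public static final class Digest {
    final String hash;
    final long sizeBytes;

    public Digest(String hash, long sizeBytes) {
      this.hash = hash;
      this.sizeBytes = sizeBytes;
    }
  }

  /** Hypothetical stand-in for the REAPI Directory message. */
  public static final class Directory {
    static final Directory EMPTY = new Directory(List.of(), Map.of());
    final List<String> files;
    final Map<String, Digest> children;

    public Directory(List<String> files, Map<String, Digest> children) {
      this.files = files;
      this.children = children;
    }
  }

  /**
   * Collects file paths under the given digest. Paths whose directory is
   * absent from the index are appended to `missing` rather than causing a
   * NullPointerException on the result of Map.get.
   */
  public static List<String> listFiles(
      Map<String, Directory> index, Digest digest, String path, List<String> missing) {
    List<String> files = new ArrayList<>();
    // Guard 1: an empty directory has a zero-size digest and need not be indexed.
    Directory dir = digest.sizeBytes == 0 ? Directory.EMPTY : index.get(digest.hash);
    // Guard 2: a non-empty directory missing from the index is reported, not dereferenced.
    if (dir == null) {
      missing.add(path);
      return files;
    }
    for (String name : dir.files) {
      files.add(path + "/" + name);
    }
    for (Map.Entry<String, Digest> child : dir.children.entrySet()) {
      files.addAll(listFiles(index, child.getValue(), path + "/" + child.getKey(), missing));
    }
    return files;
  }

  public static void main(String[] args) {
    Map<String, Directory> index = new HashMap<>();
    index.put(
        "r",
        new Directory(
            List.of("a.txt"),
            Map.of("empty", new Digest("e", 0), "gone", new Digest("g", 10))));
    List<String> missing = new ArrayList<>();
    System.out.println(listFiles(index, new Digest("r", 5), "root", missing)); // [root/a.txt]
    System.out.println(missing); // [root/gone]
  }
}
```

In this sketch the caller would turn a non-empty `missing` list into a precondition failure for the client, which is clearer than the UNKNOWN status carrying a null-dereference message shown in the stack trace.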

werkt added a commit that referenced this issue Jun 18, 2023
Directories reevaluated only under the enumeration hierarchy must still
be guarded against empty child directories in their checks, and must
handle child directories missing in the index safely, with precondition
failures matching their outputs. Order is not guaranteed in precondition
output, but tests now guard this case.

Fixes #1299