Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NullPointerException in DockerClientResolver when running multiple Gradle BuildImageTasks in parallel #3981

Closed
erdi opened this issue Apr 6, 2023 · 9 comments · Fixed by #3993

Comments

@erdi
Copy link
Contributor

erdi commented Apr 6, 2023

Environment:

  • Jib version: 3.3.1
  • Build tool: Gradle 8.0.1
  • OS: Linux

Description of the issue:

We have multiple projects in our Gradle build which have the jib plugin applied to them and we are using parallel execution. This means that multiple instances com.google.cloud.tools.jib.gradle.BuildImageTask can execute in parallel. When that happens we are sometimes seeing the following exception (stacktrace truncated to significant parts only):

  Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.ClassLoader.getParent()" because "this.currentLoader" is null
    at com.google.cloud.tools.jib.docker.DockerClientResolver.resolve(DockerClientResolver.java:38)
    at com.google.cloud.tools.jib.api.JibContainerBuilder.<init>(JibContainerBuilder.java:104)
    at com.google.cloud.tools.jib.api.Jib.from(Jib.java:87)
    at com.google.cloud.tools.jib.api.JavaContainerBuilder.from(JavaContainerBuilder.java:182)
    at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.getJavaContainerBuilderWithBaseImage(PluginConfigurationProcessor.java:543)
    at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.processCommonConfiguration(PluginConfigurationProcessor.java:425)
    at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.processCommonConfiguration(PluginConfigurationProcessor.java:490)
    at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.createJibBuildRunnerForRegistryImage(PluginConfigurationProcessor.java:309)
    at com.google.cloud.tools.jib.gradle.BuildImageTask.buildImage(BuildImageTask.java:116)

Expected behavior:
No NPEs being thrown when multiple instances com.google.cloud.tools.jib.gradle.BuildImageTask execute in parallel.

Steps to reproduce:
Run multiple instances of com.google.cloud.tools.jib.gradle.BuildImageTask in parallel. This is a concurrency issue so it's not always reproducible.

Additional Information:
I think that com.google.cloud.tools.jib.docker.DockerClientResolver#resolve is not thread safe yet it can be called from multiple threads by the Gradle plugin when running with parallel execution enabled. It uses a reference to a static instance of ServiceLoader which according to its javadoc is not thread safe:

Instances of this class are not safe for use by multiple concurrent threads.

@erdi
Copy link
Contributor Author

erdi commented Apr 9, 2023

Would a PR simply changing com.google.cloud.tools.jib.docker.DockerClientResolver to instantiate a new instance of ServiceLoader per resolve() call be considered as a fix for this, @suztomo? I'd be willing to contribute that.

@erdi
Copy link
Contributor Author

erdi commented Apr 27, 2023

Another stacktrace with a slightly different error caused by what I believe is the same issue:

    Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 2 out of bounds for length 2
       at com.google.cloud.tools.jib.docker.DockerClientResolver.resolve(DockerClientResolver.java:38)
       at com.google.cloud.tools.jib.api.JibContainerBuilder.<init>(JibContainerBuilder.java:104)
       at com.google.cloud.tools.jib.api.Jib.from(Jib.java:87)
       at com.google.cloud.tools.jib.api.JavaContainerBuilder.from(JavaContainerBuilder.java:182)
       at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.getJavaContainerBuilderWithBaseImage(PluginConfigurationProcessor.java:543)
       at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.processCommonConfiguration(PluginConfigurationProcessor.java:425)
       at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.processCommonConfiguration(PluginConfigurationProcessor.java:490)
       at com.google.cloud.tools.jib.plugins.common.PluginConfigurationProcessor.createJibBuildRunnerForRegistryImage(PluginConfigurationProcessor.java:309)
       at com.google.cloud.tools.jib.gradle.BuildImageTask.buildImage(BuildImageTask.java:116)

@emmileaf
Copy link
Contributor

emmileaf commented May 5, 2023

Hey @erdi, thanks for the report and investigation! I think you’re right, looks like the iterator usage of ServiceLoader runs into a similar multithreading limitation as reported per https://bugs.openjdk.org/browse/JDK-8230843.

Curiosity: how often were you observing this issue in parallel builds, and were you able to verify in some way that moving the ServiceLoader.load() call into the static method resolved this issue? (I'm guessing the rationale is that all instances of DockerClientResolver share the static field dockerClients in the current implementation, causing the race condition, but moving it to a local variable in the static resolve() method would have each thread calling this method store this variable in its own stack instead?)

@erdi
Copy link
Contributor Author

erdi commented May 9, 2023

Thanks for getting back to me, @emmileaf.

The most frequent it failed for us recently was 3 times in 9 builds, but on average you can expect one failure in every 100 builds. It's actually hard to give you a true frequency as it requires scouring through failed build logs to confirm that the failure was due to this problem, but it happens frequently enough for us to notice it's a thing and thus report it.

I did not test this fix because I would need to fork, build and publish a custom version of the plugin and then apply it to our build. But I believe that It's based on logic and strong evidence:

  • odd stacktraces point at concurrent usage of a ServiceLoader instance
  • it's clear that the code in question can be ran concurrently in parallel builds
  • ServiceLoader is clearly not thread safe as stated in its javadoc
  • use an instance of ServiceLoader per a call to DockerClientResolver.dockerClients() rather than sharing an instance between all of them to avoid using a ServiceLoader instance concurrently which is what you are referring to by

I'm guessing the rationale is that all instances of DockerClientResolver share the static field dockerClients in the current implementation, causing the race condition, but moving it to a local variable in the static resolve() method would have each thread calling this method store this variable in its own stack instead?

@emmileaf
Copy link
Contributor

Thanks for the additional info! It's helpful to know roughly how frequently the issue is encountered, and the proposed patch/rationale makes sense to me. I’ll go ahead and merge the PR.

A thought for the future if the error still persists: since the exceptions seem to be thrown from iteration over dockerClients, it could be worth adding a try/catch around the iteration and have it return Optional.empty(), or retry in some way upstream. DockerClientResolver was introduced to enable usage of custom DockerClient implementations (which I'm guessing you are not using in your builds, but please correct me otherwise), and Jib should fall back to its default implementation of CliDockerClient if resolution is unsuccessful instead of failing the build.

emmileaf pushed a commit that referenced this issue May 10, 2023
* Fix for #3981.
* Use an instance of ServiceLoader per a call to DockerClientResolver.dockerClients() rather than sharing an instance between all of them, to avoid using a ServiceLoader instance concurrently
@erdi
Copy link
Contributor Author

erdi commented May 12, 2023

Thanks for approving and merging, @emmileaf.

Do you know if a release is planned any time soon? It would be great to be able to benefit from the two contributions I've made recently, #3993 and #3936.

A thought for the future if the error still persists

I'm fairly sure that this particular issue will be solved by the fix. A ServiceLoader instance was clearly used concurrently and no longer will be so there is no longer an obvious reason for the iteration over it to fail.

@erdi
Copy link
Contributor Author

erdi commented May 12, 2023

Oh I see a release might be coming! 🎉

https://github.com/GoogleContainerTools/jib/releases/tag/v0.24.0-core

@emmileaf
Copy link
Contributor

jib-core 0.24.0 (and jib-gradle-plugin 3.3.2 for your use case) has been released with this fix!

@erdi
Copy link
Contributor Author

erdi commented May 12, 2023

That's great news, thanks again, @emmileaf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants