
Jib Cache Selector - FileAlreadyExistsException #3572

Closed
tlefevre opened this issue Feb 3, 2022 · 17 comments

Comments

@tlefevre

tlefevre commented Feb 3, 2022

Environment:

  • Jib version: 3.2.0
  • Build tool: Gradle 7.3.3
  • OS: Linux (Jenkins VM)

Description of the issue:

Builds randomly fail with a FileAlreadyExistsException. It's unclear what makes it go away; it's quite flaky.

Expected behavior:

The build to complete successfully.

Steps to reproduce:

  1. Unable to provide; it's flaky. Sometimes it works, sometimes it doesn't.

Log output:

Caused by: com.google.cloud.tools.jib.plugins.common.BuildStepsExecutionException: /var/lib/jenkins/workspace/GUL_mock-dataservice_master/build/jib-cache/selectors/86a92a1db96dde1836553acbe913013699c4976077907f3f71d15bc5d034820f
at com.google.cloud.tools.jib.plugins.common.JibBuildRunner.runBuild(JibBuildRunner.java:285)
at com.google.cloud.tools.jib.gradle.BuildDockerTask.buildDocker(BuildDockerTask.java:125)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at org.gradle.internal.reflect.JavaMethod.invoke(JavaMethod.java:104)
... 114 more
Caused by: java.nio.file.FileAlreadyExistsException: /var/lib/jenkins/workspace/GUL_mock-dataservice_master/build/jib-cache/selectors/86a92a1db96dde1836553acbe913013699c4976077907f3f71d15bc5d034820f
at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeSelector(CacheStorageWriter.java:473)
at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeUncompressed(CacheStorageWriter.java:295)
at com.google.cloud.tools.jib.cache.Cache.writeUncompressedLayer(Cache.java:144)

@elefeint
Contributor

elefeint commented Feb 4, 2022

@tlefevre Thank you for the report!

Do you have Gradle parallel builds turned on? It sounds like a race condition in the same broad area as #3347.

@tlefevre
Author

tlefevre commented Feb 4, 2022

Hi @elefeint! Yes, our Gradle runner has parallel builds enabled. However, this particular build does not build Jib images in parallel.

Given that, I was unsure if it was related to #3347. Do you think it would be a good idea to disable parallel builds for the jib task?

@elefeint
Contributor

elefeint commented Feb 4, 2022

That would be a good experiment, if your environment allows it.
Another good (independent) experiment would be to turn on jib.useOnlyProjectCache to reduce interference between modules.

@tlefevre
Author

tlefevre commented Feb 4, 2022

The above corruption is in a project that:

  • Has only a single Gradle module
  • Only builds a single image via Jib

I'll try disabling parallel first.

According to #1956 (comment), it looks like we're already using the project-local cache, since the cache directory above is in the build directory. Given that, wouldn't useOnlyProjectCache have no effect?

@elefeint
Contributor

elefeint commented Feb 4, 2022

That's correct; @chanseokoh was just explaining to me why jib.useOnlyProjectCache won't work.

@tlefevre
Author

tlefevre commented Feb 7, 2022

Tried using "--no-parallel"; the build still failed.

java.nio.file.FileAlreadyExistsException: /var/lib/jenkins/workspace/_api_feature_GUL-5452_gspsupport/server/build/jib-cache/selectors/2cadfb6605804ff1e49f11ea4b53bca7e7f40548af5a1c9c2c5f5cc744f29cde

@tlefevre
Author

tlefevre commented Feb 7, 2022

Tried adding the clean task to the Gradle command; the build still fails. I can see that the clean task does delete the build directory and that the jib-cache directory is exclusive to the submodule.

@elefeint
Contributor

elefeint commented Feb 7, 2022

@tlefevre Also try setting jib.applicationCache to a non-default directory, such as /tmp. Our best theory right now is that some parts of the Jenkins filesystem do not support atomic file moves.
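One way to test that theory before moving the cache is a small probe that attempts an atomic move from a temporary directory into the cache directory. The class name, method name, and default paths below are hypothetical; the probe only exercises the JDK's `Files.move` with `ATOMIC_MOVE` and is a sketch of a manual check, not something Jib itself runs.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicMoveProbe {

  /**
   * Returns true if a file created in tempDir can be moved into targetDir with ATOMIC_MOVE.
   * Creates targetDir if needed and deletes both probe files afterwards.
   */
  static boolean supportsAtomicMove(Path tempDir, Path targetDir) throws IOException {
    Files.createDirectories(targetDir);
    Path source = Files.createTempFile(tempDir, "atomic-probe", ".tmp");
    Path target = targetDir.resolve(source.getFileName() + ".moved");
    try {
      // In OpenJDK on Linux this is a rename(2) call, which cannot cross filesystems.
      Files.move(source, target, StandardCopyOption.ATOMIC_MOVE);
      return true;
    } catch (AtomicMoveNotSupportedException e) {
      return false;
    } finally {
      Files.deleteIfExists(source);
      Files.deleteIfExists(target);
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical usage: java AtomicMoveProbe <temp-dir> <cache-dir>
    Path tempDir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
    Path cacheDir = args.length > 1 ? Paths.get(args[1]) : tempDir.resolve("atomic-move-probe");
    System.out.println("Atomic move supported: " + supportsAtomicMove(tempDir, cacheDir));
  }
}
```

Running it with the JVM temp directory and a directory under build/jib-cache as arguments would show whether a move between the two can be atomic; a `false` result would point at the filesystem layout rather than at Jib.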

@chanseokoh
Member

What we see from the log:

at com.google.cloud.tools.jib.cache.CacheStorageWriter.writeSelector(CacheStorageWriter.java:473)

Line 473, where the exception is thrown, is:

} catch (AtomicMoveNotSupportedException ignored) {
Files.move(temporarySelectorFile, selectorFile, StandardCopyOption.REPLACE_EXISTING);
}

This is the fallback path taken after Jib gets an AtomicMoveNotSupportedException, so it's almost certain that /var/lib/jenkins/workspace/_api_feature_GUL-5452_gspsupport/server/build/jib-cache/selectors/2cadfb6605804ff1e49f11ea4b53bca7e7f40548af5a1c9c2c5f5cc744f29cde is on a filesystem that doesn't support atomic moves (e.g., a network filesystem).
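The write-then-move pattern around that line can be sketched as follows. This is a simplified illustration, not Jib's actual `CacheStorageWriter` code; `SelectorWriteSketch` and `writeViaTempFile` are hypothetical names. The point is that the fallback `Files.move` without `ATOMIC_MOVE` is not guaranteed to be a single atomic step, so two writers targeting the same selector file can interfere with each other there.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SelectorWriteSketch {

  /** Writes content to a temporary file in tempDir, then moves it onto destination. */
  static void writeViaTempFile(Path tempDir, Path destination, String content) throws IOException {
    Path temporaryFile = Files.createTempFile(tempDir, null, null);
    Files.write(temporaryFile, content.getBytes(StandardCharsets.UTF_8));
    try {
      // Preferred path: one atomic rename, so the destination is either the old or the new file.
      Files.move(temporaryFile, destination,
          StandardCopyOption.ATOMIC_MOVE, StandardCopyOption.REPLACE_EXISTING);
    } catch (AtomicMoveNotSupportedException ignored) {
      // Fallback path: a non-atomic move, which is not guaranteed to be a single step and
      // can therefore race with another writer targeting the same destination.
      Files.move(temporaryFile, destination, StandardCopyOption.REPLACE_EXISTING);
    }
  }
}
```

When the first `Files.move` succeeds, concurrent writers simply replace each other's file; the fallback branch is where a concurrent write to the same selector could surface as an exception like the one in the logs above.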

@tlefevre
Author

I finally got hold of someone who could explain a bit more about how our Jenkins VMs are put together: there are no physical disks available on them. Everything is on network filesystems, and we have no option to add physical drives.

Do you have any recommendations?

I don't remember this always being an issue. We think it appears only when building Grails applications; we also have many Spring Boot applications that don't hit this error. The difference with Grails is that we have to depend on a few compile tasks and then add extra directories, which could introduce more points in the build where a failure can occur.

@elefeint
Contributor

Jib is ultimately limited by what the filesystem can do in terms of atomic operations.
For a recurring flake like this, your best bet is to detect the flake automatically and rerun it. Two flakes in a row are much less likely than a single flake.
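A minimal sketch of the "detect and rerun" idea, assuming the build step can be wrapped in a `Callable` (in practice this would more likely be a retry at the CI level). It retries only when a `FileAlreadyExistsException` appears somewhere in the cause chain, since the log above shows it wrapped in a `BuildStepsExecutionException`. `RetryOnFlake`, `isKnownFlake`, and `runWithRetry` are hypothetical names.

```java
import java.nio.file.FileAlreadyExistsException;
import java.util.concurrent.Callable;

public class RetryOnFlake {

  /** True if a FileAlreadyExistsException appears anywhere in the cause chain. */
  static boolean isKnownFlake(Throwable t) {
    for (Throwable c = t; c != null; c = c.getCause()) {
      if (c instanceof FileAlreadyExistsException) {
        return true;
      }
    }
    return false;
  }

  /** Runs task up to maxAttempts times, retrying only on the known flake. */
  static <T> T runWithRetry(Callable<T> task, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    for (int attempt = 1; ; attempt++) {
      try {
        return task.call();
      } catch (Exception e) {
        // Any other failure, or the last allowed attempt, is reported unchanged.
        if (attempt >= maxAttempts || !isKnownFlake(e)) {
          throw e;
        }
        System.err.println("Attempt " + attempt + " hit a known flake, retrying: " + e);
      }
    }
  }
}
```

If a run flakes independently with probability p, two failures in a row occur with probability p²; that estimate only holds if the flakes really are independent.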

@tlefevre
Author

We found the issue: an extra directory was duplicated, which caused this error. It took us a while to find it.

Thanks for all your help and sorry for the trouble :)
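Why would a duplicated extra directory lead to a clash on a selector file? A plausible mechanism, stated here as an assumption rather than a description of Jib's internals: the selector file name is a content digest of the layer's inputs (the 64-hex-character names in the logs are consistent with SHA-256), so two identical layers map to the same selector path and may be written at the same time. The `digestDirectory` helper below is hypothetical and is not Jib's algorithm; it only shows that two copies of the same directory hash to the same value.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LayerDigestSketch {

  /** Hex SHA-256 over every regular file's relative path and bytes, in sorted path order. */
  static String digestDirectory(Path root) throws IOException, NoSuchAlgorithmException {
    MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
    List<Path> files;
    try (Stream<Path> walk = Files.walk(root)) {
      files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    for (Path file : files) {
      String relative = root.relativize(file).toString().replace('\\', '/');
      sha256.update(relative.getBytes(StandardCharsets.UTF_8));
      sha256.update((byte) 0); // separate the path from the file content
      sha256.update(Files.readAllBytes(file));
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : sha256.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
```

Because only relative paths and contents feed the digest, the same directory listed twice produces identical digests, which under the assumption above means identical selector file names.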

@elefeint
Contributor

That's excellent! How did you end up narrowing down the issue?

@tlefevre
Author

We figured that every added layer introduced some risk, most likely connected to the extraDirectories functionality, so we decided to take a closer look at that.

@chanseokoh
Member

FTR, as a last resort, one can completely disable concurrency by setting the system/Maven property (e.g., -Djib.serialize=true, or setting it in pom.xml), but this can of course be very slow.
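Conceptually, serializing trades parallelism for ordering: if all steps run on a single thread, two writes to the same selector file can no longer overlap. The sketch below is a hypothetical illustration of such a switch using only `java.util.concurrent`; it is not how Jib implements `jib.serialize`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SerializeSwitchSketch {

  /**
   * Hypothetical: returns a single-threaded executor when the jib.serialize system
   * property is "true", otherwise a pool sized to the available processors.
   */
  static ExecutorService chooseExecutor() {
    if (Boolean.getBoolean("jib.serialize")) {
      // One worker thread: submitted tasks run one after another, never concurrently.
      return Executors.newSingleThreadExecutor();
    }
    return Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
  }
}
```

The cost is the one mentioned above: every step waits for the previous one, so builds with many layers get noticeably slower.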

@tal-ayalon

@tlefevre Hi, we are having the same issue with FileAlreadyExistsException. It happens in some builds and not in others; it's really flaky.
We are running Jenkins on EC2 and don't use a network filesystem (such as NFS).
Could you share how you managed to solve this problem?
The build fails every once in a while, on a different target each time.

@tlefevre
Author

I said so in the comment just before I closed the issue.

Check your extra directories or see if Jib is trying to access the same file repeatedly.
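A quick way to follow that advice is to normalize the configured extra directory paths and look for repeats. The helper below is a hypothetical sketch that takes whatever list of paths the build passes to the extraDirectories configuration. Because `normalize()` does not resolve symbolic links, `toRealPath()` would be needed to catch two links pointing at the same directory.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExtraDirectoryCheck {

  /** Returns each entry that, once made absolute and normalized, was already seen earlier. */
  static List<Path> findDuplicates(List<Path> extraDirectories) {
    Set<Path> seen = new HashSet<>();
    List<Path> duplicates = new ArrayList<>();
    for (Path dir : extraDirectories) {
      // Collapses "." and ".." segments; does not follow symbolic links.
      Path normalized = dir.toAbsolutePath().normalize();
      if (!seen.add(normalized)) {
        duplicates.add(normalized);
      }
    }
    return duplicates;
  }
}
```

An empty result only rules out literal duplicates; overlapping directories (one nested inside another) would need a separate check.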


4 participants