-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always snapshot files in COPY and RUN commands #289
Always snapshot files in COPY and RUN commands #289
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this! Left a couple comments :)
README.md
Outdated
add a layer in the case where a `RUN` command modifies a file but the contents do | ||
not change is non-deterministic. | ||
* With the time-only snapshot mode (`--snapshotMode=time`), kaniko may miss changes | ||
introduced by `RUN` commands entirely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few thoughts:
- Could we put "but the contents do not change" in bold? Just in case people only skim this section and think RUN is broken, when it only happens in the kinda strange case where nothing actually changes
- We could mention that since this only happens when files don't change the final filesystem will still be correct and the final image should still function correctly, it's just the composition of the layers that will differ from "docker build"
- After "kaniko may miss changes introduced by RUN completely" we should mention that this limitation is theoretical since it hasn't been reproduced (yet!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I gave it a shot @priyawadhwa ! PTAL and let me know if we should make any more tweaks. Also another option is we could make another doc and link to it somewhere that's not the main README, esp. since as you pointed out we don't know if this will ever actually happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, I think this makes a lot of sense! I think we could:
-
put the note "Note that these issues are currently theoretical only. If you see this issue occur, please open an issue." in the first bullet point since I think the second bullet point isn't theoretical.
-
remove the word "theoretically" from the second bullet point (since I think this was the bug you found? please ignore this if I'm wrong!)
COPY context/foo /foo | ||
COPY context/foo /foo | ||
COPY context/foo /foo | ||
COPY context/foo /foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SOLID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
X'D
/test all |
I'm trying to setup prow on this repo, sorry for using your PR as a guinea pig :) |
/test all |
lol I totally saw this go by in my email and did not realize it was I who was the guinea pig |
Kaniko uses mtime (as well as file contents and other attributes) to determine if files have changed. COPY and ADD commands should _always_ update the mtime, because they actually overwrite the files. However it turns out that the mtime can lag, so kaniko would sometimes add a new layer when using COPY or ADD on a file, and sometimes would not. This leads to a non-deterministic number of layers. To fix this, we have updated the kaniko commands to be more authoritative in declaring when they have changed a file (e.g. WORKDIR will now only create the directory when it doesn't exist) and we will trust those files and _always_ add them, instead of only adding them if they haven't changed. It is possible for RUN commands to also change the filesystem, in which case kaniko has no choice but to look at the filesystem to determine what has changed. For this case we have added a call to `sync` however we still cannot guarantee that sometimes the mtime will not lag, causing the number of layers to be non-deterministic. However when I tried to cause this behaviour with the RUN command, I couldn't. This changes the snapshotting logic a bit; before this change, the last command of the last stage in a Dockerfile would always scan the whole file system and ignore the files returned by the kaniko command. Instead we will now trust those files and assume that the snapshotting performed by previous commands will be adequate. Docker itself seems to rely on the storage driver to determine when files have changed and so doesn't have to deal with these problems directly. An alternative implementation would use `inotify` to track which files have changed. However that would mean watching every file in the filesystem, and adding new watches as files are added. Not only is there a limit on the number of files that can be watched, but according to the man pages a) this can take a significant amount of time b) there is complication around when events arrive (e.g. by the time they arrive, the files may have changed) and lastly c) events can be lost, which would mean we'd run into this non-deterministic behaviour again anyway. Fixes GoogleContainerTools#251
Although we were able to reproduce this with the previous behaviour of the COPY and ADD commands, we have fixed that issue and our attempts to cause the issue to occur with RUN did not succeed, so it may be that in practice this will never happen.
This test had previously (before GoogleContainerTools#231) been making a change to a file in the kaniko dir, then checking that it isn't being snapshotted. This was to test the whitelisting logic, which makes sure that changes to /kaniko aren't included in images. However the test creates a temporary dir, so the kaniko dir is actually in /tmp/<some temp dir>/kaniko, and in GoogleContainerTools#231 the logic was simplified to no longer have a special case for tests. The test continued to pass because `MaybeAdd` noticed that the kaniko file wasn't changing, and didn't add it. After changing this to always add the files, it revealed that this was left behind by accident. I also opened GoogleContainerTools#307 to add integration test coverage for this logic. I also marked `CheckErrorAndDeepEqual` as a helper function so that when it fails, the line number reported is where that was called.
eca58b4
to
2fe93f2
Compare
pkg/snapshot/snapshot.go
Outdated
for _, file := range files { | ||
parentDirs := util.ParentDirectories(file) | ||
// If we do end up snapshotting the dirs, we need to snapshot them before | ||
// we snapshot their contents | ||
files = append(parentDirs, files...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@priyawadhwa can you confirm if the comment i added is correct? I was trying to understand why we were actually appending to parentDirs
instead of to files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I don't think the order matters! I'm pretty sure I did that bc when I was writing the function and debugging it the logging was easier to read as [/tmp, /tmp/foo]
instead of [/tmp/foo, /tmp]
(but I guess I removed the logging and kept the files the way they were) 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah kk! if we're going to have 2 separate lists it wont matter too much anyway
pkg/snapshot/snapshot.go
Outdated
|
||
var fileAdded bool | ||
lastParentFileIndex := len(files) - n | ||
isParentDir := i < lastParentFileIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one of the grossest things ive written in a long time (knock on wood) @priyawadhwa so let me know if it's not clear and/or you'd just like it to be nicer
i think the next step to making this better would be to break some of this logic out into a separate function or two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could have two separate lists, files
and parentDirs
, and then run Add on the files
and MaybeAdd on the parentDirs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kk let's do it!
Okay this is ready for another look @priyawadhwa ! Re. the failing test I had to:
|
/dog |
/bark |
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/meow space |
what! we've got bark but no meow! what kind of world is this |
muahhhaaaha |
@spiffxp I am relieved the situation is being taken seriously!! 😤 |
/bark |
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good! Just had one question.
Also this PR is super lit so many doggos !! 🐕
pkg/snapshot/snapshot.go
Outdated
for _, file := range files { | ||
parentDirs := util.ParentDirectories(file) | ||
// If we do end up snapshotting the dirs, we need to snapshot them before | ||
// we snapshot their contents | ||
files = append(parentDirs, files...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, I don't think the order matters! I'm pretty sure I did that bc when I was writing the function and debugging it the logging was easier to read as [/tmp, /tmp/foo]
instead of [/tmp/foo, /tmp]
(but I guess I removed the logging and kept the files the way they were) 😅
pkg/snapshot/snapshot.go
Outdated
|
||
var fileAdded bool | ||
lastParentFileIndex := len(files) - n | ||
isParentDir := i < lastParentFileIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think we could have two separate lists, files
and parentDirs
, and then run Add on the files
and MaybeAdd on the parentDirs
?
@priyawadhwa I added 530c008 to separate handling of parent dirs from the files we're adding - but I started seeing some (TOTALLY UNRELATED) tests failing locally so I'm waiting to see if that turns up in CI or not before investigating further 🤔 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple nits, thanks for fixing this, it looks great!
integration/integration_test.go
Outdated
buildKaniko := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..") | ||
err = buildKaniko.Run() | ||
cmd := exec.Command("docker", "build", "-t", ExecutorImage, "-f", "../deploy/Dockerfile", "..") | ||
_, err = RunCommandWithoutTest(cmd) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think go convention is to write error messages like this as
if _, err := RunCommandWithoutTest(cmd); err != nil {
}
filesAdded, err = s.snapShotFS(buf) | ||
} else { | ||
filesAdded, err = s.snapshotFiles(buf, files) | ||
filesAdded, err := s.snapshotFiles(buf, files) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't think you need to declare var filesAdded bool
on line 56 anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whops thanks!
pkg/snapshot/snapshot.go
Outdated
} | ||
// Only add to the tar if we add it to the layeredmap. | ||
addFile, err := s.l.MaybeAdd(file) | ||
err = t.AddFileToTar(file) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same error formatting as I commented above
if err := s.l.Add(file); err != nil {
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah good call
pkg/snapshot/snapshot.go
Outdated
continue | ||
} | ||
snapshottedFiles[file] = true | ||
info, err := os.Lstat(file) | ||
|
||
err = s.l.Add(file) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: same error formatting as I commented above
// AddToTar adds the file i to tar w at path p | ||
func AddToTar(p string, i os.FileInfo, hardlinks map[uint64]string, w *tar.Writer) error { | ||
// Tar knows how to write files to a tar file. | ||
type Tar struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!!
To make the logic a bit more clear, when snapshotting files, the parent dirs are now snapshotted in a different loop from the files we are actually trying to snapshot. Unfortunately this loop is nearly duplicated but I did managed to group some fo the related logic together: - A function to check if the file should be snapshotted (e.g. isn't whitelisted, etc.) - Created a `Tar` type to handle some of the logic around tar-ing, e.g. tracking hardlinks and stat-ing files before adding them One side effect of this is that now when snapshoting the file system, files will be stat-ed twice.
530c008
to
7f64037
Compare
Before GoogleContainerTools#289 was merged, when copying over directories for COPY kaniko would get a list of all files at the destination specified and add them to the list of files to be snapshotted. If the destination was root it would add all files. This worked because the snapshotter made sure the file had been changed before adding it to the layer. After GoogleContainerTools#289, we changed the logic to add all files snapshotted to a layer without checking if the files had been changed. This created the bug in got all the files at root and added them to the layer without checking if they had been changed. This change should fix this bug. Now, the CopyDir function returns a list of files it copied over and only those files are added to the list of files to be snapshotted. Should fix GoogleContainerTools#314
Kaniko uses mtime (as well as file contents and other attributes) to
determine if files have changed. COPY and ADD commands should always
update the mtime, because they actually overwrite the files. However it
turns out that the mtime can lag, so kaniko would sometimes add a new
layer when using COPY or ADD on a file, and sometimes would not. This
leads to a non-deterministic number of layers.
To fix this, we have updated the kaniko commands to be more
authoritative in declaring when they have changed a file (e.g. WORKDIR
will now only create the directory when it doesn't exist) and we will
trust those files and always add them, instead of only adding them if
they haven't changed.
It is possible for RUN commands to also change the filesystem, in which
case kaniko has no choice but to look at the filesystem to determine
what has changed. For this case we have added a call to
sync
howeverwe still cannot guarantee that sometimes the mtime will not lag, causing the
number of layers to be non-deterministic. However when I tried to cause
this behaviour with the RUN command, I couldn't.
This changes the snapshotting logic a bit; before this change, the last
command of the last stage in a Dockerfile would always scan the whole
file system and ignore the files returned by the kaniko command. Instead
we will now trust those files and assume that the snapshotting
performed by previous commands will be adequate.
Docker itself seems to rely on the storage driver to determine when
files have changed and so doesn't have to deal with these problems
directly.
An alternative implementation would use
inotify
to track which fileshave changed. However that would mean watching every file in the
filesystem, and adding new watches as files are added. Not only is there
a limit on the number of files that can be watched, but according to the
man pages a) this can take a significant amount of time b) there is
complication around when events arrive (e.g. by the time they arrive,
the files may have changed) and lastly c) events can be lost, which
would mean we'd run into this non-deterministic behaviour again anyway.
Fixes #251