Add support for fake, and speed up reproducible builds #12

Merged
merged 7 commits into from
Jun 10, 2024

mafredri
Member

@mafredri mafredri commented May 28, 2024

See this in use here: coder/envbuilder#213

This PR adds support for fake builds, essentially a way to detect if a build is cached and what the final image hash should be.

The final image hash requires reproducible builds.

Another option, instead of using reproducible builds, is to tag the final image with the final build step hash.

Part of coder/envbuilder#186

Example output from cache hit (DoFakeBuild):

#2: fakeStage 0 built successfully with digest sha256:d76c332b89231eb2b8324e28f53ea4f5a394231ee76efdc2a8c8ad11ffd0891b
#2: 🏗️ Built fake image! [1.245278609s]

DoBuild:

#2: Stage 0 built successfully with digest sha256:d76c332b89231eb2b8324e28f53ea4f5a394231ee76efdc2a8c8ad11ffd0891b
#3: 🏗️ Built image! [2.020869675s]
#4: 🏗️ Pushing image...
#2: Pushing image to 172.17.0.3:5000/local/cache
#2: Pushed 172.17.0.3:5000/local/cache@sha256:d76c332b89231eb2b8324e28f53ea4f5a394231ee76efdc2a8c8ad11ffd0891b
#4: 🏗️ Pushed image! [6.134857ms]

Example output from cache miss:

#2: failed to build fake image: error fake building stage: uncached command *commands.RunMarkerCommand is not supported in fake build
#2: 🏗️ Built fake image! [1.232616498s]

@mafredri mafredri force-pushed the mafredri/feat-fake-and-faster-reproducible-builds branch from 68a0b67 to 696ae8d Compare May 30, 2024 17:32
@@ -444,6 +449,92 @@ func (s *stageBuilder) build() error {
	return nil
}

// fakeBuild is like build(), but does not actually execute the commands or
// extract files.
func (s *stageBuilder) fakeBuild() error {
@dannykopping dannykopping May 31, 2024

AFAICS this aligns with the implementation of build(). I'm concerned that this will drift; how do you plan to keep these two in lock-step?

As an aside: I think the term "fake" is a problematic one.

> This PR adds support for fake builds, essentially a way to detect if a build is cached and what the final image hash should be.

Perhaps renaming it to something (admittedly less pithy) like "cacheProbeBuild" or "preemptiveBuild" might be more clear?

The term "fake" is quite overloaded and I don't think it expresses the intent well here.

Member Author

That's a fair concern, one I have as well. I don't really have a plan for ensuring they're kept in sync other than adding tests to verify a build and "fakeBuild" produce the same (or not) hash in the end.

I also agree with you on "fake", and "cacheProbe" is actually a pretty good one, thanks for the suggestions.

Member

@johnstcn johnstcn left a comment

LGTM so far, but I think this needs some more unit tests before merging.

Comment on lines +117 to +121
// PAX and GNU Format support additional timestamps in the header
if hdr.Format == tar.FormatPAX || hdr.Format == tar.FormatGNU {
	hdr.AccessTime = ct
	hdr.ChangeTime = ct
}
Member

If canonical is set and we encounter other formats (such as formatSTAR or formatMax), should we throw an error?

Member Author

I'm not sure, I don't know how we'd encounter that, actually 🤔. This was borrowed from:

//PAX and GNU Format support additional timestamps in the header
if header.Format == tar.FormatPAX || header.Format == tar.FormatGNU {
	header.AccessTime = t
	header.ChangeTime = t
}

Comment on lines 57 to 64
func NewCanonicalTar(f io.Writer) Tar {
	w := tar.NewWriter(f)
	return Tar{
		w:         w,
		hardlinks: map[uint64]string{},
		canonical: true,
	}
}
Member

This may require extra tests in tar_util_test.go

@@ -444,6 +449,92 @@ func (s *stageBuilder) build() error {
	return nil
}

// fakeBuild is like build(), but does not actually execute the commands or
// extract files.
func (s *stageBuilder) fakeBuild() error {
Member

What would you think about making a separate fakeBuilder that embeds *stageBuilder but overrides the build() method?

e.g. https://go.dev/play/p/5ZxGr4ozeOV
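
For readers who cannot open the playground link, the embedding idea might look roughly like the sketch below. This is not the linked code: the `stageBuilder` here is a stripped-down hypothetical stand-in, and `probeBuilder`, `builder` and `run` are invented names for illustration.

```go
package main

import "fmt"

// stageBuilder is a hypothetical, stripped-down stand-in for the real type.
type stageBuilder struct {
	cmds     []string
	executed []string // records commands that were "run"
}

// build pretends to execute every command.
func (s *stageBuilder) build() error {
	s.executed = append(s.executed, s.cmds...)
	return nil
}

// probeBuilder embeds *stageBuilder and shadows build(), so all other
// fields and methods are promoted unchanged.
type probeBuilder struct {
	*stageBuilder
}

// build inspects the commands but never executes them.
func (p *probeBuilder) build() error {
	for _, c := range p.cmds {
		_ = c // a real probe would check the layer cache here
	}
	return nil
}

// builder lets callers treat both variants uniformly.
type builder interface {
	build() error
}

func run(b builder) error { return b.build() }

func main() {
	sb := &stageBuilder{cmds: []string{"RUN make", "COPY . /src"}}

	_ = run(&probeBuilder{sb})
	fmt.Println("executed after probe:", len(sb.executed))

	_ = run(sb)
	fmt.Println("executed after build:", len(sb.executed))
}
```

One caveat with this approach: Go embedding is not virtual dispatch. If any other method on `*stageBuilder` calls `s.build()` internally, it still reaches the original method, not the shadowing one.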

Member Author

Are you also thinking we'd merge the logic in Build and FakeBuild into one, just swap out the stageBuilder? (My main reason to keep them separate was to remove anything that could accidentally cause changes to the fs.)

I'll think about it, definitely doable.

Member

non-blocking, just a suggestion

I think what could solve both my concern (about lock-step with build()) and Cian's is an adapter that can be passed in which would effect the changes. For your "fake" case, this adapter would not interact with the FS even though it satisfies the implementation.

I haven't looked too closely at the implementation so maybe this is im{practical,possible}, but thought I'd mention it.

> non-blocking, just a suggestion

Ditto 👍

@johnstcn
Member

johnstcn commented Jun 10, 2024

I've renamed the DoFakeBuild etc. methods to ProbeCache, and added a minimal integration-style test for same.
CanonicalTar is now ReproducibleTar, as the old name did not seem immediately descriptive to me.

@BrunoQuaresma BrunoQuaresma left a comment

Code looks good and tested. 👍

@dannykopping dannykopping left a comment

Great work, the new names are very clear as well 👍

}
mFile := filepath.Join(testDir, "proc/mountinfo")
mountInfo := fmt.Sprintf(
`36 35 98:0 /kaniko %s/kaniko rw,noatime master:1 - ext3 /dev/root rw,errors=continue

Slick

Member

That's just someone else's slick code I copy-pasted 🙃

@johnstcn johnstcn merged commit 0a73fcd into main Jun 10, 2024
9 checks passed
Member

@mtojek mtojek left a comment

post-merge approval 👍

I'm curious if this modification works with multi-stage Dockerfiles.

@johnstcn
Member

> post-merge approval 👍
>
> I'm curious if this modification works with multi-stage Dockerfiles.

Good call. I added a separate test and it doesn't appear to work. Will raise a separate PR.

@mtojek
Member

mtojek commented Jun 11, 2024

I don't mind adding support for multi-stage Dockerfiles as a separate issue 👍

@mafredri mafredri deleted the mafredri/feat-fake-and-faster-reproducible-builds branch September 25, 2024 12:35
5 participants