Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Windows] Handle paths on Windows #3322

Closed

Conversation

gabriel-samfira
Copy link
Collaborator

This change properly handles paths paths on Windows, in the context of the ADD, COPY and WORKDIR stanzas. The Windows shell oddly handles a surprising array of path formats:

C:\Users\Administrator>start cmd
C:\Users\Administrator>cd /Windows
C:\Windows>cd \Windows\system32
C:\Windows\System32>cd C:/Users/Administrator
C:\Users\Administrator>cd /ProgramData\buildkitd
C:\ProgramData\buildkitd>

This change makes sure that regardless of the paths used in a Dockerfile, buildkit will be able to properly clean it, and use it to set the Cwd in the image spec and properly copy/add files inside the image.

This should bring us one step closer to fixing #616

util/system/path.go Outdated Show resolved Hide resolved
@gabriel-samfira gabriel-samfira force-pushed the handle-windows-paths branch 3 times, most recently from a916dba to 36b3177 Compare November 27, 2022 22:44
@gabriel-samfira gabriel-samfira changed the title Handle paths on Windows [Windows] Handle paths on Windows Nov 29, 2022
@tonistiigi tonistiigi added this to the v0.12 milestone Dec 6, 2022
import (
"path/filepath"
)

// CheckSystemDriveAndRemoveDriveLetter verifies that a path, if it includes a drive letter,
// is the system drive. This is a no-op on Linux.
func CheckSystemDriveAndRemoveDriveLetter(path string) (string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed this function, and it looks like code was forker from moby/moby, e.g. this one came from https://github.com/moby/moby/blob/487fa4a469b4fc77e1706f14acbd97c06383e754/pkg/system/path.go#L33

Should we look at unifying these packages to prevent maintaining two distinct implementations (and at risk of diverging?)

Copy link
Collaborator Author

@gabriel-samfira gabriel-samfira Dec 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me! Any suggestions in regards to which package this would best be placed in?

Edit: I have a feeling there are several functions that are probably copied over from docker. As we add Windows support to buildkit I think it would be useful to move those as well (if/as we find them).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd have to give it a look; my eye just dropped on this one, and the name of this function looked very familiar 😂
I also may be ruffling a hornets nest here 🙈

Some quick background (might look more in-depth later);

  • I know code has been forked / copied from docker to BuildKit, some of that related to there being a circular dependency between docker and buildkit (which can be "fun"), and packages in the docker repository not always being "easy" to consume (there's various packages that had dependencies on too many other things, which could cause dependency hell).
  • While understandable; it's also slightly risky; implementations may start to diverge, and (due to the circular dependency) in some cases leads to two implementations both being used in the same codebase. If both implementations have the same (similar) requirements, we should definitely try to avoid that.
  • I've been untangling some of that in the docker repository (still some work in progress), so that packages that have known consumers had no (or very little) dependencies.
  • In some cases, we decided to move those packages to be their own repository and module (e.g., https://github.com/moby/sys, which provides some modules, or https://github.com/moby/term)
  • A separate repo (and/or module) is definitely the easiest way to consume them, but not all packages may make sense to be a "project on their own"); so that would have to be looked at.
  • If it doesn't make sense as an independent module, we can still look what the best home is for it (which could be either inside the BuildKit repository, or in the Docker repository).
  • We're in the process of migrating docker to be a proper go module (which will solve some things, but may complicate some others, e.g. having a "go.mod" can force some things and be slightly less flexible).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding a new module in the https://github.com/moby/sys called something like path, or util. It seems that both moby and buildkit have the system package copied over. There are probably a lot more of these small packages/functions re-implemented in various places.

Should we do this as part of this PR or should we open an issue and move it as part of a different change? What do you think would be the better way?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being, I think this can live in buildkitd. I will be happy to move this somewhere else when we decide where the best place for it would be. I am still new to both the buildkitd and moby code base, so I am not sure where that place would be 😄 .

@@ -321,9 +327,13 @@ type fileActionMkfile struct {
}

func (a *fileActionMkfile) toProtoAction(ctx context.Context, parent string, base pb.InputIndex) (pb.IsFileAction, error) {
normalizedPath, err := normalizePath(parent, a.file, false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are repeating the normalization of the path over and over would it make sense to move the error handler as well to the function rather than copy and pasting the code?

util/system/path.go Outdated Show resolved Hide resolved
Copy link

@helsaawy helsaawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One way to skip needing all the path parsing and handeling would be to use GetFullPathNameW.
It does, however, expand relative paths bases on the process's current working directory, and I dont think you would want to have to make temp-dirs or change the working directory for all go-routines in the current process, so you may still need to do some processing to identify and rectify relative paths.

func GetFullPathName(p string) (string, error) {
	p0, err := syscall.UTF16PtrFromString(p)
	if err != nil {
		return "", err
	}

	n, err := windows.GetFullPathName(p0, 0, nil, nil)
	if n == 0 {
		return "", err
	}

	p1 := make([]uint16, n)
	_, err = windows.GetFullPathName(p0, n, &p1[0], nil)
	if err != nil {
		return "", err
	}

	return syscall.UTF16ToString(p1), nil
}

util/system/path_windows_test.go Outdated Show resolved Hide resolved
@gabriel-samfira
Copy link
Collaborator Author

One way to skip needing all the path parsing and handeling would be to use GetFullPathNameW.
It does, however, expand relative paths bases on the process's current working directory, and I dont think you would want to have to make temp-dirs or change the working directory for all go-routines in the current process, so you may still need to do some processing to identify and rectify relative paths.

Most of the path parsing in buildkit is a "translation layer" that converts what the container sees, to the corresponding physical path on the host. So we end up stripping away drive letters and joining what is left to what we know is the filesystem root as seen by the host. There's even some handling for symlinks and their targets. I am not sure if GetFullPathName is useful in this context.

@quangkieu
Copy link

Hey, I wonder is there still blocking on this pull request?

@gabriel-samfira
Copy link
Collaborator Author

gabriel-samfira commented Mar 7, 2023

Hey, I wonder is there still blocking on this pull request?

Not as far as I am concerned. Most of this PR is just replacing path with filepath and some new logic in a few function. Waiting for reviews at this point.

Copy link
Member

@jedevc jedevc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not taking a look at this a bit sooner, I've had it on my backlog for weeks 👀

Aside from the inline comments, I'm curious about general use of the path package - is this something we need to avoid using in the future when manipulating file paths? If so, I'm guilty of a few instances (especially around some of the attestations components). Is it worth attempting to deny-list this package (or some of it's functions) at the linter level, to prevent regression?

client/llb/fileop.go Outdated Show resolved Hide resolved
return errors.Wrap(err, "removing drive letter")
}

d.state = d.state.Dir(withoutDriveLetter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scans odd to me. I'm not sure why we have two separate ways for setting the path - the path in the state and the path in the config should be equivalent (this is also in the original).

I think we should restructure this so that we pass the result of NormalizeWorkdir directly into d.state.Dir (I think this is equivalent in behavior as long as NormalizeWorkdir always returns an absolute path?). Then we don't have two separate ways of computing the working directory.

If we're doing that, I think we can also push down the call to CheckSystemDriveAndRemoveDriveLetter into NormalizeWorkdir? Ideally, it would be nice to have as little windows-specific stuff in the dockerfile frontend code, since it then makes it harder for other frontends to also support windows.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. And would address @tonistiigi 's comment about basing this on the dispatchstate platform. We could decide whether or not to strip drive letters based on the target platform. Will change shortly.

@@ -1203,7 +1215,10 @@ func dispatchCopy(d *dispatchState, cfg copyConfig) error {
commitMessage.WriteString(" <<" + src.Path)

data := src.Data
f := src.Path
f, err := system.CheckSystemDriveAndRemoveDriveLetter(src.Path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of propagating this call throughout dispatchCopy, WDYT about pushing it down into llb.Copy (or fileActionCopy)? I'm not sure, I'm not as familiar with that code, but to me it seems like then we'd avoid having to copy this block everywhere, and other frontends would also automatically get the path fixes for Windows?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will look into this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a couple of places where the source folder passed to llb.Copy() is wrapped in filepath.Join("/"):

if a == nil {
a = llb.Copy(cfg.source, filepath.Join("/", src), dest, opts...)
} else {
a = a.Copy(cfg.source, filepath.Join("/", src), dest, opts...)
}
}

If this path is a Windows path, this ends up becoming something like: \C:\test. If the requirement is to make sure the path is absolute before it gets passed to llb.Copy() here, the assumption being that it is possible this path may be relative in other places, we need to make sure the path is properly converted before we pass it in. Maybe a new function called ToAbolute(path string, platform string), that does the right thing based on target platform? What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping

solver/llbsolver/file/backend.go Outdated Show resolved Hide resolved
solver/llbsolver/file/backend.go Outdated Show resolved Hide resolved
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed I had some pending reviews I had forgotten to post 😬

@@ -66,17 +68,22 @@ func dirf(value string, replace bool, v ...interface{}) StateOption {
}
return func(s State) State {
return s.withValue(keyDir, func(ctx context.Context, c *Constraints) (interface{}, error) {
if !path.IsAbs(value) {
if !system.IsAbs(value) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Client code should never check for the os of the client itself. A windows client can build linux images and vice versa.

For the abs check, either both os should be checked or detected from the value. Or from the state platform.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! In this case, the s.GetPlatform(ctx) function should return the proper platform I suppose. Based on that I could decide how to handle the actual path.

wd := c.Path
if !path.IsAbs(c.Path) {
wd = path.Join("/", d.image.Config.WorkingDir, wd)
withoutDriveLetter, err := system.CheckSystemDriveAndRemoveDriveLetter(c.Path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic should be based on the dispatchstate platform.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to parametrize the target platform as much as I could. Let me know if the latest commit is what you had in mind.

@gabriel-samfira
Copy link
Collaborator Author

Noticed I had some pending reviews I had forgotten to post grimacing

No worries! Had that happen to me a couple of times 😄. Will address the comments ASAP.

@gabriel-samfira
Copy link
Collaborator Author

Disclaimer: Really sorry for the wall of text 😄.

Aside from the inline comments, I'm curious about general use of the path package - is this something we need to avoid using in the future when manipulating file paths?

It depends.

There are places, such as, for example (but not limited to) some public APIs of fsutil, where we need to have unix style paths.

In some cases, unix style paths are acceptable even on Windows, as the nice folks over at MSFT have worked hard to improve compatibility with other platforms. You can see an example of the various ways to type in a path in the PR description.

There are however places where unix style paths are not allowed on Windows. One such example is the cwd field passed into hcsshim which in turn calls into Host Compute System to create the actual container. We need to explicitly filepath.ToSlash() that value before we pass it in.

That being said, the path module deals with unix style paths. We can still use it, but we have to be really careful where we use it. The IsAbs() function and friends, will not work for paths like C:\test, but will work for paths that have the drive letter stripped and slashes converted (/test).

We strip the drive letter in a bunch of places, because we want to join that path with the root path of the container FS as seen by the host. Coincidentally, the path /test is also a valid absolute path in a Windows shell. It is worth mentioning however, that it is relative to the "active" drive letter.

For example, if I am on C:, the path refers to C:\test, but if I change to another drive letter in my shell:

D:

Then the path /test will refer to D:\test. There are also some other oddities in the way relative paths work. You can have bare relative paths (without a drive letter), or drive letter prefixed relative paths. Each drive letter maintaining their own CWD. So if your CWD on D: is D:\test and your CWD on C: is C:\Users, using: C:gabriel will actually point to C:\Users\gabriel, while if I use D:gabriel it will point to D:\test\gabriel.

Luckily, we only need to care about C:. Anything else is not supported inside a container. Stripping the drive letter gives us semantics that are very similar to unix style paths, in terms of absolute and relative paths.

But we still need to care about how we handle them, and we need to be able to properly identify an absolute path (as seen by the container, which usually includes a drive letter and which is supplied by the user in a Dockerfile/llb). This won't be doable with standard packages alone. The reason for that is that on Windows, like we've seen, paths are a whole maze of different implementations. As such, in the Go standard library, filepath correctly identifies a number of different paths as absolute. However, not all of those are relevant in the context of a container. We won't have UNC paths inside a container (I think). Also, I am not sure we need to care about special devices like CON, AUX, etc. This is why I added the IsAbs() function you see in this PR.

If so, I'm guilty of a few instances (especially around some of the attestations components).

That is quite alright. Once we identify a sane way to handle paths, any incongruity can be changed, proper comments and tests can be added to prevent regressions. But as detailed above, getting it right is a bit of a can of worms 😄.

Is it worth attempting to deny-list this package (or some of it's functions) at the linter level, to prevent regression?

I can't say for sure at this stage. We'll have to see where this PR ends up and decide from there.

@gabriel-samfira gabriel-samfira force-pushed the handle-windows-paths branch 4 times, most recently from 2e0581f to 7abc3ed Compare April 7, 2023 15:41
@gabriel-samfira gabriel-samfira force-pushed the handle-windows-paths branch 2 times, most recently from a05bcb2 to 4d2c75e Compare April 7, 2023 16:02
@gabriel-samfira
Copy link
Collaborator Author

This is interesting. The following failure indicates an exit code a bit foreign to any Linux I am aware of: https://github.com/moby/buildkit/actions/runs/4664892558/jobs/8257778803?pr=3322#step:7:1404

This looks like it's maybe running under WSL?

@riverar
Copy link

riverar commented Apr 13, 2023

Also, I am not sure we need to care about special devices like CON, AUX, etc.

Not to impede progress but NUL is a popular device used to work around bad tooling, quiet tool output, etc. Just keep that in mind!

@gabriel-samfira
Copy link
Collaborator Author

Not to impede progress but NUL is a popular device used to work around bad tooling, quiet tool output, etc. Just keep that in mind!

Indeed! The code paths here, mostly deal with transforming paths as seen by the container into paths to the mounted container root FS on the host. It also deals with the WORKDIR that is used when using the RUN stanza and which will eventually be set in the image spec.

When buildkit needs to copy files from one layer to another or from some other source, it mounts the layer(s) in a temp folder. It inspects the COPY stanza (for example) from a Dockerfile, takes the source/destination file path and resolves that to the actual path on the host. It then proceeds to copy. In the Dockerfile you may have:

COPY --from=someStage C:/myFile.txt C:/some/path/myFile.txt

Which will translate to something like:

cp C:\Windows\temp\buildkit-temp1\myFile.txt C:\Windows\temp\buildkit-temp2\some\path\myFile.txt

Where the source layer is mounted in C:\Windows\temp\buildkit-temp1 and the destination layer is mounted in C:\Windows\temp\buildkit-temp2.

The NUL special device is useful during a RUN operation, which invokes the executor and spins up a temporary container using the layer we are mutating as a scratch layer. In such cases, the devices exist and should work as expected.

Do you see any reason to COPY or ADD to a special device inside a Dockerfile?

@riverar
Copy link

riverar commented Apr 13, 2023

Makes sense, thanks for the background on that.

Do you see any reason to COPY or ADD to a special device inside a Dockerfile?

Not at this time 👍

@gabriel-samfira gabriel-samfira force-pushed the handle-windows-paths branch 3 times, most recently from b52088e to c93bead Compare April 26, 2023 11:53
@gabriel-samfira
Copy link
Collaborator Author

Would it help if I attempt to split this PR into smaller PRs?

@gabriel-samfira
Copy link
Collaborator Author

Hi @jedevc @tonistiigi

Please take another look when you have some cycles.

@gabriel-samfira gabriel-samfira force-pushed the handle-windows-paths branch 2 times, most recently from 0c07313 to 79dc7d4 Compare May 11, 2023 18:29
@crazy-max
Copy link
Member

Needs rebase.

Is it still a WIP? Some commits look a bit odd to me to clearly understand the scope:

  • For c16d9ab and 149a2e3, I think we should have as first commit the IsAbs and NormalizeWorkdir impl with tests.
    • And another commit for handling of workdir and paths on Windows. Could also merge 79dc7d4 with this commit as well.
  • 145355e does not look directly related to PR desc so might be good for another PR.
    • Or you could merge 145355e and 5b7ab16 together, that looks fine too.
  • 64da2ff. Could we bump fsutil in another PR first?

@gabriel-samfira
Copy link
Collaborator Author

Hi @crazy-max

Needs rebase.

Will do!

Is it still a WIP?

Not a WiP anymore. It is, however awaiting another look after the last set of changes.

Some commits look a bit odd to me to clearly understand the scope:

This PR grew as the conversation evolved. Commits should be squashed, as subsequent commits changed code from previous commits. One example is CheckSystemDriveAndRemoveDriveLetter() moved from platform specific files into a common path.go. It now takes the platform as a parameter and does the right thing. See: #3322 (comment) . We also need this because Windows can also handle LCOW (Linux Containers On Windows), so we should be target platform aware.

I can attempt to split this in several PRs. It will remove some of the confusion and narrow the scope a bit. I will propose a new set of PRs to replace this one in the next couple of days.

gabriel-samfira and others added 7 commits May 26, 2023 01:30
This change properly handles paths on Windows, in the context of the
WORKDIR, COPY and ADD stanzas.

Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
Co-authored-by: Justin Chadwell <[email protected]>
Signed-off-by: Gabriel Adrian Samfira <[email protected]>
@gabriel-samfira
Copy link
Collaborator Author

@crazy-max I opened:

Converted this to draft to avoid confusion, but leaving it as it contains a longer discussion related to changes that were split. Looking forward to any input you may have.

@gabriel-samfira
Copy link
Collaborator Author

Closing this, as the work has been merged in other branches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants